Function arguments

1 Pass by reference


Mutable objects: modifying them inside a function changes the original object outside.

Immutable objects: the original object is not changed.
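A minimal illustration of the difference; the list/int arguments below are my own example, not from the original note:

```python
def modify(lst, num):
    lst.append(4)    # mutates the caller's list object in place
    num += 1         # rebinds the local name only; the caller's int is untouched

my_list, my_num = [1, 2, 3], 10
modify(my_list, my_num)
print(my_list)   # [1, 2, 3, 4]  -- mutable object changed in place
print(my_num)    # 10            -- immutable object unchanged
```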

2 Default arguments

https://blog.csdn.net/weixin_41972881/article/details/81562731

https://blog.csdn.net/weixin_45775963/article/details/103696945

```python
def fun(va1, va2=[]):
    print(va2)
    va2.append(va1)
    return va2

te1 = fun(10)   # prints []
te1 = fun(20)   # prints [10]
```

If no argument is passed for va2, the default is used; the default object mutates across calls, so it is not always [].

If va2 is passed explicitly, the passed argument is used and the default is not touched.
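A common fix for this mutable-default pitfall is to use None as a sentinel so a fresh list is created on each call; this is a standard Python idiom, shown here as a sketch:

```python
def fun(va1, va2=None):
    if va2 is None:      # create a new list each call instead of sharing one default object
        va2 = []
    va2.append(va1)
    return va2

print(fun(10))   # [10]
print(fun(20))   # [20]  -- no longer accumulates across calls
```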

3 Variable-length arguments

1 *args

```python
def test(*args):
    print(args)

test(1, 2, 3, 4)       # (1, 2, 3, 4)
test(*(1, 2, 3, 4))    # (1, 2, 3, 4)
```

2 **kwargs

```python
def test(**kwargs):
    print(kwargs)

test(x=1, y=2, z=3)                   # {'x': 1, 'y': 2, 'z': 3}
test(**{'x': 1, 'y': 2, 'z': 3})      # {'x': 1, 'y': 2, 'z': 3}
```

Enhanced-RCNN: An Efficient Method for Learning Sentence Similarity

Key traits: not based on pre-training, and has a small number of parameters

1 Input Encoding

Two encodings are produced: the RNN Encoding and the RCNN Encoding

1 BiGRU

$\textbf{a}=\{a_1,a_2,\dots,a_{l_a}\}$, where $\textbf{a}$ is the sentence and $l_a$ is the length of sentence 1 (sentence $\textbf{b}$ is defined analogously).

This yields the RNN Encoding; $\overline{\textbf{p}}_i$ is used as a unified notation for $\overline{\textbf{a}}_i$ and $\overline{\textbf{b}}_i$.
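As a rough illustration of this step, here is a minimal PyTorch sketch of a BiGRU encoder; the embedding and hidden sizes are placeholder choices, not the paper's settings:

```python
import torch
import torch.nn as nn

# Minimal sketch of the BiGRU encoder: token embeddings in, contextual states out.
class RNNEncoder(nn.Module):
    def __init__(self, embed_dim=300, hidden_dim=150):
        super().__init__()
        self.bigru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, a_embed):                 # a_embed: (batch, l_a, embed_dim)
        p_bar, _ = self.bigru(a_embed)          # p_bar:   (batch, l_a, 2*hidden_dim)
        return p_bar                            # the RNN Encoding \bar{p}_i

a = torch.randn(2, 7, 300)                      # a toy batch of embedded sentences
print(RNNEncoder()(a).shape)                    # torch.Size([2, 7, 300])
```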

2 CNN

On top of the BiGRU encoding, a CNN performs a second round of encoding.

The structure follows the "network in network" design; k is the kernel size of the convolution, e.g. k=1 gives a $1 \times 1$ kernel.

For each CNN unit, the computation follows the paper's equations.

This yields the RCNN Encoding $\widetilde{\textbf{p}}_i$.
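A minimal sketch of a 1D convolution applied on top of the BiGRU states to produce the RCNN Encoding; the channel sizes and single-layer structure are simplifying assumptions, not the paper's exact "network in network" stack:

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    def __init__(self, in_dim=300, out_dim=300, k=1):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=k, padding=k // 2)
        self.act = nn.ReLU()

    def forward(self, p_bar):                    # p_bar: (batch, l_a, in_dim)
        x = p_bar.transpose(1, 2)                # Conv1d expects (batch, channels, length)
        p_tilde = self.act(self.conv(x)).transpose(1, 2)
        return p_tilde                           # the RCNN Encoding \tilde{p}_i
```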

2 Interactive Sentence Representation

1 Soft-attention Alignment

attention:

The attention-weighted RNN encodings (a sketch of the alignment follows):
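As a sketch of the soft-attention alignment, here is the standard ESIM-style dot-product alignment between the two sentences' RNN encodings; the exact scoring function in the paper may differ from this simplified form:

```python
import torch
import torch.nn.functional as F

def soft_align(a_bar, b_bar):
    # a_bar: (batch, l_a, d), b_bar: (batch, l_b, d)
    e = torch.bmm(a_bar, b_bar.transpose(1, 2))                     # (batch, l_a, l_b) alignment scores
    a_hat = torch.bmm(F.softmax(e, dim=2), b_bar)                   # each a_i attends over b
    b_hat = torch.bmm(F.softmax(e, dim=1).transpose(1, 2), a_bar)   # each b_j attends over a
    return a_hat, b_hat                                             # attention-weighted RNN encodings
```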

2 Interaction Modeling

$\overline{\textbf{p}}$ is the RNN encoding.

$\hat{\textbf{p}}$ is the attention-weighted RNN encoding.

$\widetilde{\textbf{p}}$ is the RCNN encoding.

The final Interactive Sentence Representations obtained are $\textbf{o}_a,\textbf{o}_b$.

3 Similarity Modeling

1 Fusion Layer

g is a gating function.
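A generic gated-fusion sketch showing the pattern of a sigmoid gate g mixing two representations; the exact inputs and gate parameterization follow the paper's equations and may differ from this simplified form:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, x1, x2):
        g = torch.sigmoid(self.gate(torch.cat([x1, x2], dim=-1)))   # gate values in (0, 1)
        return g * x1 + (1 - g) * x2                                 # gated mixture of the two inputs
```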

2 Label Prediction

Fully connected layers

4 loss

Cross-entropy
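A minimal usage sketch of the loss (standard PyTorch cross-entropy; the variable names are placeholders):

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
# logits: (batch, num_classes) from the prediction layer, labels: (batch,) class indices
# loss = criterion(logits, labels)
```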

References

https://sci-hub.st/10.1145/3366423.3379998

https://zhuanlan.zhihu.com/p/138061003

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

0 Difference from pre-train + fine-tune

Prompting feels like a special form of fine-tuning: the model is still pre-trained first, and then prompt tuning is applied.

Goal: prompting narrows the gap between pre-training and fine-tuning.

1 How it works

Three steps

1 Prompt Addition

$x'=f_{prompt}(x)$, where $x$ is the input text

  1. Apply a template, which is a textual string that has two slots: an input slot [X] for input x and an answer slot
    [Z] for an intermediate generated answer text z that will later be mapped into y.
  2. Fill slot [X] with the input text x.

2 Answer Search

f: fills in the location [Z] in prompt $x'$ with a potential answer z

Z: a set of permissible values for z

3 Answer Mapping

Because the $\hat{z}$ above is not yet $\hat{y}$; for example, in sentiment analysis, "excellent", "fabulous", "wonderful" → positive.

go from the highest-scoring answer $\hat{z}$ to the highest-scoring output $\hat{y}$

4 An example: text sentiment classification

Before (standard classification):

"I love this movie." → positive

With prompting:

1 $x=$ "I love this movie." → the template is "[X] Overall, it was a [Z] movie." → $x'$ is "I love this movie. Overall, it was a [Z] movie."

2 Next comes answer search: as the name suggests, the LM looks for the text $\hat{z}$ that scores highest when filled in at [Z] (e.g. "excellent", "great", "wonderful").

3 Finally, answer mapping. The text filled in by the LM is not always the final form the task needs (the final label is positive, while the LM produced "excellent", "great", "wonderful"), so this text must be mapped to the final output $\hat{y}$. A sketch of all three steps follows.
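A minimal sketch of the three steps for this sentiment example, using a masked LM from transformers; the model name, template, and verbalizer below are illustrative assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder model choice
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def f_prompt(x):
    # 1 Prompt Addition: apply the template and fill the [X] slot with the input text
    return f"{x} Overall, it was a {tokenizer.mask_token} movie."

# answer space Z and its mapping to final labels y (an illustrative verbalizer)
verbalizer = {"positive": ["great", "wonderful"], "negative": ["terrible", "boring"]}

def predict(x):
    x_prime = f_prompt(x)
    inputs = tokenizer(x_prime, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]       # LM scores for the [Z] slot
    # 2 Answer Search: score each permissible answer z at the [Z] position
    # 3 Answer Mapping: return the label whose best answer word scores highest
    scores = {label: max(logits[tokenizer.convert_tokens_to_ids(w)].item() for w in words)
              for label, words in verbalizer.items()}
    return max(scores, key=scores.get)

print(predict("I love this movie."))   # expected: positive
```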

2 A taxonomy of prompting methods

3 Prompt Engineering

1 one must first consider the prompt shape,

2 then decide whether to take a manual or automated approach to create prompts of the desired shape

1 Prompt Shape

The prompt shape mainly refers to the positions and number of the [X] and [Z] slots.

If [Z] is in the middle of the text, the prompt is usually called a cloze prompt; if it is at the end, a prefix prompt.

In practice, which one to choose depends mainly on the task format and the model type. Cloze prompts closely match how masked language models (MLMs) are trained, so they suit MLM-based tasks better; for generation tasks, or tasks solved with autoregressive LMs, prefix prompts are more suitable; full-text reconstruction models are more general, so both kinds of prompts apply. In addition, for text-pair classification, the prompt template usually has to reserve two input slots, [X1] and [X2].
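Two illustrative templates for contrast (assumed examples, not taken from the survey):

```python
# [Z] in the middle of the text: cloze prompt, natural for masked LMs
cloze_prompt = "[X] Overall, it was a [Z] movie."
# [Z] at the end: prefix prompt, natural for autoregressive / generation models
prefix_prompt = "Translate English to French: [X] [Z]"
```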

2 create prompts

1 Manual Template Engineering

2 Automated Template Learning

1 Discrete Prompts

the prompt is applied at the text level (discrete tokens)

D1: Prompt Mining

D2: Prompt Paraphrasing

D3: Gradient-based Search

D4: Prompt Generation

D5: Prompt Scoring

2 Continuous Prompts

the prompt acts directly in the model's embedding space

C1: Prefix Tuning

C2: Tuning Initialized with Discrete Prompts

C3: Hard-Soft Prompt Hybrid Tuning

4 Answer Engineering

Two dimensions must be considered when performing answer engineering: 1 deciding the answer shape and 2 choosing an answer design method.

1 Answer Shape

How does this differ from prompt shape? Answer shape concerns the granularity of the answer (a token, a span of tokens, or a full sentence), whereas prompt shape concerns where the [X]/[Z] slots sit in the template.

2 Answer Space Design Methods

1 Manual Design
2 Automatic search

5 Multi-Prompt Learning

The discussion so far assumed a single prompt; this section covers methods that use multiple prompts.

6 Training Strategies for Prompting Methods

1 Training Settings

full-data

few-shot /zero-shot

2 Parameter Update Methods

References

https://arxiv.org/abs/2107.13586

Dr. 刘鹏飞 (Pengfei Liu): https://zhuanlan.zhihu.com/p/395115779

https://zhuanlan.zhihu.com/p/399295895

https://zhuanlan.zhihu.com/p/440169921


  

Generalizing from a Few Examples: A Survey on Few-Shot Learning

paper: https://arxiv.org/abs/1904.05046

git: https://github.com/tata1661/FSL-Mate/tree/master/FewShotPapers#Applications

The original survey summarizes FSL work by application; the NLP-related works are:

  1. High-risk learning: Acquiring new word vectors from tiny data, in EMNLP, 2017. A. Herbelot and M. Baroni. paper
  2. MetaEXP: Interactive explanation and exploration of large knowledge graphs, in TheWebConf, 2018. F. Behrens, S. Bischoff, P. Ladenburger, J. Rückin, L. Seidel, F. Stolp, M. Vaichenker, A. Ziegler, D. Mottin, F. Aghaei, E. Müller, M. Preusse, N. Müller, and M. Hunger. paper code
  3. Few-shot representation learning for out-of-vocabulary words, in ACL, 2019. Z. Hu, T. Chen, K.-W. Chang, and Y. Sun. paper
  4. Learning to customize model structures for few-shot dialogue generation tasks, in ACL, 2020. Y. Song, Z. Liu, W. Bi, R. Yan, and M. Zhang. paper
  5. Few-shot slot tagging with collapsed dependency transfer and label-enhanced task-adaptive projection network, in ACL, 2020. Y. Hou, W. Che, Y. Lai, Z. Zhou, Y. Liu, H. Liu, and T. Liu. paper
  6. Meta-reinforced multi-domain state generator for dialogue systems, in ACL, 2020. Y. Huang, J. Feng, M. Hu, X. Wu, X. Du, and S. Ma. paper
  7. Few-shot knowledge graph completion, in AAAI, 2020. C. Zhang, H. Yao, C. Huang, M. Jiang, Z. Li, and N. V. Chawla. paper
  8. Universal natural language processing with limited annotations: Try few-shot textual entailment as a start, in EMNLP, 2020. W. Yin, N. F. Rajani, D. Radev, R. Socher, and C. Xiong. paper code
  9. Simple and effective few-shot named entity recognition with structured nearest neighbor learning, in EMNLP, 2020. Y. Yang, and A. Katiyar. paper code
  10. Discriminative nearest neighbor few-shot intent detection by transferring natural language inference, in EMNLP, 2020. J. Zhang, K. Hashimoto, W. Liu, C. Wu, Y. Wan, P. Yu, R. Socher, and C. Xiong. paper code
  11. Few-shot learning for opinion summarization, in EMNLP, 2020. A. Bražinskas, M. Lapata, and I. Titov. paper code
  12. Adaptive attentional network for few-shot knowledge graph completion, in EMNLP, 2020. J. Sheng, S. Guo, Z. Chen, J. Yue, L. Wang, T. Liu, and H. Xu. paper code
  13. Few-shot complex knowledge base question answering via meta reinforcement learning, in EMNLP, 2020. Y. Hua, Y. Li, G. Haffari, G. Qi, and T. Wu. paper code
  14. Self-supervised meta-learning for few-shot natural language classification tasks, in EMNLP, 2020. T. Bansal, R. Jha, T. Munkhdalai, and A. McCallum. paper code
  15. Uncertainty-aware self-training for few-shot text classification, in NeurIPS, 2020. S. Mukherjee, and A. Awadallah. paper code
  16. Learning to extrapolate knowledge: Transductive few-shot out-of-graph link prediction, in NeurIPS, 2020. J. Baek, D. B. Lee, and S. J. Hwang. paper code
  17. MetaNER: Named entity recognition with meta-learning, in TheWebConf, 2020. J. Li, S. Shang, and L. Shao. paper
  18. Conditionally adaptive multi-task learning: Improving transfer learning in NLP using fewer parameters & less data, in ICLR, 2021. J. Pilault, A. E. hattami, and C. Pal. paper code
  19. Revisiting few-sample BERT fine-tuning, in ICLR, 2021. T. Zhang, F. Wu, A. Katiyar, K. Q. Weinberger, and Y. Artzi. paper code
  20. Few-shot conversational dense retrieval, in SIGIR, 2021. S. Yu, Z. Liu, C. Xiong, T. Feng, and Z. Liu. paper code
  21. Relational learning with gated and attentive neighbor aggregator for few-shot knowledge graph completion, in SIGIR, 2021. G. Niu, Y. Li, C. Tang, R. Geng, J. Dai, Q. Liu, H. Wang, J. Sun, F. Huang, and L. Si. paper
  22. Few-shot language coordination by modeling theory of mind, in ICML, 2021. H. Zhu, G. Neubig, and Y. Bisk. paper code
  23. Graph-evolving meta-learning for low-resource medical dialogue generation, in AAAI, 2021. S. Lin, P. Zhou, X. Liang, J. Tang, R. Zhao, Z. Chen, and L. Lin. paper
  24. KEML: A knowledge-enriched meta-learning framework for lexical relation classification, in AAAI, 2021. C. Wang, M. Qiu, J. Huang, and X. He. paper
  25. Few-shot learning for multi-label intent detection, in AAAI, 2021. Y. Hou, Y. Lai, Y. Wu, W. Che, and T. Liu. paper code
  26. SALNet: Semi-supervised few-shot text classification with attention-based lexicon construction, in AAAI, 2021. J.-H. Lee, S.-K. Ko, and Y.-S. Han. paper
  27. Learning from my friends: Few-shot personalized conversation systems via social networks, in AAAI, 2021. Z. Tian, W. Bi, Z. Zhang, D. Lee, Y. Song, and N. L. Zhang. paper code
  28. Relative and absolute location embedding for few-shot node classification on graph, in AAAI, 2021. Z. Liu, Y. Fang, C. Liu, and S. C.H. Hoi. paper
  29. Few-shot question answering by pretraining span selection, in ACL-IJCNLP, 2021. O. Ram, Y. Kirstain, J. Berant, A. Globerson, and O. Levy. paper code
  30. A closer look at few-shot crosslingual transfer: The choice of shots matters, in ACL-IJCNLP, 2021. M. Zhao, Y. Zhu, E. Shareghi, I. Vulic, R. Reichart, A. Korhonen, and H. Schütze. paper code
  31. Learning from miscellaneous other-classwords for few-shot named entity recognition, in ACL-IJCNLP, 2021. M. Tong, S. Wang, B. Xu, Y. Cao, M. Liu, L. Hou, and J. Li. paper code
  32. Distinct label representations for few-shot text classification, in ACL-IJCNLP, 2021. S. Ohashi, J. Takayama, T. Kajiwara, and Y. Arase. paper code
  33. Entity concept-enhanced few-shot relation extraction, in ACL-IJCNLP, 2021. S. Yang, Y. Zhang, G. Niu, Q. Zhao, and S. Pu. paper code
  34. On training instance selection for few-shot neural text generation, in ACL-IJCNLP, 2021. E. Chang, X. Shen, H.-S. Yeh, and V. Demberg. paper code
  35. Unsupervised neural machine translation for low-resource domains via meta-learning, in ACL-IJCNLP, 2021. C. Park, Y. Tae, T. Kim, S. Yang, M. A. Khan, L. Park, and J. Choo. paper code
  36. Meta-learning with variational semantic memory for word sense disambiguation, in ACL-IJCNLP, 2021. Y. Du, N. Holla, X. Zhen, C. Snoek, and E. Shutova. paper code
  37. Multi-label few-shot learning for aspect category detection, in ACL-IJCNLP, 2021. M. Hu, S. Z. H. Guo, C. Xue, H. Gao, T. Gao, R. Cheng, and Z. Su. paper
  38. TextSETTR: Few-shot text style extraction and tunable targeted restyling, in ACL-IJCNLP, 2021. P. Rileya, N. Constantb, M. Guob, G. Kumarc, D. Uthusb, and Z. Parekh. paper
  39. Few-shot text ranking with meta adapted synthetic weak supervision, in ACL-IJCNLP, 2021. S. Sun, Y. Qian, Z. Liu, C. Xiong, K. Zhang, J. Bao, Z. Liu, and P. Bennett. paper code
  40. PROTAUGMENT: Intent detection meta-learning through unsupervised diverse paraphrasing, in ACL-IJCNLP, 2021. T. Dopierre, C. Gravier, and W. Logerais. paper code
  41. AUGNLG: Few-shot natural language generation using self-trained data augmentation, in ACL-IJCNLP, 2021. X. Xu, G. Wang, Y.-B. Kim, and S. Lee. paper code
  42. Meta self-training for few-shot neural sequence labeling, in KDD, 2021. Y. Wang, S. Mukherjee, H. Chu, Y. Tu, M. Wu, J. Gao, and A. H. Awadallah. paper code
  43. Knowledge-enhanced domain adaptation in few-shot relation classification, in KDD, 2021. J. Zhang, J. Zhu, Y. Yang, W. Shi, C. Zhang, and H. Wang. paper code
  44. Few-shot text classification with triplet networks, data augmentation, and curriculum learning, in NAACL-HLT, 2021. J. Wei, C. Huang, S. Vosoughi, Y. Cheng, and S. Xu. paper code
  45. Few-shot intent classification and slot filling with retrieved examples, in NAACL-HLT, 2021. D. Yu, L. He, Y. Zhang, X. Du, P. Pasupat, and Q. Li. paper
  46. Non-parametric few-shot learning for word sense disambiguation, in NAACL-HLT, 2021. H. Chen, M. Xia, and D. Chen. paper code
  47. Towards few-shot fact-checking via perplexity, in NAACL-HLT, 2021. N. Lee, Y. Bang, A. Madotto, and P. Fung. paper
  48. ConVEx: Data-efficient and few-shot slot labeling, in NAACL-HLT, 2021. M. Henderson, and I. Vulic. paper
  49. Few-shot text generation with natural language instructions, in EMNLP, 2021. T. Schick, and H. Schütze. paper
  50. Towards realistic few-shot relation extraction, in EMNLP, 2021. S. Brody, S. Wu, and A. Benton. paper code
  51. Few-shot emotion recognition in conversation with sequential prototypical networks, in EMNLP, 2021. G. Guibon, M. Labeau, H. Flamein, L. Lefeuvre, and C. Clavel. paper code
  52. Learning prototype representations across few-shot tasks for event detection, in EMNLP, 2021. V. Lai, F. Dernoncourt, and T. H. Nguyen. paper
  53. Exploring task difficulty for few-shot relation extraction, in EMNLP, 2021. J. Han, B. Cheng, and W. Lu. paper code
  54. Honey or poison? Solving the trigger curse in few-shot event detection via causal intervention, in EMNLP, 2021. J. Chen, H. Lin, X. Han, and L. Sun. paper code
  55. Nearest neighbour few-shot learning for cross-lingual classification, in EMNLP, 2021. M. S. Bari, B. Haider, and S. Mansour. paper
  56. Knowledge-aware meta-learning for low-resource text classification, in EMNLP, 2021. H. Yao, Y. Wu, M. Al-Shedivat, and E. P. Xing. paper code
  57. Few-shot named entity recognition: An empirical baseline study, in EMNLP, 2021. J. Huang, C. Li, K. Subudhi, D. Jose, S. Balakrishnan, W. Chen, B. Peng, J. Gao, and J. Han. paper
  58. MetaTS: Meta teacher-student network for multilingual sequence labeling with minimal supervision, in EMNLP, 2021. Z. Li, D. Zhang, T. Cao, Y. Wei, Y. Song, and B. Yin. paper
  59. Meta-LMTC: Meta-learning for large-scale multi-label text classification, in EMNLP, 2021. R. Wang, X. Su, S. Long, X. Dai, S. Huang, and J. Chen. paper

Inheritance

1 Inheritance

If the subclass does not override an attribute, attribute lookup falls back to the parent class, as the example below shows.

```python
class A:
    x = 1

class B(A):
    pass

class C(A):
    pass

B.x = 2
print(A.x, B.x, C.x)
A.x = 3
print(A.x, B.x, C.x)
```

Output:

1 2 1
3 2 3

B.x = 2 gives B its own class attribute, so the later A.x = 3 affects A and C (which still look x up on A) but no longer affects B.

2 super

https://blog.csdn.net/weixin_40734030/article/details/122861895

Purpose: have the subclass call the parent class's __init__ when it is initialized.

Example:

```python
class test1:
    def __init__(self):
        self.a = 1

class test2(test1):
    def __init__(self):
        super(test2, self).__init__()
        self.b = 2

tt = test2()
# print(tt.a)
print(tt.b)
print(tt.a)

# output:
# 2
# 1
############################
class test1:
    def __init__(self):
        self.a = 1

class test2(test1):
    def __init__(self):
        # super(test2, self).__init__()
        self.b = 2

tt = test2()
# print(tt.a)
print(tt.b)
print(tt.a)

# output:
# 2
# AttributeError: 'test2' object has no attribute 'a'
```
```python
class pointwise_hybird_contrasive(hybird):
    def __init__(self, config_roberta, path, num):
        super(pointwise_hybird_contrasive, self).__init__(config_roberta, path, num)
```

Here `super(pointwise_hybird_contrasive, self).__init__(config_roberta, path, num)` initializes the attributes of the parent class hybird.
  

Common PyTorch operations

1 Tensor operations in PyTorch

https://blog.csdn.net/HailinPan/article/details/109818774

2 Model loading

1 model.load_state_dict(torch.load(path))

2 model=BertModel.from_pretrained

The latter is implemented on top of the former.

The usage differs: in the former, model is an already constructed object and load_state_dict loads the weights into it; in the latter, BertModel is a class and from_pretrained both creates the object and loads the weights.
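A minimal sketch contrasting the two styles; MyModel, the saved file name, and the checkpoint name below are placeholders for illustration:

```python
import torch
import torch.nn as nn
from transformers import BertModel

# Style 1: model is an existing object; load_state_dict copies saved weights into it.
class MyModel(nn.Module):          # hypothetical toy model, just for illustration
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(768, 2)

    def forward(self, x):
        return self.linear(x)

model = MyModel()
torch.save(model.state_dict(), "model.pt")                       # pretend this is a trained checkpoint
model.load_state_dict(torch.load("model.pt", map_location="cpu"))

# Style 2: BertModel is a class; from_pretrained creates the object and loads the weights
# (the name below is a placeholder for a local checkpoint directory or a hub model id).
bert = BertModel.from_pretrained("bert-base-chinese")
```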

huggingface

NLP's little helper: huggingface's transformers

git: https://github.com/huggingface/transformers

paper: https://arxiv.org/abs/1910.03771v5

Overall structure

A quick tutorial:

https://blog.csdn.net/weixin_44614687/article/details/106800244

from_pretrained

Under the hood it uses load_state_dict.

```
Some weights of the model checkpoint at ../../../../test/data/chinese-roberta-wwm-ext were not used when initializing listnet_bert: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing listnet_bert from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing listnet_bert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of listnet_bert were not initialized from the model checkpoint at ../../../../test/data/chinese-roberta-wwm-ext and are newly initialized: ['Linear2.weight', 'Linear1.weight', 'Linear1.bias', 'Linear2.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```


Two parts: 1 some weights in the loaded pre-trained checkpoint were not used; 2 some weights of your own model were not in the checkpoint and are newly initialized.
Seeing this warning during fine-tuning is perfectly normal.
It should not appear at prediction time.

About the model

BertModel -> our model

1 Load a model from transformers

```python
from transformers import BertPreTrainedModel, BertModel, AutoTokenizer, AutoConfig
```

2 Build your own structure on top of the model from step 1 (see the sketch below)
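A minimal sketch of wrapping BertModel inside a custom module; the class name, pooling choice, and classifier head are assumptions for illustration:

```python
import torch.nn as nn
from transformers import BertModel

class MySentenceClassifier(nn.Module):       # hypothetical example structure
    def __init__(self, path, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(path)                              # step 1: pre-trained encoder
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)    # step 2: our own head

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        cls_vec = outputs.last_hidden_state[:, 0]        # [CLS] vector as the sentence representation
        return self.classifier(cls_vec)
```

Inputs (input_ids, attention_mask, token_type_ids) would come from the matching tokenizer, e.g. AutoTokenizer.from_pretrained(path).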

