Prompt-learning helper: OpenPrompt
OpenPrompt is an open-source toolkit released by the Tsinghua NLP lab (THUNLP).
1 Structure
2 Tutorial
See the official repo https://hub.fastgit.xyz/thunlp/OpenPrompt, which has detailed steps and worked examples.
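As a quick illustration, a minimal classification pipeline roughly following the repo's tutorial might look like the sketch below. Class names and arguments follow the version documented there and may differ across OpenPrompt releases; the template text and label words are just an example.

```python
# Rough sketch of an OpenPrompt classification pipeline (based on the official tutorial;
# exact class names/arguments may vary between OpenPrompt versions).
import torch
from openprompt.plms import load_plm
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt.data_utils import InputExample
from openprompt import PromptDataLoader, PromptForClassification

# 1. Load a pre-trained LM together with its tokenizer and wrapper class.
plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

# 2. Define a manual template and a verbalizer (label words -> classes).
template = ManualTemplate(
    tokenizer=tokenizer,
    text='{"placeholder":"text_a"} Overall, it was a {"mask"} movie.',
)
verbalizer = ManualVerbalizer(
    tokenizer=tokenizer,
    classes=["negative", "positive"],
    label_words={"negative": ["terrible"], "positive": ["wonderful"]},
)

# 3. Wrap everything into a prompt-based classification model.
prompt_model = PromptForClassification(plm=plm, template=template, verbalizer=verbalizer)

dataset = [InputExample(guid=0, text_a="I love this movie.")]
data_loader = PromptDataLoader(dataset=dataset, template=template, tokenizer=tokenizer,
                               tokenizer_wrapper_class=WrapperClass)

prompt_model.eval()
with torch.no_grad():
    for batch in data_loader:
        logits = prompt_model(batch)          # [batch_size, num_classes]
        preds = torch.argmax(logits, dim=-1)
```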
References
https://hub.fastgit.xyz/thunlp/OpenPrompt
https://zhuanlan.zhihu.com/p/444346578
STS task. The data is split into unlabeled STS data, labeled STS data, and labeled NLI data.
The unsupervised and supervised settings use the same loss (the paper describes 3 losses); the difference lies in the dataset:
unsupervised uses the labeled NLI data, supervised uses the labeled STS data.
STS task. The data is split into unlabeled STS data and labeled STS data.
The unsupervised and supervised settings differ in how the samples are constructed:
unsupervised positive/negative pairs come from data augmentation of the unlabeled STS data, while supervised positive/negative pairs come from the labeled STS data.
STS task. The data is split into unlabeled STS data, labeled STS data, plus the NLI dataset (labeled).
Similarities
Same as SimCSE: both introduce a contrastive objective during fine-tuning.
Differences
1 Unsupervised
The loss is the same as SimCSE's (NT-Xent); the difference is the data-augmentation strategy applied to the unlabeled STS data.
2 Supervised
The differences lie in the loss and the data source:
SimCSE: the loss is NT-Xent and the data source is the labeled STS data.
ConSERT: the loss is NT-Xent + an additional supervised loss (e.g. cross entropy), and the data sources are the unlabeled STS data and the (labeled) NLI dataset; "+" denotes fusion, and the paper proposes 3 fusion strategies.
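For reference, a minimal NT-Xent (the in-batch contrastive loss shared by SimCSE and ConSERT) can be sketched as follows. This is a simplified, SimCSE-style formulation written for illustration, not code from either paper.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.05):
    """NT-Xent / InfoNCE over a batch of paired sentence embeddings.
    z1[i] and z2[i] are two views (augmentations) of the same sentence;
    all other in-batch sentences act as negatives."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / temperature                      # [batch, batch] cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # positives sit on the diagonal
    return F.cross_entropy(sim, labels)
```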
Characteristics: not pre-trained, small parameter count.
Two encodings are produced: an RNN Encoding and an RCNN Encoding.
1 BiGRU
$\textbf{a}=\{a_1,a_2,\dots,a_{l_a}\}$, where $\textbf{a}$ is sentence 1 and $l_a$ is its length.
This yields the RNN Encoding; $\overline{\textbf{p}}_i$ is a unified notation for $\overline{\textbf{a}}_i$ and $\overline{\textbf{b}}_i$.
2 CNN
On top of the BiGRU encoding, a CNN performs a second round of encoding.
The structure is as follows ("network in network"); k is the kernel size of the convolution, e.g. k=1 means a $1 \times 1$ kernel.
For each CNN unit, the computation is as follows:
This yields the RCNN Encoding $\widetilde{\textbf{p}}_i$.
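A rough PyTorch sketch of this two-branch encoder (a BiGRU for the RNN encoding, then 1x1 "network-in-network" style convolutions for the RCNN encoding). Dimensions and layer counts are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RCNNEncoder(nn.Module):
    """BiGRU encoding followed by 1x1 ("network in network") convolutions.
    Illustrative hyperparameters only."""
    def __init__(self, emb_dim=300, hidden=150, conv_channels=64):
        super().__init__()
        self.bigru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Sequential(
            nn.Conv1d(2 * hidden, conv_channels, kernel_size=1),  # k=1 -> 1x1 convolution
            nn.ReLU(),
            nn.Conv1d(conv_channels, conv_channels, kernel_size=1),
            nn.ReLU(),
        )

    def forward(self, emb):                      # emb: [batch, seq_len, emb_dim]
        rnn_enc, _ = self.bigru(emb)             # RNN Encoding  \bar{p}: [batch, seq_len, 2*hidden]
        x = rnn_enc.transpose(1, 2)              # Conv1d expects [batch, channels, seq_len]
        rcnn_enc = self.conv(x).transpose(1, 2)  # RCNN Encoding \tilde{p}
        return rnn_enc, rcnn_enc
```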
1 Soft-attention Alignment
Attention:
The attention-weighted RNN encoding:
2 Interaction Modeling
$\overline{\textbf{p}}$ is the RNN encoding,
$\hat{\textbf{p}}$ is the attention-weighted RNN encoding,
$\widetilde{\textbf{p}}$ is the RCNN encoding.
Finally, the Interactive Sentence Representations $\textbf{o}_a$ and $\textbf{o}_b$ are obtained.
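The soft-attention alignment between the two RNN encodings can be sketched as below. This is the standard soft alignment used in ESIM-style matching models, shown for illustration rather than as the paper's exact equations.

```python
import torch
import torch.nn.functional as F

def soft_align(a_bar, b_bar):
    """a_bar: [batch, la, d], b_bar: [batch, lb, d]  (RNN encodings of the two sentences).
    Returns a_hat, b_hat: each token re-expressed as an attention-weighted
    sum of the other sentence's RNN encoding."""
    scores = torch.bmm(a_bar, b_bar.transpose(1, 2))                    # [batch, la, lb]
    a_hat = torch.bmm(F.softmax(scores, dim=2), b_bar)                  # attend over b for each a_i
    b_hat = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2), a_bar)  # attend over a for each b_j
    return a_hat, b_hat
```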
1 Fusion Layer
g is a gating function.
2 Label Prediction
A fully connected layer
Cross-entropy loss
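A generic version of the gated fusion plus prediction head: the gate g blends the two representations, and the fused vector goes through a fully connected layer trained with cross-entropy. The exact inputs to the gate and classifier here are illustrative, not necessarily the paper's formulation.

```python
import torch
import torch.nn as nn

class GatedFusionClassifier(nn.Module):
    """Gated fusion of two sentence representations followed by an MLP classifier (illustrative)."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)              # produces the gate g
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, num_classes),
        )
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, o_a, o_b, labels=None):
        g = torch.sigmoid(self.gate(torch.cat([o_a, o_b], dim=-1)))
        fused = g * o_a + (1 - g) * o_b                  # gated blend of the two representations
        logits = self.classifier(torch.cat([fused, o_a * o_b], dim=-1))
        if labels is not None:
            return self.loss_fn(logits, labels), logits
        return logits
```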
Prompting feels like a special form of fine-tuning: we still pre-train first and then do prompt tuning.
Goal: prompting narrows the gap between pre-training and fine-tuning.
Three steps:
$x'=f_{prompt}(x)$, where x is the input text.
f: fills the location [Z] in the prompt $x'$ with a potential answer z.
Z: a set of permissible values for z.
Because the $\hat{z}$ above is not yet $\hat{y}$: e.g. in sentiment analysis, "excellent", "fabulous", "wonderful" → positive.
We go from the highest-scoring answer $\hat{z}$ to the highest-scoring output $\hat{y}$.
Before:
"I love this movie." → positive
Now:
1 $x=$ "I love this movie." → template: "[x] Overall, it was a [z] movie." → $x'$ = "I love this movie. Overall, it was a [z] movie."
2 Next comes answer search: as the name suggests, the LM looks for the text $\hat{z}$ that yields the highest score when filled into [z] (e.g. "excellent", "great", "wonderful").
3 Finally, answer mapping. Sometimes the text filled in by the LM is not the final form required by the task (the final label is positive, whereas the fillers above are "excellent", "great", "wonderful"), so this text must be mapped to the final output $\hat{y}$.
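These three steps can be mimicked with an off-the-shelf masked LM. Below is a rough illustration using the Hugging Face fill-mask pipeline; the checkpoint and the label-word-to-class mapping (the "verbalizer") are hand-picked examples, not part of the survey.

```python
from transformers import pipeline

# Step 1: prompt addition -- wrap the input x into the template.
x = "I love this movie."
x_prime = f"{x} Overall, it was a [MASK] movie."

# Step 2: answer search -- let the MLM score candidate fillers for [MASK].
fill = pipeline("fill-mask", model="bert-base-uncased")
predictions = fill(x_prime, top_k=10)

# Step 3: answer mapping -- map the predicted word z_hat to the final label y_hat.
verbalizer = {"great": "positive", "wonderful": "positive", "good": "positive",
              "terrible": "negative", "bad": "negative", "boring": "negative"}
for p in predictions:
    word = p["token_str"].strip()
    if word in verbalizer:
        print(word, "->", verbalizer[word])
        break
```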
1 one must first consider the prompt shape,
2 then decide whether to take a manual or automated approach to create prompts of the desired shape
The prompt shape mainly refers to the positions and number of the [X] and [Z] slots.
If [Z] is in the middle of the text, the prompt is usually called a cloze prompt; if it is at the end, it is called a prefix prompt.
In practice, which one to choose depends mainly on the task form and the model type. Cloze prompts closely match the training objective of Masked Language Models, so they are more suitable for tasks that use an MLM; for generation tasks, or tasks solved with an autoregressive LM, prefix prompts fit better; full text reconstruction models are more general, so both kinds of prompt work. In addition, for text-pair classification, the prompt template usually needs to reserve two input slots, [x1] and [x2].
Discrete prompts: the prompt acts on the text itself.
D1: Prompt Mining
D2: Prompt Paraphrasing
D3: Gradient-based Search
D4: Prompt Generation
D5: Prompt Scoring
Continuous prompts: the prompt acts directly in the model's embedding space.
C1: Prefix Tuning
C2: Tuning Initialized with Discrete Prompts
C3: Hard-Soft Prompt Hybrid Tuning
Two dimensions must be considered when performing answer engineering: (1) deciding the answer shape and (2) choosing an answer design method.
How is this different from the prompt shape?
The discussion so far has covered single prompts; now we turn to multiple prompts.
1 Training Settings
full-data
few-shot / zero-shot
2 Parameter Update Methods
https://arxiv.org/abs/2107.13586
Dr. Pengfei Liu: https://zhuanlan.zhihu.com/p/395115779
https://zhuanlan.zhihu.com/p/399295895
paper: https://arxiv.org/abs/1904.05046
git: https://github.com/tata1661/FSL-Mate/tree/master/FewShotPapers#Applications
The original survey summarizes FSL by application; the NLP-related applications include:
NLP helper: Hugging Face's transformers
git: https://github.com/huggingface/transformers
paper: https://arxiv.org/abs/1910.03771v5
Overall structure
A quick tutorial:
https://blog.csdn.net/weixin_44614687/article/details/106800244
Under the hood this is load_state_dict.
Some weights of the model checkpoint at ../../../../test/data/chinese-roberta-wwm-ext were not used when initializing listnet_bert: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
BertModel -> our model
1 Load a model from transformers
from transformers import BertPreTrainedModel, BertModel, AutoTokenizer, AutoConfig
2 Build your own architecture on top of the model from step 1
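A minimal sketch of step 2: subclass BertPreTrainedModel, embed a BertModel backbone, add a task head, and load the checkpoint with from_pretrained. The unused 'cls.*' weights in the warning above are just the MLM/NSP heads we do not keep. The class name, head, and checkpoint here are illustrative.

```python
import torch.nn as nn
from transformers import BertPreTrainedModel, BertModel

class BertForMyTask(BertPreTrainedModel):
    """Illustrative custom model built on top of transformers' BertModel."""
    def __init__(self, config, num_labels=2):
        super().__init__(config)
        self.bert = BertModel(config)                                 # backbone from transformers
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, num_labels)   # our own head
        self.post_init()                                              # weight init (init_weights() in older versions)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(input_ids, attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        pooled = outputs.pooler_output                                # [batch, hidden_size]
        return self.classifier(self.dropout(pooled))

# from_pretrained loads the checkpoint weights via load_state_dict under the hood;
# the new classifier head is randomly initialized (hence the "newly initialized" warnings).
model = BertForMyTask.from_pretrained("bert-base-chinese")
```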
https://arxiv.org/abs/1901.02860
Transformers have the potential to learn longer-term dependency, but are limited by a fixed-length context in the setting of language modeling (memory and computation are limited, so the context cannot be very long). The authors propose a novel neural architecture, Transformer-XL, that enables learning dependency beyond a fixed length without disrupting temporal coherence.
Problem: in order to apply the Transformer or self-attention to language modeling, the central problem is how to train a Transformer to effectively encode an arbitrarily long context into a fixed-size representation. The usual approach is the vanilla model, which splits the long text into fixed-length segments and processes each one independently, as shown in the figure above.
During training, there are two critical limitations of using a fixed-length context. First, the largest possible dependency length is upper-bounded by the segment length. Second, simply chunking a sequence into fixed-length segments leads to the context fragmentation problem.
During evaluation, the vanilla model consumes a segment of the same length as in training but makes only one prediction at the last position, then slides the window right by one position. As shown in Fig. 1b, this procedure ensures that each prediction utilizes the longest possible context exposed during training, and it also relieves the context fragmentation issue encountered in training. However, it is extremely expensive.
Transformer-XL introduces a recurrence mechanism into the Transformer architecture.
Define the variables:
The transformation:
SG(·) stands for stop-gradient, and $\circ$ denotes matrix concatenation (along the length dimension).
The detailed process is shown in the figure below.
During training, the hidden state sequence computed for the previous segment is fixed and cached to be reused as an extended context when the model processes the next new segment, as shown in Fig. 2a.
During evaluation, the representations from the previous segments can be reused instead of being recomputed from scratch as in the vanilla model.
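The core of the recurrence can be sketched as follows: cache the previous segment's hidden states, stop gradients through them, and concatenate them with the current segment before attention. This is a schematic of the mechanism, not the paper's full implementation.

```python
import torch

def extend_with_memory(h_prev, h_curr):
    """Segment-level recurrence: reuse the cached hidden states of the previous
    segment as extra context for the current one.
    h_prev: [prev_len, batch, d_model], h_curr: [curr_len, batch, d_model]."""
    mem = h_prev.detach()                      # SG(.): stop-gradient through the cached states
    return torch.cat([mem, h_curr], dim=0)     # concatenation along the length dimension

# Inside each layer, keys and values are computed from this extended sequence,
# while queries come only from the current segment, so attention can reach back
# into the previous segment without back-propagating into it.
```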
How can we keep the positional information coherent when we reuse the states? If we keep the original (absolute) positional encoding scheme, we get the following:
This approach has a problem:
To solve it, relative positional encodings are introduced.
Standard Transformer (absolute positional encodings):
Proposed formulation (relative positional encodings):
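For reference, the attention-score decomposition being compared here, reconstructed from the Transformer-XL paper:

```latex
% Standard Transformer with absolute positional encodings:
A^{\mathrm{abs}}_{i,j} =
\underbrace{E_{x_i}^{\top} W_q^{\top} W_k E_{x_j}}_{(a)}
+ \underbrace{E_{x_i}^{\top} W_q^{\top} W_k U_j}_{(b)}
+ \underbrace{U_i^{\top} W_q^{\top} W_k E_{x_j}}_{(c)}
+ \underbrace{U_i^{\top} W_q^{\top} W_k U_j}_{(d)}

% Transformer-XL with relative positional encodings:
A^{\mathrm{rel}}_{i,j} =
\underbrace{E_{x_i}^{\top} W_q^{\top} W_{k,E} E_{x_j}}_{(a)}
+ \underbrace{E_{x_i}^{\top} W_q^{\top} W_{k,R} R_{i-j}}_{(b)}
+ \underbrace{u^{\top} W_{k,E} E_{x_j}}_{(c)}
+ \underbrace{v^{\top} W_{k,R} R_{i-j}}_{(d)}
```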
https://cs.uwaterloo.ca/~jimmylin/publications/Rao_etal_EMNLP2019.pdf
Three major components: (1) a hybrid encoder, (2) a relevance matching module, and (3) a semantic matching module.
A hybrid encoder module that explores three types of encoders: deep, wide, and contextual.
Query and context words: $\{w_1^q,w_2^q,\dots,w_n^q\}$ and $\{w_1^c,w_2^c,\dots,w_m^c\}$, with embedding representations $\textbf{Q}\in \mathbb{R}^{n\times L}$ and $\textbf{C}\in \mathbb{R}^{m\times L}$.
Deep Encoder
$\textbf{U}$ denotes either $\textbf{Q}$ or $\textbf{C}$.
Wide Encoder
Unlike the deep encoder, which stacks multiple convolutional layers hierarchically, the wide encoder organizes convolutional layers in parallel, with each convolutional layer having a different window size k.
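A rough PyTorch illustration of the difference between the deep (stacked) and wide (parallel, multi-window) convolutional encoders. Channel sizes, depths, and window sizes are made up for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class DeepEncoder(nn.Module):
    """Stacks convolutional layers hierarchically (same window size, growing receptive field)."""
    def __init__(self, emb_dim=300, channels=128, k=3, num_layers=4):
        super().__init__()
        layers, in_ch = [], emb_dim
        for _ in range(num_layers):
            layers += [nn.Conv1d(in_ch, channels, kernel_size=k, padding=k // 2), nn.ReLU()]
            in_ch = channels
        self.net = nn.Sequential(*layers)

    def forward(self, u):                 # u: [batch, seq_len, emb_dim]  (Q or C)
        return self.net(u.transpose(1, 2)).transpose(1, 2)

class WideEncoder(nn.Module):
    """Applies convolutional layers in parallel, each with a different window size k."""
    def __init__(self, emb_dim=300, channels=128, window_sizes=(3, 5, 7)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, channels, kernel_size=k, padding=k // 2) for k in window_sizes
        )

    def forward(self, u):
        x = u.transpose(1, 2)
        # One output per window size; downstream modules can concatenate or pool them.
        return [torch.relu(conv(x)).transpose(1, 2) for conv in self.convs]
```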
Contextual Encoder