bertviz: attention visualization tool

View the attention of different layers and different heads.

Note:

from bertviz.neuron_view import show
from bertviz.transformers_neuron_view import BertModel, BertTokenizer

# both the model and the tokenizer come from bertviz.transformers_neuron_view
model1 = BertModel.from_pretrained(path)
tokenizer = BertTokenizer.from_pretrained(path)
model_type = 'bert'

show(model1, model_type, tokenizer, sentence_a, sentence_b, layer=4, head=3)

This works.
###########################

from bertviz.neuron_view import show
from transformers import BertTokenizer, BertModel

# importing the model/tokenizer from transformers instead of bertviz
model1 = BertModel.from_pretrained(path)
model_type = 'bert'

show(model1, model_type, tokenizer, sentence_a, sentence_b, layer=4, head=3)

This raises an error: the neuron view only works with the model/tokenizer classes bundled inside bertviz.

Reference

https://zhuanlan.zhihu.com/p/457043243

prompt trick

Purpose

Use a template so that the prediction task is unified with the pre-training task of the PTM, closing the gap between the pre-training objective and the downstream fine-tuning objective.

Difference from fine-tuning

fine-tuning: the PTM is adapted downward to the specific task

prompt: the specific task is adapted upward to the PTM

Application scenarios

Because the prediction task is unified with the pre-training task of the PTM, the task can be completed with little or even no training data (see the sketch after the list below). In summary, suitable application scenarios are:

  1. zero-shot
  2. few-shot
  3. cold start
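
A minimal sketch of prompt-based zero-shot classification with a masked LM, assuming the transformers library and the bert-base-chinese checkpoint; the template and the label words ("好"/"差") are illustrative choices, not from these notes.

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-chinese")

review = "这部电影太精彩了"
# wrap the input in a cloze-style template so the task looks like pre-training MLM
template = review + "。总体来说很[MASK]。"

# compare the MLM scores of the label words instead of training a classifier head
results = fill_mask(template, targets=["好", "差"])
for r in results:
    print(r["token_str"], r["score"])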

参考

https://zhuanlan.zhihu.com/p/424888379

https://zhuanlan.zhihu.com/p/440169921

jupyter notebook

1 Deployment

https://blog.csdn.net/weixin_41149572/article/details/114640624

View the password

Disable the password

2 Passing arguments

py: args = parser.parse_args()

jupyter: args = parser.parse_args(args=[list of arguments]), as sketched below
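
A minimal sketch of the difference, with an illustrative --lr argument:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=1e-3)

# in a .py script, arguments come from the command line
# args = parser.parse_args()

# in a Jupyter notebook, pass the argument list explicitly; otherwise argparse
# tries to parse the kernel's own command-line arguments and errors out
args = parser.parse_args(args=["--lr", "5e-4"])
print(args.lr)  # 0.0005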

3 Specify the Python version

env/bin/jupyter notebook

4 Calling other files

Call a .py file

Call a .ipynb file

5 Running a .py file

Load: %load path/xx.py

Run: %run path/xx.py

Reference

https://blog.csdn.net/weixin_41149572/article/details/114640624

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

A Chinese PTM that incorporates glyph and pinyin information.

1 Model structure

The change is in BERT's input:

Before: char embedding + position embedding + segment embedding -> Now: fusion embedding + position embedding (the segment embedding is omitted)

Char embedding + glyph embedding + pinyin embedding -> fusion embedding
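
A minimal sketch of the fusion step (illustrative shapes, not the official code): the three embeddings are concatenated and mapped back to the hidden size by a fully connected layer.

import torch
import torch.nn as nn

hidden, seq_len = 768, 4
char_emb = torch.randn(1, seq_len, hidden)    # from the token embedding table
glyph_emb = torch.randn(1, seq_len, hidden)   # from character-image (glyph) features
pinyin_emb = torch.randn(1, seq_len, hidden)  # from a CNN over the pinyin sequence

fusion_layer = nn.Linear(3 * hidden, hidden)
fusion_emb = fusion_layer(torch.cat([char_emb, glyph_emb, pinyin_emb], dim=-1))
print(fusion_emb.shape)  # torch.Size([1, 4, 768])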

2 Pre-training tasks

Whole Word Masking (WWM) and Char Masking (CM)
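
A minimal illustration of the difference between the two objectives (not the official masking code), assuming the sentence "我喜欢机器学习" is segmented so that "机器学习" is one word:

sentence = list("我喜欢机器学习")

# Char Masking (CM): mask an individual character, e.g. "机"
cm = ["[MASK]" if ch == "机" else ch for ch in sentence]

# Whole Word Masking (WWM): mask every character of the whole word "机器学习"
wwm = ["[MASK]" if ch in "机器学习" else ch for ch in sentence]

print("CM: ", "".join(cm))   # 我喜欢[MASK]器学习
print("WWM:", "".join(wwm))  # 我喜欢[MASK][MASK][MASK][MASK]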

3 Usage

>>> from datasets.bert_dataset import BertDataset
>>> from models.modeling_glycebert import GlyceBertModel

>>> tokenizer = BertDataset([CHINESEBERT_PATH])
>>> chinese_bert = GlyceBertModel.from_pretrained([CHINESEBERT_PATH])
>>> sentence = '我喜欢猫'

>>> input_ids, pinyin_ids = tokenizer.tokenize_sentence(sentence)
>>> length = input_ids.shape[0]
>>> input_ids = input_ids.view(1, length)
>>> pinyin_ids = pinyin_ids.view(1, length, 8)
>>> output_hidden = chinese_bert.forward(input_ids, pinyin_ids)[0]
>>> print(output_hidden)
tensor([[[ 0.0287, -0.0126, 0.0389, ..., 0.0228, -0.0677, -0.1519],
[ 0.0144, -0.2494, -0.1853, ..., 0.0673, 0.0424, -0.1074],
[ 0.0839, -0.2989, -0.2421, ..., 0.0454, -0.1474, -0.1736],
[-0.0499, -0.2983, -0.1604, ..., -0.0550, -0.1863, 0.0226],
[ 0.1428, -0.0682, -0.1310, ..., -0.1126, 0.0440, -0.1782],
[ 0.0287, -0.0126, 0.0389, ..., 0.0228, -0.0677, -0.1519]]],
grad_fn=<NativeLayerNormBackward>)

References

https://github.com/ShannonAI/ChineseBert

https://arxiv.org/pdf/2106.16038.pdf

finetune

1 Which layers to use for the downstream task

Overall structure: selected layers of the PTM (model1) + downstream task model (model2).

Different layers of a deep model capture different kinds of knowledge, e.g., POS tagging, syntactic parsing, long-term dependencies, semantic roles, and coreference. For RNN-based models, studies show that different layers of a multi-layer LSTM encoder behave differently on different tasks. For Transformer-based models, basic syntactic understanding emerges in the shallow layers, while high-level semantic understanding emerges in the deep layers.

Let $\textbf{H}^{l}\ (1 \le l \le L)$ denote the representation of the $l$-th layer of the PTM and $g(\cdot)$ the task-specific model. There are several ways to select the representation:

a) Embedding Only

Choose only the pre-trained static embeddings, i.e., $g(\textbf{H}^{1})$.

b) Top Layer

Feed the top-layer representation into the task-specific model, i.e., $g(\textbf{H}^{L})$.
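
A minimal sketch of option b), assuming the transformers library and the bert-base-uncased checkpoint; the linear classifier stands in for the task-specific model $g(\cdot)$.

import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
g = nn.Linear(encoder.config.hidden_size, 2)  # illustrative 2-class head

inputs = tokenizer("a simple example", return_tensors="pt")
H_L = encoder(**inputs).last_hidden_state  # top-layer representation H^L
logits = g(H_L[:, 0])                      # use the [CLS] position
print(logits.shape)                        # torch.Size([1, 2])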

c) All Layers

Feed in the representations of all layers and let the model automatically choose the most suitable ones, then pass the mixed representation to the task-specific model, as in ELMo. The formula is as follows:

$\textbf{r}_{t}=\gamma\sum_{l=1}^{L}\alpha_{l}\textbf{h}_{t}^{l}$

where $\alpha_{l}$ is the softmax-normalized weight for layer $l$ and $\gamma$ is a scalar that scales the vectors output by the pre-trained model.
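
A minimal sketch of this ELMo-style mixing with illustrative shapes:

import torch
import torch.nn as nn

L, T, d = 12, 5, 768                  # number of layers, sequence length, hidden size
H = torch.randn(L, T, d)              # stacked per-layer representations H^l

alpha = nn.Parameter(torch.zeros(L))  # learnable per-layer weights
gamma = nn.Parameter(torch.ones(1))   # learnable scalar scale

weights = torch.softmax(alpha, dim=0)            # softmax-normalized alpha_l
r = gamma * (weights.view(L, 1, 1) * H).sum(0)   # r_t = gamma * sum_l alpha_l * h_t^l
print(r.shape)                                   # torch.Size([5, 768])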

2 Whether the parameters are frozen

There are two common ways to transfer the model: feature extraction (where the pre-trained parameters are frozen) and fine-tuning (where the pre-trained parameters are unfrozen and fine-tuned).
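
A minimal sketch of the two modes, assuming a Hugging Face BERT checkpoint (bert-base-uncased is an illustrative choice):

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# feature extraction: freeze the pre-trained parameters
for p in model.parameters():
    p.requires_grad = False

# fine-tuning: leave the parameters trainable (the default) and update them
# together with the task-specific head during training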

3 Fine-Tuning Strategies

Two-stage fine-tuning

Stage one fine-tunes on an intermediate task; stage two fine-tunes on the target task.

Multi-task fine-tuning

Multi-task learning and pre-training are complementary technologies.

Fine-tuning with extra adaptation modules

The main drawback of fine-tuning is its parameter inefficiency: every downstream task has its own set of fine-tuned parameters. A better solution is to inject some fine-tunable adaptation modules into the PTM while the original parameters stay fixed.
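
A minimal sketch of a bottleneck adapter module with illustrative sizes; in this setup the PTM weights stay frozen and only the adapters plus the task head are trained.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)  # down-projection
        self.up = nn.Linear(bottleneck, hidden)    # up-projection

    def forward(self, x):
        # residual connection around the bottleneck, so the module starts near identity
        return x + self.up(torch.relu(self.down(x)))

h = torch.randn(2, 5, 768)   # hidden states from a frozen PTM layer
print(Adapter()(h).shape)    # torch.Size([2, 5, 768])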

Others

self-ensemble, self-distillation, gradual unfreezing, sequential unfreezing

Reference

https://arxiv.org/pdf/2003.08271v4.pdf
