bertviz: attention visualization tool

Visualize the attention of different layers and different heads.

Note:

from bertviz.neuron_view import show
from bertviz.transformers_neuron_view import BertModel, BertTokenizer

path = 'bert-base-chinese'  # path or name of the pre-trained model
model1 = BertModel.from_pretrained(path)
tokenizer = BertTokenizer.from_pretrained(path)
model_type = 'bert'
sentence_a = '猫坐在垫子上'
sentence_b = '狗躺在床上'

show(model1, model_type, tokenizer, sentence_a, sentence_b, layer=4, head=3)

This works.
###########################

from bertviz.neuron_view import show
from transformers import BertTokenizer, BertModel

model1 = BertModel.from_pretrained(path)
tokenizer = BertTokenizer.from_pretrained(path)
model_type = 'bert'

show(model1, model_type, tokenizer, sentence_a, sentence_b, layer=4, head=3)

This raises an error: neuron_view needs the special model and tokenizer classes bundled with bertviz (bertviz.transformers_neuron_view), because it has to read out the query and key vectors, which the standard transformers classes do not expose.
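If you only want to inspect attention with the standard transformers classes, bertviz's head_view works with the attention weights the model returns. A minimal sketch, assuming bert-base-chinese and a Jupyter notebook environment:

from transformers import BertTokenizer, BertModel
from bertviz import head_view

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = BertModel.from_pretrained('bert-base-chinese', output_attentions=True)

inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt')
attention = model(inputs['input_ids'])[-1]  # one attention tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
head_view(attention, tokens)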

References

https://zhuanlan.zhihu.com/p/457043243

prompt trick

Purpose

Use a template to make the prediction task match the training task of the pre-trained model, narrowing the gap between the pre-training objective and the downstream fine-tuning objective.

Difference from fine-tuning

Fine-tuning: the PTM adapts downward to the specific task.

Prompt: the specific task adapts upward to the PTM.

Application scenarios

Because the prediction task is unified with the pre-training task of the PTM, the task can be handled with very little or even no training data. In short, the scenarios where prompting fits best are (see the sketch after this list):

  1. zero-shot
  2. few-shot
  3. cold start
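As a concrete illustration, prompt-based zero-shot classification can be sketched with the fill-mask pipeline from transformers. The model name, template, and label words below are assumptions for illustration only:

from transformers import pipeline

fill_mask = pipeline('fill-mask', model='bert-base-chinese')

review = '这部电影剧情紧凑,演员表现出色。'
prompt = review + '总体来说很[MASK]。'   # the template turns classification into the MLM pre-training task
verbalizer = ['好', '差']                # label words: positive / negative

for pred in fill_mask(prompt, targets=verbalizer):
    print(pred['token_str'], pred['score'])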

References

https://zhuanlan.zhihu.com/p/424888379

https://zhuanlan.zhihu.com/p/440169921

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

A Chinese PTM that takes glyph and pinyin information into account.

1 Model architecture

The change is in the input layer of BERT.

原来Char embedding+Position embedding+segment embedding-> 现在 Fusion embedding+Position embedding (omit the segment embedding)

Char embedding + glyph (字形) embedding + pinyin (拼音) embedding → fusion embedding: the three embeddings are concatenated and passed through a fully connected layer to produce the fusion embedding.
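A minimal sketch of this fusion step, where plain embedding tables stand in for the real glyph-image and pinyin-sequence encoders (dimensions and class names are illustrative, not the official implementation):

import torch
import torch.nn as nn

class FusionEmbedding(nn.Module):
    def __init__(self, vocab_size, hidden_size=768):
        super().__init__()
        self.char_embedding = nn.Embedding(vocab_size, hidden_size)
        self.glyph_embedding = nn.Embedding(vocab_size, hidden_size)   # stand-in for glyph-image features
        self.pinyin_embedding = nn.Embedding(vocab_size, hidden_size)  # stand-in for the pinyin encoder
        self.fusion = nn.Linear(3 * hidden_size, hidden_size)          # concat -> fully connected layer

    def forward(self, input_ids):
        concat = torch.cat([self.char_embedding(input_ids),
                            self.glyph_embedding(input_ids),
                            self.pinyin_embedding(input_ids)], dim=-1)
        return self.fusion(concat)  # fusion embedding; the position embedding is added afterwards

emb = FusionEmbedding(vocab_size=21128)  # 21128: BERT Chinese vocab size
print(emb(torch.tensor([[101, 2769, 1599, 3614, 4344, 102]])).shape)  # torch.Size([1, 6, 768])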

2 Pre-training tasks

Whole Word Masking (WWM) and Char Masking (CM)
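Roughly, CM masks individual characters, while WWM masks all characters of a selected word. A toy sketch (the segmentation and masking rate are illustrative assumptions):

import random

sentence = ['我', '喜', '欢', '小', '猫']
words = [['我'], ['喜', '欢'], ['小', '猫']]  # word segmentation: 我 / 喜欢 / 小猫

# CM: each character is masked independently
cm = [c if random.random() > 0.15 else '[MASK]' for c in sentence]

# WWM: when a word is selected, all of its characters are masked
wwm = []
for word in words:
    if random.random() <= 0.15:
        wwm.extend(['[MASK]'] * len(word))
    else:
        wwm.extend(word)

print(cm)
print(wwm)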

3 Usage

>>> from datasets.bert_dataset import BertDataset
>>> from models.modeling_glycebert import GlyceBertModel

>>> tokenizer = BertDataset([CHINESEBERT_PATH])
>>> chinese_bert = GlyceBertModel.from_pretrained([CHINESEBERT_PATH])
>>> sentence = '我喜欢猫'

>>> input_ids, pinyin_ids = tokenizer.tokenize_sentence(sentence)
>>> length = input_ids.shape[0]
>>> input_ids = input_ids.view(1, length)
>>> pinyin_ids = pinyin_ids.view(1, length, 8)
>>> output_hidden = chinese_bert.forward(input_ids, pinyin_ids)[0]
>>> print(output_hidden)
tensor([[[ 0.0287, -0.0126, 0.0389, ..., 0.0228, -0.0677, -0.1519],
[ 0.0144, -0.2494, -0.1853, ..., 0.0673, 0.0424, -0.1074],
[ 0.0839, -0.2989, -0.2421, ..., 0.0454, -0.1474, -0.1736],
[-0.0499, -0.2983, -0.1604, ..., -0.0550, -0.1863, 0.0226],
[ 0.1428, -0.0682, -0.1310, ..., -0.1126, 0.0440, -0.1782],
[ 0.0287, -0.0126, 0.0389, ..., 0.0228, -0.0677, -0.1519]]],
grad_fn=<NativeLayerNormBackward>)

References

https://github.com/ShannonAI/ChineseBert

https://arxiv.org/pdf/2106.16038.pdf

finetune

1 Which layers to use for the downstream task


The overall model is: selected PTM layers (model1) + downstream task model (model2).

Different layers of a deep model capture different kinds of knowledge, such as part-of-speech, syntax, long-range dependencies, semantic roles, and coreference. For RNN-based models, studies show that different layers of a multi-layer LSTM encoder behave differently on different tasks. For Transformer-based models, basic syntactic understanding emerges in the shallower layers, while high-level semantic understanding emerges in the deeper layers.

Let $\mathbf{H}^{(l)}\ (1 \le l \le L)$ denote the representation from the $l$-th layer of the PTM and $g(\cdot)$ the task-specific model. There are several ways to select the representation:

a) Embedding Only

Choose only the pre-trained static embeddings, i.e. $g(\mathbf{H}^{(1)})$.

b) Top Layer

Select the representation of the top layer and feed it into the task-specific model, i.e. $g(\mathbf{H}^{(L)})$.

c) All Layers

Feed in the representations of all layers and let the model automatically pick the most suitable mixture before passing it to the task-specific model, as ELMo does:

$$g\!\left(\gamma\sum_{l=1}^{L}\alpha_l\,\mathbf{H}^{(l)}\right)$$

where $\alpha_l$ is the softmax-normalized weight for layer $l$ and $\gamma$ is a scalar that scales the vectors output by the pre-trained model.
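A minimal sketch of this ELMo-style weighted mixture over layers (the number of layers and hidden size are illustrative assumptions):

import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(num_layers))  # per-layer weights (before softmax)
        self.gamma = nn.Parameter(torch.ones(1))             # global scaling factor

    def forward(self, layer_reps):
        # layer_reps: list of L tensors, each [batch_size, seq_len, hidden_size]
        weights = torch.softmax(self.alpha, dim=0)
        mixed = sum(w * h for w, h in zip(weights, layer_reps))
        return self.gamma * mixed

reps = [torch.randn(2, 5, 768) for _ in range(12)]
print(ScalarMix(num_layers=12)(reps).shape)  # torch.Size([2, 5, 768])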

2 Whether to freeze the parameters

There are two common ways to transfer the model: feature extraction (where the pre-trained parameters are frozen) and fine-tuning (where the pre-trained parameters are unfrozen and fine-tuned).
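A minimal sketch of the two options, assuming a transformers BERT encoder with a hypothetical linear task head:

import torch.nn as nn
from transformers import BertModel

bert = BertModel.from_pretrained('bert-base-chinese')   # assumed model name
head = nn.Linear(bert.config.hidden_size, 2)            # hypothetical task head

# Feature extraction: freeze the PTM and train only the task head
for p in bert.parameters():
    p.requires_grad = False
trainable = list(head.parameters())

# Fine-tuning: unfreeze the PTM and update everything together
for p in bert.parameters():
    p.requires_grad = True
trainable = list(bert.parameters()) + list(head.parameters())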

3 Fine-Tuning Strategies

Two-stage fine-tuning

The first stage fine-tunes on an intermediate task, the second stage on the target task.

Multi-task fine-tuning

Multi-task learning and pre-training are complementary technologies.

Fine-tuning with extra adaptation modules

The main drawback of fine-tuning is its parameter inefficiency: every downstream task has its own fine-tuned parameters. Therefore, a better solution is to inject some fine-tunable adaptation modules into PTMs while the original parameters are fixed.
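A minimal sketch of a bottleneck adapter in this spirit (the bottleneck size and the insertion point inside the network are illustrative assumptions):

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # the residual connection keeps the frozen PTM output intact
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# During fine-tuning, only the adapters (and the task head) are updated;
# the original PTM parameters stay frozen.
adapter = Adapter()
print(adapter(torch.randn(2, 5, 768)).shape)  # torch.Size([2, 5, 768])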

Others

self-ensemble, self-distillation, gradual unfreezing, sequential unfreezing
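For example, gradual unfreezing can be sketched as unfreezing encoder layers top-down, one more per epoch (the layer access assumes a transformers BERT encoder; the schedule is an illustrative assumption):

from transformers import BertModel

bert = BertModel.from_pretrained('bert-base-chinese')  # assumed model name
layers = bert.encoder.layer                            # 12 Transformer layers, bottom to top

# start with everything frozen
for p in bert.parameters():
    p.requires_grad = False

num_epochs = 4
for epoch in range(num_epochs):
    # unfreeze one more layer from the top each epoch
    for layer in layers[-(epoch + 1):]:
        for p in layer.parameters():
            p.requires_grad = True
    # ... run one training epoch here ...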

References

https://arxiv.org/pdf/2003.08271v4.pdf


Text span extraction

Given a question, find the answer within a passage.

Question: 苏轼是哪里人? (Where was Su Shi from?)
Context: 苏轼是北宋著名的文学家与政治家,眉州眉山人。(Su Shi was a famous writer and statesman of the Northern Song dynasty, a native of Meishan, Meizhou.)
Label: 眉州眉山人 (a native of Meishan, Meizhou)

The SQuAD QA task in BERT

Labels

Introduce start and end labels, i.e. the positions of the first and last tokens of the answer span in the passage.

Architecture

Loss

sequence_output = all_encoder_outputs[-1]   # [src_len, batch_size, hidden_size]
logits = self.qa_outputs(sequence_output)   # qa_outputs is nn.Linear(hidden_size, 2) -> [src_len, batch_size, 2]
start_logits, end_logits = logits.split(1, dim=-1)
start_logits = start_logits.squeeze(-1).transpose(0, 1)  # [batch_size, src_len]
end_logits = end_logits.squeeze(-1).transpose(0, 1)      # [batch_size, src_len]
loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
start_loss = loss_fct(start_logits, start_positions)
end_loss = loss_fct(end_logits, end_positions)
final_loss = (start_loss + end_loss) / 2

The model output has shape [src_len, batch_size, 2].

The final loss is the average of two classification losses over the src_len positions, one for the start position and one for the end position.

Prediction

Suppose the candidate passage has length n. The model outputs a pair of scores for each of the n positions; the position with the highest start probability is taken as the start label and the position with the highest end probability as the end label.
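A minimal sketch of this decoding step, using random tensors in place of the model's logits (shapes follow the [batch_size, src_len] layout above):

import torch

# stand-ins for the model's outputs, shape [batch_size, src_len]
start_logits = torch.randn(1, 20)
end_logits = torch.randn(1, 20)

start_idx = start_logits.argmax(dim=-1).item()
end_idx = end_logits.argmax(dim=-1).item()
# the predicted answer is the token span [start_idx, end_idx] of the passage;
# in practice one usually also enforces end_idx >= start_idx
print(start_idx, end_idx)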

References

https://zhuanlan.zhihu.com/p/77868938

https://blog.csdn.net/guangyacyb/article/details/105526482

https://zhuanlan.zhihu.com/p/473157694

Sequence tagging

Sequence tagging is one of the most fundamental tasks in NLP and is used very widely: word segmentation, part-of-speech (POS) tagging, named entity recognition (NER), keyword extraction, semantic role labeling, slot filling, and so on are all essentially sequence tagging problems.

Tagging schemes

https://zhuanlan.zhihu.com/p/147537898#
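For example, under the common BIO scheme, a character-level Chinese NER sample might be tagged like this (a toy illustration, not tied to any particular dataset):

tokens = ['苏', '轼', '是', '眉', '州', '眉', '山', '人']
tags   = ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC', 'I-LOC', 'I-LOC', 'O']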

References

https://zhuanlan.zhihu.com/p/268579769

https://zhuanlan.zhihu.com/p/147537898#

