View the attention of different layers and different heads.
Note:
```python
from bertviz.neuron_view import show
```
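A minimal sketch of how this is typically wired up, following the pattern in the bertviz README: neuron_view needs bertviz's bundled model and tokenizer classes rather than the stock transformers ones, and the checkpoint name and sentences below are illustrative.

```python
from bertviz.neuron_view import show
from bertviz.transformers_neuron_view import BertModel, BertTokenizer

model_version = "bert-base-uncased"  # illustrative checkpoint
model = BertModel.from_pretrained(model_version)
tokenizer = BertTokenizer.from_pretrained(model_version)

# Renders an interactive view in a Jupyter notebook; pick any layer/head.
show(model, "bert", tokenizer,
     "The cat sat on the mat",  # sentence_a
     "The cat lay on the rug",  # sentence_b
     layer=2, head=0)
```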
Language Modeling: https://huggingface.co/docs/transformers/task_summary
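As a quick illustration of the masked-language-modeling task described there, a pipeline call looks like this (the checkpoint name is illustrative):

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Each prediction carries the filled-in token and its probability.
for pred in fill_mask("Paris is the [MASK] of France."):
    print(pred["token_str"], round(pred["score"], 3))
```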
Tsinghua's NLP lab has released the open-source toolkit OpenPrompt.
See the official repo at https://hub.fastgit.xyz/thunlp/OpenPrompt,
which documents the steps in detail and includes worked cases.
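To give a flavor of the workflow, here is a minimal prompt-based classification sketch following the pattern in the OpenPrompt README; the names involved (load_plm, ManualTemplate, ManualVerbalizer, PromptForClassification) come from that README and may differ across versions.

```python
from openprompt.plms import load_plm
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt import PromptForClassification

# Load a backbone PLM together with its tokenizer and wrapper class.
plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

# A hand-written template; the {"mask"} slot is what gets classified.
template = ManualTemplate(
    tokenizer=tokenizer,
    text='{"placeholder":"text_a"} It was {"mask"}.',
)

# Map each class to label words that may fill the mask.
classes = ["negative", "positive"]
verbalizer = ManualVerbalizer(
    classes=classes,
    label_words={"negative": ["bad"], "positive": ["good"]},
    tokenizer=tokenizer,
)

model = PromptForClassification(plm=plm, template=template, verbalizer=verbalizer)
```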
https://hub.fastgit.xyz/thunlp/OpenPrompt
A handy NLP helper: Hugging Face's transformers.
git: https://github.com/huggingface/transformers
paper: https://arxiv.org/abs/1910.03771v5
Overall structure
A simple tutorial:
https://blog.csdn.net/weixin_44614687/article/details/106800244
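For orientation, the typical tokenizer-plus-model round trip looks like this (the checkpoint name is illustrative):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

# Tokenize, run the encoder, and inspect the contextual embeddings.
inputs = tokenizer("A simple example", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```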
Under the hood this is load_state_dict. Checkpoint weights that the new model has no parameter for are skipped, with a warning like:
```text
Some weights of the model checkpoint at ../../../../test/data/chinese-roberta-wwm-ext were not used when initializing listnet_bert: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
```
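The mechanism behind that warning can be shown with a self-contained toy: load_state_dict with strict=False reports the checkpoint keys the model has no use for (the module and key names here are made up for illustration; the real from_pretrained also handles key remapping, sharding, and so on):

```python
import torch
from torch import nn

# Toy stand-in for "our model": it holds only a subset of the checkpoint keys.
class TinyHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(4, 4)

# A fake checkpoint with one extra key, playing the role of the MLM head.
ckpt = {
    "dense.weight": torch.zeros(4, 4),
    "dense.bias": torch.zeros(4),
    "cls.predictions.bias": torch.zeros(4),  # unused by TinyHead
}

model = TinyHead()
result = model.load_state_dict(ckpt, strict=False)
print(result.unexpected_keys)  # ['cls.predictions.bias'] -> source of the warning
print(result.missing_keys)     # [] -> nothing left randomly initialized here
```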
BertModel -> our model
1. Load a model from transformers:
```python
from transformers import BertPreTrainedModel, BertModel, AutoTokenizer, AutoConfig
```
2. Build your own architecture on top of the model from step 1, as sketched below.
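A minimal sketch of step 2, assuming a ListNet-style scoring head (the listnet_bert name comes from the warning above; the head itself and the checkpoint name are illustrative):

```python
from torch import nn
from transformers import BertPreTrainedModel, BertModel, AutoConfig

class ListNetBert(BertPreTrainedModel):
    """BERT backbone plus a hypothetical one-dimensional scoring head."""

    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)                  # filled from the checkpoint
        self.score = nn.Linear(config.hidden_size, 1)  # freshly initialized
        self.init_weights()  # newer transformers versions use self.post_init()

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        # Use the [CLS] pooled output as the sequence representation.
        return self.score(outputs.pooler_output)

# from_pretrained fills self.bert from the checkpoint and leaves self.score
# randomly initialized; the unused cls.* weights trigger the warning above.
config = AutoConfig.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = ListNetBert.from_pretrained("hfl/chinese-roberta-wwm-ext", config=config)
```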