Pre-Training with Whole Word Masking for Chinese BERT
BERT-wwm-ext
wwm: whole word masking
ext: also trained on extended training data (marked with "ext" in the model name)
Pre-training
1. Change the masking strategy
Whole Word Masking (wwm)
CWS: Chinese Word Segmentation
Comparison of four masking strategies
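The idea behind whole word masking can be sketched in a few lines: BERT's Chinese tokenizer splits text into single characters, so when a segmented word is selected for masking, all of its characters are masked together instead of independently. The function below is a minimal illustration (the name `whole_word_mask` and the use of a pre-segmented word list are assumptions for the sketch, not the authors' implementation; a real pipeline would use a CWS tool to produce the word list and also apply the 80/10/10 replacement rule).

```python
import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Sketch of whole word masking for Chinese.

    words: a sentence already split by a CWS tool, e.g. ["使用", "语言", "模型"].
    Each word is further split into characters (as BERT's Chinese tokenizer
    does); if a word is chosen for masking, every character of that word is
    replaced by the mask token together.
    """
    rng = random.Random(seed)
    tokens = []
    for word in words:
        chars = list(word)  # character-level tokens
        if rng.random() < mask_prob:
            tokens.extend([mask_token] * len(chars))  # mask the whole word
        else:
            tokens.extend(chars)
    return tokens

# With character-level masking, "语" could be masked while "言" stays visible,
# leaking part of the word; whole word masking removes both characters at once.
```

This contrasts with the original BERT masking, which would pick individual characters, so the model could often recover a masked character from the other half of the same word.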
References
Pre-Training with Whole Word Masking for Chinese BERT
https://arxiv.org/abs/1906.08101v3
Revisiting Pre-trained Models for Chinese Natural Language Processing