Word-Granularity BERT for Chinese

1 Is Word Segmentation Necessary for Deep Learning of Chinese Representations?

We find that char-based (字粒度) models consistently outperform word-based (词粒度) models.

We show that it is because word-based models are more vulnerable to data sparsity and the presence of out-of-vocabulary (OOV) words, and thus more prone to overfitting.
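The sparsity and OOV argument above can be illustrated with a toy example. The sketch below is not the paper's experiment: the hand-written segmentations stand in for the output of a real word segmenter (e.g. jieba), and the tiny corpus only demonstrates the OOV effect, not vocabulary growth at scale (where a char vocabulary stays bounded at a few thousand characters while a word vocabulary keeps growing).

```python
# Toy illustration (an assumption, not the paper's setup) of why word-level
# Chinese models see more OOV tokens than char-level models.
# Segmentations are hand-written stand-ins for a real segmenter.

train_segmented = [
    ["我", "喜欢", "自然语言", "处理"],
    ["深度", "学习", "模型", "表现", "很好"],
]
test_segmented = [["我", "喜欢", "深度学习", "和", "自然语言", "理解"]]

def vocab(sentences):
    """Set of token types seen in the sentences."""
    return {tok for sent in sentences for tok in sent}

def oov_rate(test, train_vocab):
    """Fraction of test tokens never seen in training."""
    toks = [t for sent in test for t in sent]
    return sum(t not in train_vocab for t in toks) / len(toks)

# Word level: "深度学习" is a new word type even though "深度" and
# "学习" were both seen in training.
word_vocab = vocab(train_segmented)
word_oov = oov_rate(test_segmented, word_vocab)

# Char level: the same text decomposed into single characters,
# so most test characters were already seen.
train_chars = [[c for w in sent for c in w] for sent in train_segmented]
test_chars = [[c for w in sent for c in w] for sent in test_segmented]
char_vocab = vocab(train_chars)
char_oov = oov_rate(test_chars, char_vocab)

print(f"word-level OOV rate: {word_oov:.2f}")   # 3 of 6 words unseen
print(f"char-level OOV rate: {char_oov:.2f}")   # 2 of 14 chars unseen
```

Even on this tiny corpus the word-level OOV rate (0.50) is far higher than the char-level one (0.14), matching the paper's diagnosis that word-based models are hit harder by unseen tokens.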

2 Tencent's Chinese Word Model

On public datasets, the word-based model underperforms the char-based model.

References

https://arxiv.org/pdf/1905.05526.pdf

https://www.jiqizhixin.com/articles/2019-06-27-17

Author: Lavine Hu

Posted on: 2021-10-26

Updated on: 2022-05-28
