Word-Granularity BERT for Chinese

1 Is Word Segmentation Necessary for Deep Learning of Chinese Representations?

We find that char-based (字粒度) models consistently outperform word-based (词粒度) models.

We show that it is because word-based models are more vulnerable to data sparsity and the presence of out-of-vocabulary (OOV) words, and thus more prone to overfitting.
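The sparsity and OOV argument above can be illustrated with a toy example. The sketch below is not the paper's experiment: the hand-written segmentations stand in for the output of a real word segmenter (e.g. jieba), and the tiny corpus only demonstrates the OOV effect, not vocabulary growth at scale (where a char vocabulary stays bounded at a few thousand characters while a word vocabulary keeps growing).

```python
# Toy illustration (an assumption, not the paper's setup) of why word-level
# Chinese models see more OOV tokens than char-level models.
# Segmentations are hand-written stand-ins for a real segmenter.

train_segmented = [
    ["我", "喜欢", "自然语言", "处理"],
    ["深度", "学习", "模型", "表现", "很好"],
]
test_segmented = [["我", "喜欢", "深度学习", "和", "自然语言", "理解"]]

def vocab(sentences):
    """Set of token types seen in the sentences."""
    return {tok for sent in sentences for tok in sent}

def oov_rate(test, train_vocab):
    """Fraction of test tokens never seen in training."""
    toks = [t for sent in test for t in sent]
    return sum(t not in train_vocab for t in toks) / len(toks)

# Word level: "深度学习" is a new word type even though "深度" and
# "学习" were both seen in training.
word_vocab = vocab(train_segmented)
word_oov = oov_rate(test_segmented, word_vocab)

# Char level: the same text decomposed into single characters,
# so most test characters were already seen.
train_chars = [[c for w in sent for c in w] for sent in train_segmented]
test_chars = [[c for w in sent for c in w] for sent in test_segmented]
char_vocab = vocab(train_chars)
char_oov = oov_rate(test_chars, char_vocab)

print(f"word-level OOV rate: {word_oov:.2f}")   # 3 of 6 words unseen
print(f"char-level OOV rate: {char_oov:.2f}")   # 2 of 14 chars unseen
```

Even on this tiny corpus the word-level OOV rate (0.50) is far higher than the char-level one (0.14), matching the paper's diagnosis that word-based models are hit harder by unseen tokens.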

2 Tencent's Chinese Word Model

On public datasets, the word-based model underperforms the char-based model.

References

https://arxiv.org/pdf/1905.05526.pdf

https://www.jiqizhixin.com/articles/2019-06-27-17

Author: Lavine Hu

Posted on: 2021-10-26

Updated on: 2022-05-28
