Chinese Word-Granularity BERT
1 Is Word Segmentation Necessary for Deep Learning of Chinese Representations?
We find that char-based (character-granularity) models consistently outperform word-based (word-granularity) models.
We show that it is because word-based models are more vulnerable to data sparsity and the presence of out-of-vocabulary (OOV) words, and thus more prone to overfitting.
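The OOV argument can be illustrated with a toy calculation. The sketch below is only an assumption-laden illustration, not the paper's experiment: the sentences are a hand-segmented, made-up corpus, and `oov_rate`, `flatten_words`, and `flatten_chars` are hypothetical helper names. It compares how many test tokens are unseen in training when the same text is split into words versus characters.

```python
# Toy, hand-segmented sentences (hypothetical data, for illustration only).
train = [
    ["我", "喜欢", "北京"],
    ["他", "喜欢", "上海"],
    ["北京", "天气", "很", "好"],
]
test = [
    ["海上", "天气", "很", "好"],
    ["他", "喜欢", "天上"],
]


def flatten_words(sentences):
    """Return all word tokens from a list of segmented sentences."""
    return [word for sentence in sentences for word in sentence]


def flatten_chars(sentences):
    """Return all character tokens by splitting every word into characters."""
    return [char for sentence in sentences for word in sentence for char in word]


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never appear in the training tokens."""
    vocab = set(train_tokens)
    unseen = sum(1 for token in test_tokens if token not in vocab)
    return unseen / len(test_tokens)


word_oov = oov_rate(flatten_words(train), flatten_words(test))
char_oov = oov_rate(flatten_chars(train), flatten_chars(test))

print(f"word-level OOV rate: {word_oov:.3f}")
print(f"char-level OOV rate: {char_oov:.3f}")
```

On this toy data the words "海上" and "天上" are unseen at the word level (2 of 7 test tokens), while every one of their characters already occurs in training, so the character-level OOV rate is 0. Real corpora are far larger, but the same effect is what makes word vocabularies sparser than character vocabularies.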
2 Tencent Chinese Word Model
On public datasets, the word-based model performs worse than the character-based model.