
Processing Very Long Text

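BERT's maximum input length is fixed, 512 tokens by default (special tokens such as [CLS] and [SEP] count toward this limit).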

Data-level approaches:

1 Direct truncation: crude, and it may throw away the important parts

2 Extract the important parts

3 Split into segments + concatenate

This leaves many open questions: how do you train on the segments? How do you predict?
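The data-level options above can be sketched in a few lines. This is a minimal sketch, not a definitive recipe: the function names, the head+tail variant of truncation, the window/stride values, and mean pooling of per-segment scores are all assumptions made for illustration. Mean pooling is only one possible answer to the "how to predict" question (for training, one common choice is to give every segment the document's label).

```python
from typing import List, Sequence

MAX_LEN = 512      # BERT's default maximum sequence length
NUM_SPECIAL = 2    # [CLS] and [SEP] for a single-sequence input
BUDGET = MAX_LEN - NUM_SPECIAL  # tokens left for the actual text


def truncate_head_tail(tokens: Sequence[str], budget: int, head: int) -> List[str]:
    """Keep the first `head` tokens and the last `budget - head` tokens.

    Hypothetical variant of direct truncation: it still drops the middle,
    but keeps both the opening and the ending of the document.
    """
    if len(tokens) <= budget:
        return list(tokens)
    tail = budget - head
    kept_tail = list(tokens[-tail:]) if tail > 0 else []
    return list(tokens[:head]) + kept_tail


def sliding_windows(tokens: Sequence[str], window: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of at most `window` tokens.

    Consecutive windows start `stride` tokens apart, so they overlap
    whenever stride < window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break
        start += stride
    return windows


def mean_pool(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-segment class scores into one document-level score vector."""
    n = len(segment_scores)
    return [sum(column) / n for column in zip(*segment_scores)]


# Usage: a 1200-token document, far beyond the 510-token budget.
tokens = [f"t{i}" for i in range(1200)]
truncated = truncate_head_tail(tokens, BUDGET, head=128)
segments = sliding_windows(tokens, window=BUDGET, stride=255)
# In practice each segment would be scored by the model; these are made-up scores.
doc_scores = mean_pool([[0.2, 0.8], [0.6, 0.4]])
```

The overlap between windows is a design choice: it reduces the chance that a key sentence is cut in half at a segment boundary, at the cost of encoding some tokens twice.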

Model-level approaches:

Pretrained models (PTMs) based on Transformer-XL, such as XLNet

Traditional RNN-based seq2seq models
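What these two model families share is that state is carried from one segment to the next, so the input is not capped by a single fixed window. An RNN passes its hidden state forward; Transformer-XL caches the previous segment's hidden states as memory. The toy below only illustrates the RNN case with a scalar hidden state; it is not Transformer-XL, and the weights and function names are made up for the example.

```python
import math
from typing import Sequence


def rnn_step(h: float, x: float, w_h: float = 0.5, w_x: float = 1.0) -> float:
    """One step of a toy scalar RNN: h' = tanh(w_h * h + w_x * x)."""
    return math.tanh(w_h * h + w_x * x)


def run_segment(xs: Sequence[float], h0: float) -> float:
    """Run the RNN over one segment, starting from hidden state h0."""
    h = h0
    for x in xs:
        h = rnn_step(h, x)
    return h


def run_in_segments(xs: Sequence[float], segment_len: int, h0: float = 0.0) -> float:
    """Process a long sequence segment by segment, carrying the state across."""
    h = h0
    for start in range(0, len(xs), segment_len):
        h = run_segment(xs[start:start + segment_len], h)
    return h


# Usage: the segmented run matches a single pass over the whole sequence,
# because the hidden state is carried across segment boundaries.
xs = [0.1 * i for i in range(20)]
whole = run_segment(xs, 0.0)
chunked = run_in_segments(xs, segment_len=6)
```

Resetting the state to zero at every segment boundary would instead treat each segment as an independent document, which is essentially the split-and-concatenate situation at the data level.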

References

https://www.zhihu.com/question/395903256

Processing Very Long Text

http://example.com/2022/05/31/long-text/

Author

Lavine Hu

Posted on

2022-05-31

Updated on

2022-06-11
