HIERARCHICAL TRANSFORMERS FOR LONG DOCUMENT CLASSIFICATION

The original BERT model accepts at most 512 tokens of input. To let BERT handle much longer documents, the authors propose two strategies at the fine-tuning stage to work around this limit: BERT + LSTM and BERT + Transformer.

Core steps:

1. Split the input sequence into segments of a fixed size with overlap (see the splitting sketch after this list).

2. For each of these segments, we obtain a segment-level representation H or P from the BERT model (the [CLS] hidden vector or the posterior class probabilities, respectively).

3. We then stack these segment-level representations into a sequence, which serves as input to a small (100-dimensional) LSTM layer. Alternatively, the LSTM recurrent layer is replaced with a small Transformer model (sketches of both variants follow this list).

4. Finally, we use two fully connected layers with ReLU (30-dimensional) and softmax (the same dimensionality as the number of classes) activations to obtain the final predictions.
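
Step 1 can be implemented as a simple sliding window over the token ids. The sketch below is only illustrative; the segment length of 200 and overlap of 50 are assumed values, not taken from the steps above.

```python
# Minimal sketch of step 1: split a long token sequence into fixed-size,
# overlapping segments. seg_len and overlap are illustrative (assumed) values.

def split_into_segments(token_ids, seg_len=200, overlap=50):
    """Return segments of at most `seg_len` tokens, where consecutive
    segments share `overlap` tokens."""
    stride = seg_len - overlap
    segments = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        segments.append(token_ids[start:start + seg_len])
    return segments

# Example: a 512-token "document" becomes overlapping 200-token segments.
print([len(s) for s in split_into_segments(list(range(512)))])
# -> [200, 200, 200, 62]  (each segment overlaps the previous one by 50 tokens)
```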
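Steps 2-4 (the BERT + LSTM variant) can be sketched roughly as follows, assuming PyTorch and HuggingFace transformers. The 100-dimensional LSTM and 30-dimensional ReLU layer follow the steps above; the checkpoint name, the class name, and the choice of the pooled [CLS] output as the segment representation H are assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class RobertClassifier(nn.Module):
    """Hypothetical sketch: BERT per segment, LSTM over segments, two FC layers."""

    def __init__(self, num_classes, lstm_dim=100, fc_dim=30):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size          # 768 for bert-base
        self.lstm = nn.LSTM(hidden, lstm_dim, batch_first=True)
        self.fc1 = nn.Linear(lstm_dim, fc_dim)
        self.fc2 = nn.Linear(fc_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids / attention_mask: (batch, num_segments, seg_len)
        b, n, l = input_ids.shape
        flat_ids = input_ids.view(b * n, l)
        flat_mask = attention_mask.view(b * n, l)
        # Step 2: one vector per segment; here the pooled [CLS] output is
        # used as the segment representation (an assumed choice).
        seg_repr = self.bert(input_ids=flat_ids,
                             attention_mask=flat_mask).pooler_output
        seg_seq = seg_repr.view(b, n, -1)              # step 3: stack segments
        _, (h_n, _) = self.lstm(seg_seq)               # small LSTM over segments
        doc_repr = h_n[-1]                             # last hidden state
        # Step 4: two fully connected layers (ReLU, then softmax over classes).
        logits = self.fc2(torch.relu(self.fc1(doc_repr)))
        return torch.softmax(logits, dim=-1)
```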
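For the alternative mentioned in step 3, the recurrent layer is swapped for a small Transformer encoder over the stacked segment vectors. The depth, head count, feed-forward size, and mean pooling over segments in this sketch are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class SegmentTransformerHead(nn.Module):
    """Hypothetical sketch: small Transformer over segment representations."""

    def __init__(self, seg_dim=768, num_classes=2, fc_dim=30):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=seg_dim, nhead=8,
                                           dim_feedforward=1024,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fc1 = nn.Linear(seg_dim, fc_dim)
        self.fc2 = nn.Linear(fc_dim, num_classes)

    def forward(self, seg_seq):
        # seg_seq: (batch, num_segments, seg_dim) stacked BERT segment vectors
        enc = self.encoder(seg_seq)
        doc_repr = enc.mean(dim=1)       # average over segments (assumed pooling)
        logits = self.fc2(torch.relu(self.fc1(doc_repr)))
        return torch.softmax(logits, dim=-1)
```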

