Felix: Flexible Text Editing Through Tagging and Insertion
Another text-editing paper from Google, following LaserTagger.
In contrast to conventional sequence-to-sequence (seq2seq) models, FELIX is efficient in low-resource settings and fast at inference time, while still being capable of modeling flexible input-output transformations. We achieve this by decomposing the text-editing task into two sub-tasks: tagging, which decides on the subset of input tokens to keep and their order in the output text, and insertion, which in-fills the output tokens not present in the input.
1 Introduction
In particular, we have designed FELIX with the following requirements in mind: sample efficiency, fast inference time, and flexible text editing.
2 Model description
FELIX decomposes the conditional probability of generating an output sequence y from an input x as follows:

$p(y \mid x) \approx p_{\text{ins}}(y \mid y^m)\, p_{\text{tag}}(y^m \mid x)$

where $y^m$ is the intermediate masked sequence produced by the tagging model.
2.1 Tagging Model
The tagging model is trained to optimize both the tagging and pointing loss:

$L = L_{\text{pointing}} + \lambda L_{\text{tagging}}$

Tagging:
The tag sequence $y^t$ consists of three tag types: KEEP, DELETE, and INSERT (INS).
Tags are predicted by applying a single feed-forward layer $f$ to the output of the encoder $h^L$ (the source sentence is first encoded using a 12-layer BERT-base model):

$y^t_i = \text{argmax}(f(h^L_i))$
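A minimal sketch of this tagging step, assuming PyTorch; the random tensor stands in for the BERT-base encoder output h^L, and the linear layer plays the role of f:

```python
import torch
import torch.nn as nn

TAGS = ["KEEP", "DELETE", "INSERT"]          # the three tag types

hidden_size, seq_len = 768, 6                # BERT-base hidden size, toy length
h_L = torch.randn(1, seq_len, hidden_size)   # stand-in for encoder output h^L

f = nn.Linear(hidden_size, len(TAGS))        # the single feed-forward layer f
logits = f(h_L)                              # shape (1, seq_len, 3)
y_t = logits.argmax(dim=-1)                  # y^t_i = argmax(f(h^L_i))

print([TAGS[i] for i in y_t[0].tolist()])    # e.g. ['KEEP', 'DELETE', ...]
```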
Pointing:
Given a sequence x and the predicted tags $y^t$, the re-ordering model generates a permutation $\pi$ so that from $\pi$ and $y^t$ we can reconstruct the insertion model input $y^m$. Thus we have:

$p(y^m \mid x) \approx \prod_i p(\pi(i) \mid x, y^t, i)\, p(y^t_i \mid x)$

Our implementation is based on a pointer network. The output of this model is a series of predicted pointers (source token → next target token).
The input to the pointer layer at position i is:

$h^{L+1}_i = f([h^L_i; e(y^t_i); e(p_i)])$

where $e(y^t_i)$ is the embedding of the predicted tag and $e(p_i)$ is the positional embedding.
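A sketch of how this pointer-layer input could be assembled, again assuming PyTorch; the tag and positional embedding sizes here are illustrative choices, not values from the paper:

```python
import torch
import torch.nn as nn

hidden_size, tag_dim, pos_dim, seq_len = 768, 64, 64, 6

h_L = torch.randn(1, seq_len, hidden_size)   # encoder output h^L
tag_emb = nn.Embedding(3, tag_dim)           # e(y^t_i): KEEP/DELETE/INSERT
pos_emb = nn.Embedding(512, pos_dim)         # e(p_i): positional embedding
f = nn.Linear(hidden_size + tag_dim + pos_dim, hidden_size)

y_t = torch.randint(0, 3, (1, seq_len))      # tags predicted by the tagger
positions = torch.arange(seq_len).unsqueeze(0)

features = torch.cat([h_L, tag_emb(y_t), pos_emb(positions)], dim=-1)
h_L1 = f(features)                           # h^{L+1}, the pointer-layer input
```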
The pointer network attends over all hidden states:

$p(\pi(i) \mid h^{L+1}_i) = \text{attention}(h^{L+1}_i, h^{L+1}_{\pi(i)})$

where $h^{L+1}_i$ acts as the query (Q) and $h^{L+1}_{\pi(i)}$ as the key (K).
When realizing the pointers into an output token ordering, we use a constrained beam search.
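The attention itself can be sketched as a query-key dot product over the pointer-layer states. The paper realizes the pointers with a constrained beam search; the greedy loop below, which simply forbids revisiting a token, is a simplified stand-in for that constraint:

```python
import torch
import torch.nn as nn

hidden_size, seq_len = 768, 5
h_L1 = torch.randn(seq_len, hidden_size)     # pointer-layer inputs h^{L+1}

W_q = nn.Linear(hidden_size, hidden_size)    # h^{L+1}_i as the query
W_k = nn.Linear(hidden_size, hidden_size)    # h^{L+1}_{pi(i)} as the key
scores = W_q(h_L1) @ W_k(h_L1).T / hidden_size ** 0.5  # (seq_len, seq_len)

# Greedily follow pointers from position 0, never visiting a token twice.
order, visited, i = [0], {0}, 0
for _ in range(seq_len - 1):
    ranked = scores[i].argsort(descending=True).tolist()
    i = next(j for j in ranked if j not in visited)  # constraint: no revisits
    order.append(i)
    visited.add(i)

print(order)  # a permutation pi over the source positions
```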
2.2 Insertion Model
To represent masked token spans we consider two options: masking and infilling. In the masking case, the tagging model predicts how many tokens need to be inserted by specializing the INSERT tag into INS_k, which replaces the span with k MASK tokens. In the infilling case, the tagging model predicts a generic INS tag.
Note that we preserve the deleted span in the input to the insertion model by enclosing it between [REPL] and [/REPL] tags.
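A toy illustration in plain Python (the exact tag-to-token conventions are simplified assumptions) of turning tagged source tokens into the insertion model's input y^m under the masking variant, with deleted spans kept inside [REPL] ... [/REPL]:

```python
def build_insertion_input(tokens, tags):
    """Convert (token, tag) pairs into the masked insertion-model input."""
    out, deleted = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "KEEP":
            if deleted:  # flush a finished deleted span
                out += ["[REPL]"] + deleted + ["[/REPL]"]
                deleted = []
            out.append(tok)
        elif tag == "DELETE":
            deleted.append(tok)
        elif tag.startswith("INS_"):  # e.g. INS_2 -> two MASK tokens
            out.append(tok)
            out += ["[MASK]"] * int(tag.split("_")[1])
    if deleted:
        out += ["[REPL]"] + deleted + ["[/REPL]"]
    return out

tokens = ["The", "big", "very", "loud", "cat"]
tags   = ["KEEP", "INS_1", "DELETE", "DELETE", "KEEP"]
print(build_insertion_input(tokens, tags))
# ['The', 'big', '[MASK]', '[REPL]', 'very', 'loud', '[/REPL]', 'cat']
```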
Our insertion model is also based on a 12-layer BERT-base model, so we can directly take advantage of BERT-style pretrained checkpoints.
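Because the insertion task matches BERT's masked-LM pretraining objective, an off-the-shelf masked-LM checkpoint can already fill the MASK slots. A sketch using Hugging Face transformers (an assumption for illustration; FELIX fine-tunes its own insertion model):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = "The big [MASK] cat sat on the mat."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at each MASK position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[:, 1]
for pos in mask_pos:
    token_id = logits[0, pos].argmax().item()
    print(tokenizer.decode([token_id]))  # e.g. "black"
```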
References
Felix: Flexible Text Editing Through Tagging and Insertion
1. Evaluation of Text Generation: A Survey
2. Text edit
3. LaserTagger
4. Text generation evaluation metrics
5. Attention seq2seq