Felix: Flexible Text Editing Through Tagging and Insertion
Another text-editing paper from Google, following LaserTagger.
In contrast to conventional sequence-to-sequence (seq2seq) models, FELIX is efficient in low-resource settings and fast at inference time, while being capable of modeling flexible input-output transformations. We achieve this by decomposing the text-editing task into two sub-tasks: tagging, to decide on the subset of input tokens and their order in the output text, and insertion, to in-fill the missing tokens in the output that are not present in the input.
1 Introduction
In particular, we have designed FELIX with the following requirements in mind: sample efficiency, fast inference time, and flexible text editing.
2 Model description
FELIX decomposes the conditional probability of generating an output sequence $y$ from an input $x$ as follows:

$$p(y \mid x) \approx p_{ins}(y \mid \textbf{y}^m)\, p_{tag}(\textbf{y}^m \mid x)$$

where the tagging model produces the intermediate sequence $\textbf{y}^m$ (via the tag sequence $\textbf{y}^t$ and the pointer permutation $\pi$), and the insertion model in-fills the masked tokens in $\textbf{y}^m$.
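To make the decomposition concrete, here is a small hypothetical walk-through in Python; the sentence, the tag assignments, and the convention that $INS$ keeps its token and opens a mask slot in front of it are illustrative assumptions, not an example taken from the paper:

```python
# Hypothetical walk-through of the two-step decomposition (sentence, tags, and
# conventions are illustrative, not taken from the paper).

source = ["The", "big", "cat", "sat", "there", "."]

# Tagging: keep or delete each source token; in this sketch, INS keeps the
# token and opens a [MASK] slot in front of it for the insertion model.
tags = ["KEEP", "DELETE", "KEEP", "KEEP", "INS", "KEEP"]

# Pointing: a permutation over the kept tokens fixes their output order.
# Here we assume the identity order, so y^m follows the source order.
y_m = []
for tok, tag in zip(source, tags):
    if tag == "DELETE":
        continue
    if tag == "INS":
        y_m.append("[MASK]")
    y_m.append(tok)

print(y_m)  # ['The', 'cat', 'sat', '[MASK]', 'there', '.']
# The insertion model (a BERT MLM) then predicts the masked token, e.g. "over",
# giving y = ['The', 'cat', 'sat', 'over', 'there', '.']
```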
2.1 Tagging Model
The tagging model is trained to optimize both the tagging and the pointing loss:

Tagging:
The tag sequence $\textbf{y}^t$ consists of three tags: $KEEP$, $DELETE$, and $INSERT$ ($INS$).
Tags are predicted by applying a single feedforward layer $f$ to the output of the encoder $\textbf{h}^L$ (the source sentence is first encoded using a 12-layer BERT-base model). $\textbf{y}^t_i=argmax(f(\textbf{h}^L_i))$
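As a minimal sketch (assuming PyTorch, a BERT-base hidden size of 768, and a random tensor standing in for the encoder output $\textbf{h}^L$), the tagging head is just a linear layer followed by an argmax over the tag vocabulary:

```python
import torch
import torch.nn as nn

# Minimal sketch of the tagging head; the real encoder is a 12-layer BERT-base.
hidden_size, num_tags = 768, 3          # tags: KEEP, DELETE, INSERT
f = nn.Linear(hidden_size, num_tags)    # the single feed-forward layer f

batch, seq_len = 2, 16
h_L = torch.randn(batch, seq_len, hidden_size)   # stand-in for BERT output h^L

tag_logits = f(h_L)                      # [batch, seq_len, num_tags]
y_t = tag_logits.argmax(dim=-1)          # y^t_i = argmax f(h^L_i)
print(y_t.shape)                         # torch.Size([2, 16])
```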
Pointing:
Given a source sequence $\textbf{x}$ and the predicted tags $\textbf{y}^t$, the re-ordering model generates a permutation $\pi$ so that from $\pi$ and $\textbf{y}^t$ we can reconstruct the insertion model input $\textbf{y}^m$. Thus we have:

$$p(\textbf{y}^m \mid \textbf{x}) = \prod_i p\big(\pi(i) \mid \textbf{x}, \textbf{y}^t, i\big)\, p\big(\textbf{y}^t_i \mid \textbf{x}\big)$$
Our implementation is based on a pointer network; the output of this model is a series of predicted pointers (source token → next target token).
The input to the pointer layer at position $i$ combines the encoder hidden state, the tag embedding, and the positional embedding:

$$\textbf{h}_i^{L+1} = f\big([\textbf{h}_i^{L};\, e(\textbf{y}_i^t);\, e(\textbf{p}_i)]\big)$$

where $e(\textbf{y}_i^t)$ is the embedding of the predicted tag and $e(\textbf{p}_i)$ is the positional embedding.
The pointer network attends over all hidden states, as such:

$$p\big(\pi(i) \mid \textbf{h}_i^{L+1}\big) = \mathrm{attention}\big(\textbf{h}_i^{L+1}, \textbf{h}_{\pi(i)}^{L+1}\big)$$

where $\textbf{h}_i^{L+1}$ acts as the query $Q$ and $\textbf{h}_{\pi(i)}^{L+1}$ as the key $K$.
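A rough sketch of this pointing step as single-head dot-product attention (the projection layers, sizes, and random inputs are assumptions for illustration, not the paper's exact parameterization):

```python
import torch
import torch.nn as nn

# Pointing as single-head dot-product attention: h_i^{L+1} is the query,
# h_{pi(i)}^{L+1} the key. Shapes and projections are illustrative.
hidden = 768
query_proj = nn.Linear(hidden, hidden)
key_proj = nn.Linear(hidden, hidden)

seq_len = 8
h = torch.randn(1, seq_len, hidden)          # h^{L+1}: hidden + tag + position embeddings

q = query_proj(h)                            # one query per source position
k = key_proj(h)                              # one key per candidate "next token"
scores = q @ k.transpose(-1, -2) / hidden ** 0.5   # [1, seq_len, seq_len]
p_pointer = scores.softmax(dim=-1)           # p(pi(i) | h_i^{L+1}) over all positions

pointers = p_pointer.argmax(dim=-1)          # greedy pointer: i -> next target token
print(pointers)
```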
When realizing the pointers, we use a constrained beam search so that the predicted pointers form a valid, loop-free ordering (no position is pointed to twice).
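For illustration, the sketch below realizes pointers with a simple greedy loop that forbids revisiting a position; this is a simplified stand-in for the paper's constrained beam search, reusing the `p_pointer` distribution from the previous sketch:

```python
import torch

def realize_pointers_greedy(p_pointer, start=0):
    """Simplified, greedy stand-in for the constrained beam search: follow the
    highest-probability pointer at each step, but never revisit a position,
    so the result is a valid (loop-free) ordering."""
    seq_len = p_pointer.size(-1)
    visited = {start}
    order = [start]
    current = start
    while len(order) < seq_len:
        probs = p_pointer[current].clone()
        probs[list(visited)] = float("-inf")   # constraint: no position twice
        current = int(probs.argmax())
        visited.add(current)
        order.append(current)
    return order

# Usage with the pointer distribution from the previous sketch:
# order = realize_pointers_greedy(p_pointer[0])
```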
2.2 Insertion Model
To represent masked token spans we consider two options: masking and infilling. In the former case the tagging model predicts how many tokens need to be inserted by specializing the $INSERT$ tag into $INS_k$, where $k$ translates the span into $k$ $MASK$ tokens. For the infilling case the tagging model predicts a generic $INS$ tag.
Note that we preserve the deleted span in the input to the insertion model by enclosing it between $[REPL]$ and $[/REPL]$ tags.
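A toy sketch of how such an insertion-model input could be assembled (the tuple tag encoding, the mask placement before the kept token, and the position of the [REPL] span are my own simplifications, not necessarily the paper's exact scheme):

```python
# Assemble the insertion model's input: INS_k becomes k [MASK] tokens, and
# deleted spans are preserved, wrapped in [REPL] ... [/REPL].
def build_insertion_input(tokens, tags):
    out, deleted = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "DELETE":
            deleted.append(tok)
            continue
        if deleted:                       # close a deleted span before moving on
            out += ["[REPL]"] + deleted + ["[/REPL]"]
            deleted = []
        if isinstance(tag, tuple) and tag[0] == "INS":
            out += ["[MASK]"] * tag[1]    # INS_k -> k mask slots before the kept token
        out.append(tok)
    if deleted:
        out += ["[REPL]"] + deleted + ["[/REPL]"]
    return out

tokens = ["The", "big", "loud", "cat", "sat", "."]
tags   = ["KEEP", "DELETE", "DELETE", "KEEP", ("INS", 2), "KEEP"]
print(build_insertion_input(tokens, tags))
# ['The', '[REPL]', 'big', 'loud', '[/REPL]', 'cat', '[MASK]', '[MASK]', 'sat', '.']
```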
Our insertion model is also based on a 12-layer BERT-base, so we can directly take advantage of BERT-style pretrained checkpoints.
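Purely as an illustration of the mechanism (using the Hugging Face transformers library and a stock bert-base-uncased checkpoint rather than FELIX's fine-tuned insertion model, and ignoring that [REPL]/[/REPL] are not in the stock vocabulary), mask in-filling with a BERT MLM looks like this:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Off-the-shelf BERT MLM used to fill [MASK] slots; FELIX fine-tunes such a
# model on its own inputs, so this only sketches the mechanism.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The cat sat [MASK] [MASK] the mat ."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                       # [1, seq_len, vocab]

mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()
for _, pos in mask_positions:
    token_id = int(logits[0, pos].argmax())
    print(pos.item(), tokenizer.decode([token_id]))
```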
References

Felix: Flexible Text Editing Through Tagging and Insertion
1. Evaluation of Text Generation: A Survey
2. Text edit
3. LaserTagger
4. Text generation evaluation metrics
5. Attention seq2seq