ALBERT: A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS

There are three main contributions that ALBERT makes over the design choices of BERT:

1 Factorized embedding parameterization

Originally the embedding layer is a single matrix $M_{emb} \in \mathbb{R}^{V\times H}$. It is now factorized into two matrices, $M_{emb1} \in \mathbb{R}^{V\times E}$ and $M_{emb2} \in \mathbb{R}^{E\times H}$, so the parameter count drops from $V\times H$ to $V\times E + E\times H$. This parameter reduction is significant when $H \gg E$. A small sketch of the idea follows.
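
As a rough sketch in PyTorch (the sizes V=30000, H=768, E=128 are illustrative assumptions, not values stated above), the factorized embedding is simply a small embedding table followed by a projection:

```python
import torch.nn as nn

V, H, E = 30000, 768, 128  # vocab size, hidden size, embedding size (illustrative values)

# BERT-style embedding: one V x H matrix
bert_emb = nn.Embedding(V, H)                # 30000 * 768 = 23,040,000 params

# ALBERT-style factorized embedding: V x E lookup followed by E x H projection
albert_emb = nn.Sequential(
    nn.Embedding(V, E),                      # 30000 * 128 = 3,840,000 params
    nn.Linear(E, H, bias=False),             # 128 * 768   =    98,304 params
)

def num_params(m):
    return sum(p.numel() for p in m.parameters())

print(num_params(bert_emb))    # 23,040,000  (V*H)
print(num_params(albert_emb))  #  3,938,304  (V*E + E*H)
```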

2 Cross-layer parameter sharing

The default decision for ALBERT is to share all parameters across layers (both the attention and the FFN parameters), as sketched below.
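
A minimal sketch of cross-layer sharing, using PyTorch's built-in `TransformerEncoderLayer` as a stand-in for the BERT block (the real ALBERT implementation differs in detail): the same layer object is applied at every depth, so its weights are only stored once.

```python
import torch
import torch.nn as nn

H, num_layers = 768, 12

# BERT-style: 12 independent layers, 12x the encoder parameters
independent_layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=H, nhead=12, batch_first=True)
     for _ in range(num_layers)]
)

# ALBERT-style: a single layer whose attention + FFN weights are reused at every depth
shared_layer = nn.TransformerEncoderLayer(d_model=H, nhead=12, batch_first=True)

def albert_encoder(x):
    for _ in range(num_layers):
        x = shared_layer(x)   # same weights applied 12 times
    return x

x = torch.randn(2, 16, H)     # (batch, seq_len, hidden)
out = albert_encoder(x)       # same shape, computed with one layer's parameters
```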

3 Inter-sentence coherence loss

The original NSP (next sentence prediction) objective is replaced by SOP (sentence-order prediction). Positive examples are constructed the same way as in NSP, but negative examples are created by swapping the order of the two sentences. A sketch of the pair construction follows.
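
A hedged sketch of how SOP training pairs can be built from two consecutive sentences of the same document (the segment packing and sampling in the actual ALBERT pipeline are more involved):

```python
import random

def make_sop_example(sent_a, sent_b):
    """Build one SOP example from two consecutive sentences of the same document.

    Positive (label 1): sentences kept in their original order, as in NSP.
    Negative (label 0): the same two sentences with their order swapped
    (unlike NSP, which samples the second segment from a different document).
    """
    if random.random() < 0.5:
        return (sent_a, sent_b), 1   # original order -> coherent
    else:
        return (sent_b, sent_a), 0   # swapped order  -> incoherent

pair, label = make_sop_example("He went to the store.", "He bought a gallon of milk.")
print(pair, label)
```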

References

https://zhuanlan.zhihu.com/p/88099919

https://blog.csdn.net/weixin_37947156/article/details/101529943

https://openreview.net/pdf?id=H1eA7AEtvS
