ALBERT: A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS
There are three main contributions that ALBERT makes over the design choices of BERT:
1 Factorized embedding parameterization
Originally the embedding layer is a single matrix $M_{emb} \in \mathbb{R}^{V\times H}$; ALBERT factorizes it into two matrices $M_{emb1} \in \mathbb{R}^{V\times E}$ and $M_{emb2} \in \mathbb{R}^{E\times H}$, reducing the embedding parameter count from $V\times H$ to $V\times E + E\times H$. This parameter reduction is significant when $H \gg E$.
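A minimal sketch of this factorization, assuming PyTorch; the class name `FactorizedEmbedding` and the sizes (V=30000, E=128, H=768) are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Illustrative factorized embedding: V x E lookup followed by E x H projection."""
    def __init__(self, vocab_size: int, embed_size: int, hidden_size: int):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, embed_size)          # V x E parameters
        self.proj = nn.Linear(embed_size, hidden_size, bias=False)    # E x H parameters

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Look up the small E-dim embedding, then project it up to the H-dim hidden size.
        return self.proj(self.word_emb(token_ids))

# Parameter count comparison: V*H vs. V*E + E*H (example sizes, not the paper's exact configs)
V, E, H = 30000, 128, 768
print("single V x H matrix:", V * H)            # 23,040,000
print("factorized V x E + E x H:", V * E + E * H)  # 3,938,304
```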
2 Cross-layer parameter sharing
The default choice for ALBERT is to share all parameters across layers (both the attention and FFN parameters).
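A minimal sketch of cross-layer parameter sharing, again assuming PyTorch; `SharedLayerEncoder` is a hypothetical name, and `nn.TransformerEncoderLayer` stands in for the actual ALBERT layer. One layer's parameters are reused for every pass instead of stacking N independent copies:

```python
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """Illustrative encoder that applies one shared layer num_layers times."""
    def __init__(self, hidden_size: int = 768, num_heads: int = 12,
                 ffn_size: int = 3072, num_layers: int = 12):
        super().__init__()
        # A single layer holds the attention + FFN parameters shared by all "layers".
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads,
            dim_feedforward=ffn_size, batch_first=True)
        self.num_layers = num_layers

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The same parameters are applied num_layers times.
        for _ in range(self.num_layers):
            x = self.shared_layer(x)
        return x

x = torch.randn(2, 16, 768)          # (batch, seq_len, hidden)
print(SharedLayerEncoder()(x).shape)  # torch.Size([2, 16, 768])
```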
3 Inter-sentence coherence loss
The original NSP objective is replaced with SOP (sentence-order prediction). Positive examples are constructed exactly as in NSP (two consecutive segments from the same document), while negative examples use the same two segments with their order swapped.
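A small sketch of how SOP examples can be constructed; this is a hypothetical helper for illustration, not the authors' data pipeline:

```python
import random
from typing import Tuple

def make_sop_example(seg_a: str, seg_b: str) -> Tuple[str, str, int]:
    """seg_a and seg_b are two consecutive segments from the same document.

    Positive (label 1): segments kept in their original order, as in NSP.
    Negative (label 0): the same two segments with their order swapped.
    """
    if random.random() < 0.5:
        return seg_a, seg_b, 1   # original order -> positive
    return seg_b, seg_a, 0       # swapped order  -> negative

print(make_sop_example("He went to the store.", "He bought milk."))
```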
References
https://zhuanlan.zhihu.com/p/88099919
https://blog.csdn.net/weixin_37947156/article/details/101529943