Transformer survey
Transformer-XL
https://arxiv.org/abs/1901.02860v3
RoFormer
https://arxiv.org/pdf/2104.09864.pdf
Google's 2020 Transformer survey:
https://www.zhihu.com/question/387899184
Extreme Multi-Label Classification (XML) can offer some inspiration: https://zhuanlan.zhihu.com/p/131584886
Original paper: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45530.pdf
A few good blog posts:
https://zhuanlan.zhihu.com/p/52169807
https://zhuanlan.zhihu.com/p/52504407
https://zhuanlan.zhihu.com/p/61827629
https://zhuanlan.zhihu.com/p/46247835
The following is my own summary.
Recast the recommendation problem as extreme multiclass classification: predict the specific video watch $w_t$ at time $t$ among the millions of videos in the corpus $V$,
where $u \in \mathbb{R}^{N}$ represents a high-dimensional "embedding" of the (user, context) pair and the $v_j \in \mathbb{R}^{N}$ represent embeddings of each candidate video.
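For reference, the softmax from the paper that turns this into a classification over the corpus $V$ (written here with an explicit dot product; $w_t$, $u$, and $v_j$ are as defined above):

$$P(w_t = i \mid U, C) = \frac{e^{v_i^{\top} u}}{\sum_{j \in V} e^{v_j^{\top} u}}$$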
Training:
To efficiently train such a model with millions of classes, the paper considers:
1. hierarchical softmax, which did not work well;
2. candidate sampling, correcting for this sampling via importance weighting.
At serving time, scoring reduces to a nearest-neighbor search over the video embeddings in dot-product space.
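A minimal brute-force sketch of that serving-time lookup, assuming the user/context and video embeddings are plain NumPy arrays; `topk_candidates` and the random data are illustrative only, and a production system would swap in an approximate nearest-neighbor index (e.g. Faiss or ScaNN):

```python
import numpy as np

def topk_candidates(user_emb, video_embs, k=10):
    """Score every candidate video by dot product with the user embedding
    and return the indices of the top-k, sorted by score descending."""
    scores = video_embs @ user_emb           # (num_videos,)
    top = np.argpartition(-scores, k)[:k]    # unordered top-k indices
    return top[np.argsort(-scores[top])]     # sort the top-k by score

# Illustrative usage with random embeddings.
rng = np.random.default_rng(0)
u = rng.normal(size=256)              # user/context embedding u
V = rng.normal(size=(100_000, 256))   # candidate video embeddings v_j
print(topk_candidates(u, V, k=5))
```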
https://zhuanlan.zhihu.com/p/111296130
Mean imputation (for missing values).
Detecting outliers (see the sketch after this list):
value-range checks
the 3-sigma rule
kNN
boxplot (IQR)
Handling outliers:
drop them
mean imputation
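A small NumPy sketch of the detection and imputation steps listed above; the helper names (`sigma_outliers`, `iqr_outliers`, `mean_impute`) and the toy array are mine, not from the linked post:

```python
import numpy as np

def sigma_outliers(x, k=3):
    """Sigma rule: flag points more than k standard deviations from the mean."""
    mu, sigma = np.nanmean(x), np.nanstd(x)
    return np.abs(x - mu) > k * sigma

def iqr_outliers(x, k=1.5):
    """Boxplot rule: flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    with np.errstate(invalid="ignore"):   # NaN simply compares as False
        return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def mean_impute(x, mask):
    """Replace flagged (outlier or missing) entries with the mean of the rest."""
    x = np.asarray(x, dtype=float).copy()
    x[mask] = np.mean(x[~mask])
    return x

x = np.array([1.2, 0.9, 1.1, 25.0, 1.0, np.nan])
mask = iqr_outliers(x) | np.isnan(x)
print(mean_impute(x, mask))   # 25.0 and NaN replaced by the mean of the rest
```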
Feature types: numerical features, text features, categorical features.
Numerical features:
1. use the value directly
2. discretization
bucketing / binning
Categorical features (see the sketch after this list):
1. one-hot
2. embedding
3. others
catboost
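An illustrative pandas/PyTorch sketch of the numerical and categorical handling above; the toy DataFrame, column names, bucket edges, and embedding size are all made up for the example:

```python
import pandas as pd
import torch
import torch.nn as nn

df = pd.DataFrame({
    "age": [23, 35, 58, 41],            # numerical feature
    "city": ["bj", "sh", "bj", "sz"],   # categorical feature
})

# Numerical: use directly, or discretize into buckets.
df["age_bucket"] = pd.cut(df["age"], bins=[0, 30, 45, 100], labels=False)

# Categorical: one-hot encoding.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Categorical: embedding lookup (e.g. as the input layer of a neural model).
city_to_id = {c: i for i, c in enumerate(df["city"].unique())}
ids = torch.tensor(df["city"].map(city_to_id).values)
embedding = nn.Embedding(num_embeddings=len(city_to_id), embedding_dim=4)
city_vectors = embedding(ids)           # shape: (4, 4)

print(df, one_hot, city_vectors.shape, sep="\n")
```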
https://blog.csdn.net/Datawhale/article/details/120582526
Feature selection methods roughly fall into three kinds: filter, wrapper, and embedded (see the sketch below).
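A scikit-learn sketch of the three families on a built-in dataset; the particular estimators and the `k` / `n_features_to_select` values are just examples, not recommendations from the linked article:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: score each feature independently of any downstream model.
filter_sel = SelectKBest(score_func=f_classif, k=10).fit(X, y)

# Wrapper: search feature subsets by repeatedly fitting a model (here RFE).
wrapper_sel = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)

# Embedded: selection comes out of model training itself (feature importances).
embedded_sel = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0)
).fit(X, y)

for name, sel in [("filter", filter_sel), ("wrapper", wrapper_sel), ("embedded", embedded_sel)]:
    print(name, int(sel.get_support().sum()), "features kept")
```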
https://arxiv.org/pdf/2104.08821.pdf
1 Training objective
For $D=\{(x_i, x_i^{+})\}_{i=1}^{m}$, where $x_i$ and $x_i^{+}$ are semantically related, while $x_i$ and $x_j^{+}$ ($i \neq j$) are not semantically related.
Encode each sentence into a representation: $x \to h$.
Contrastive learning aims to learn effective representation by pulling semantically close neighbors together and pushing apart non-neighbors
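For a positive pair $(h_i, h_i^{+})$ with in-batch negatives, the training objective used in the SimCSE paper is the InfoNCE-style loss below, where $\tau$ is a temperature and $\mathrm{sim}(\cdot,\cdot)$ is cosine similarity:

$$\ell_i = -\log \frac{e^{\mathrm{sim}(h_i, h_i^{+})/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i, h_j^{+})/\tau}}$$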
Here $N$ is the mini-batch size; the numerator is the positive pair, and the denominator sums over the negatives (it also contains the one positive term, which seems negligible).
Does the denominator include the numerator's term? Judging from the code, yes.
Loss implementation:
https://www.jianshu.com/p/d73e499ec859
def loss(self, y_pred, y_true, lamda=0.05):
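A self-contained sketch of how this unsupervised SimCSE loss is typically implemented (following the snippet above and the linked post). It assumes `y_pred` stacks the two dropout views of each sentence in adjacent rows and that `lamda` is the temperature; the standalone function name is mine (the original is a class method, hence `self`), and it derives the targets from the row layout instead of taking `y_true`:

```python
import torch
import torch.nn.functional as F

def simcse_unsup_loss(y_pred, lamda=0.05):
    """Unsupervised SimCSE loss.

    y_pred: [2N, dim]; rows 2i and 2i+1 are the two dropout-augmented
    encodings of the same sentence.
    """
    n = y_pred.shape[0]
    device = y_pred.device
    # For row 2i the positive is row 2i+1, and vice versa.
    idxs = torch.arange(n, device=device)
    y_true = idxs + 1 - idxs % 2 * 2
    # Pairwise cosine similarities, shape [2N, 2N].
    sim = F.cosine_similarity(y_pred.unsqueeze(1), y_pred.unsqueeze(0), dim=-1)
    # Mask the diagonal so a view is never compared with itself;
    # the denominator still contains the positive term (see the note above).
    sim = sim - torch.eye(n, device=device) * 1e12
    sim = sim / lamda  # temperature scaling
    return F.cross_entropy(sim, y_true)

# Illustrative usage: a batch of 8 sentences -> 16 rows after two dropout passes.
print(simcse_unsup_loss(torch.randn(16, 768)))
```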
2 Evaluation metrics for the representations
Alignment: the expected distance between embeddings of paired instances (paired instances are the positive pairs).
Uniformity: measures how well the embeddings are uniformly distributed.
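For reference, the two quantities as defined in the SimCSE paper (following Wang and Isola), where $p_{\text{pos}}$ is the distribution of positive pairs, $p_{\text{data}}$ the data distribution, and $f$ the (normalized) encoder:

$$\ell_{\text{align}} \triangleq \mathbb{E}_{(x, x^{+}) \sim p_{\text{pos}}} \left\| f(x) - f(x^{+}) \right\|^{2}$$

$$\ell_{\text{uniform}} \triangleq \log \; \mathbb{E}_{x, y \overset{\text{i.i.d.}}{\sim} p_{\text{data}}} \, e^{-2 \| f(x) - f(y) \|^{2}}$$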
Unsupervised SimCSE feeds each sentence to the encoder twice: $x_i \to h_i^{z_i}$ and $x_i \to h_i^{z_i'}$,
where $z$ is a random dropout mask; the loss is given below.
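The other dropout view of the same sentence serves as the positive and the other sentences in the batch as negatives, giving the unsupervised objective from the paper:

$$\ell_i = -\log \frac{e^{\mathrm{sim}\left(h_i^{z_i},\, h_i^{z_i'}\right)/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}\left(h_i^{z_i},\, h_j^{z_j'}\right)/\tau}}$$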
Supervised SimCSE brings in labeled data from a non-target task, e.g. NLI, as triples $(x_i, x_i^{+}, x_i^{-})$, where $x_i$ is the premise and $x_i^{+}$, $x_i^{-}$ are the entailment and contradiction hypotheses.
$(h_i, h_j^{+})$ are the normal in-batch negatives, while $(h_i, h_j^{-})$ are hard negatives.
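The supervised objective then extends the denominator with the hard negatives (as in the SimCSE paper):

$$\ell_i = -\log \frac{e^{\mathrm{sim}(h_i, h_i^{+})/\tau}}{\sum_{j=1}^{N} \left( e^{\mathrm{sim}(h_i, h_j^{+})/\tau} + e^{\mathrm{sim}(h_i, h_j^{-})/\tau} \right)}$$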