YouTube DNN

Original paper: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45530.pdf

Some excellent blog posts:

https://zhuanlan.zhihu.com/p/52169807

https://zhuanlan.zhihu.com/p/52504407

https://zhuanlan.zhihu.com/p/61827629

https://zhuanlan.zhihu.com/p/46247835

What follows is my own summary.

2.SYSTEM OVERVIEW

3.CANDIDATE GENERATION

3.1 Recommendation as Classification

Recommendation is posed as extreme multiclass classification: predict which video $w_t$ from a corpus $V$ of millions of videos will be watched at time $t$, given a user $U$ and context $C$.
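The resulting softmax over the corpus, as given in the paper:

$$P(w_t = i \mid U, C) = \frac{e^{v_i u}}{\sum_{j \in V} e^{v_j u}}$$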

where $u \in \mathbb{R}^{N}$ represents a high-dimensional embedding of the (user, context) pair and the $v_j \in \mathbb{R}^{N}$ represent embeddings of each candidate video.

Training

To efficiently train such a model with millions of classes, two options are considered:

1. hierarchical softmax: did not work well

2. candidate sampling: sample negative classes and correct for this sampling via importance weighting (a minimal sketch follows)
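A minimal PyTorch sketch of sampled softmax with an importance-weighting correction, assuming a user tower output `user_vec`, a full item embedding table `item_emb`, and a known proposal distribution `sampling_probs`; all names are illustrative, and this is not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(user_vec, item_emb, pos_ids, sampling_probs, num_sampled=1000):
    """Sampled softmax with importance-weighting correction (illustrative sketch).

    user_vec:       (B, d) user/context vectors u
    item_emb:       (V, d) output embeddings v_j for every video
    pos_ids:        (B,)   index of the watched video for each example
    sampling_probs: (V,)   probability of drawing each video as a negative
    """
    # draw a shared set of negatives for the whole batch from the proposal distribution q
    neg_ids = torch.multinomial(sampling_probs, num_sampled, replacement=True)   # (K,)

    pos_logit = (user_vec * item_emb[pos_ids]).sum(dim=-1, keepdim=True)         # (B, 1)
    neg_logit = user_vec @ item_emb[neg_ids].T                                   # (B, K)

    # importance weighting: subtract log q(j) so the sampled softmax approximates
    # the full softmax over millions of classes
    pos_logit = pos_logit - torch.log(sampling_probs[pos_ids]).unsqueeze(-1)
    neg_logit = neg_logit - torch.log(sampling_probs[neg_ids]).unsqueeze(0)

    logits = torch.cat([pos_logit, neg_logit], dim=1)                            # (B, 1 + K)
    target = torch.zeros(logits.shape[0], dtype=torch.long)                      # true class sits in column 0
    return F.cross_entropy(logits, target)
```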

At serving time, scoring reduces to a nearest-neighbor search in the dot-product space: compute the user vector $u$ once, then retrieve the top-N videos whose embeddings $v_j$ have the largest dot product with $u$ (approximate nearest-neighbor lookup in practice).
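A brute-force sketch of that retrieval step, assuming the same `user_vec` / `item_emb` tensors as above; a production system would use an approximate nearest-neighbor index instead.

```python
import torch

def topn_candidates(user_vec, item_emb, n=100):
    """Brute-force top-N retrieval by dot product (illustrative only)."""
    scores = item_emb @ user_vec           # (V,) one score per candidate video
    return torch.topk(scores, n).indices   # ids of the n highest-scoring videos
```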

3.2 Model Architecture

3.3 Heterogeneous Signals

3.4 Label and Context Selection

3.5 Experiments with Features and Depth

4.RANKING

Feature engineering

https://zhuanlan.zhihu.com/p/111296130

1.Feature preprocessing (a pandas sketch follows this list)

0.deduplication: decide whether duplicate samples should be dropped

1.missing values

mean imputation

2.outliers

detecting outliers

valid value range

sigma rule (e.g. 3-sigma)

kNN

box plot (IQR)

handling outliers

removal

mean imputation
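A minimal pandas sketch of the steps above; the column name `watch_time` and the data are hypothetical.

```python
import numpy as np
import pandas as pd

# hypothetical numeric column used only to illustrate the steps above
df = pd.DataFrame({"watch_time": [3.0, 5.0, np.nan, 4.0, 120.0, 6.0]})

# 1. missing values: mean imputation
df["watch_time"] = df["watch_time"].fillna(df["watch_time"].mean())

# 2. outlier detection: 3-sigma rule
mu, sigma = df["watch_time"].mean(), df["watch_time"].std()
sigma_outlier = (df["watch_time"] - mu).abs() > 3 * sigma

# 2. outlier detection: box-plot (IQR) rule
q1, q3 = df["watch_time"].quantile([0.25, 0.75])
iqr = q3 - q1
box_outlier = (df["watch_time"] < q1 - 1.5 * iqr) | (df["watch_time"] > q3 + 1.5 * iqr)

# handling outliers: drop the rows, or replace them with the mean of the inliers
clean_mean = df.loc[~box_outlier, "watch_time"].mean()
df.loc[box_outlier, "watch_time"] = clean_mean
```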

2.Feature representation (a sketch follows this list)

Feature types: numerical features, text features, categorical features

1.numerical features

1.use the value directly

2.discretization

bucketing

2.categorical features

1.one-hot

2.embedding

3.others

CatBoost (its built-in categorical encoding)
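A small sketch of the main representation choices above; the feature names (`age`, `city`) and values are hypothetical.

```python
import pandas as pd
import torch

# numerical feature: use directly, or discretize into buckets
age = pd.Series([12, 25, 33, 47, 68])
age_bucket = pd.cut(age, bins=[0, 18, 30, 50, 100], labels=False)   # bucket index per row

# categorical feature: one-hot encoding
city = pd.Series(["beijing", "shanghai", "beijing", "shenzhen"])
city_onehot = pd.get_dummies(city)

# categorical feature: learned embedding (e.g. an input tower of a DNN)
vocab = {c: i for i, c in enumerate(city.unique())}
emb = torch.nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)
city_vecs = emb(torch.tensor(city.map(vocab).values))               # (4, 4) dense vectors
```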

3.Feature selection

https://blog.csdn.net/Datawhale/article/details/120582526

There are roughly three families: filter, wrapper, and embedded methods (a scikit-learn sketch follows).
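A minimal scikit-learn sketch of the three families on a toy dataset; the choice of estimators and k=10 is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, RFE, SelectFromModel, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# filter: score each feature independently (here ANOVA F-test) and keep the top k
X_filter = SelectKBest(f_classif, k=10).fit_transform(X, y)

# wrapper: repeatedly fit a model and drop the weakest features
X_wrapper = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit_transform(X, y)

# embedded: let a regularized model's own coefficients decide which features survive
X_embedded = SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear")).fit_transform(X, y)
```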

SimCSE: Simple Contrastive Learning of Sentence Embeddings

https://arxiv.org/pdf/2104.08821.pdf

1.Background

1 Training objective

Given $D=\{(x_i,x_i^{+})\}_{i=1}^{m}$, where $x_i$ and $x_i^{+}$ are semantically related, while $x_i$ and $x_j^{+}$ (for $j \neq i$) are not semantically related.

Each sentence is encoded into a representation: $x \rightarrow h$.

Contrastive learning aims to learn effective representation by pulling semantically close neighbors together and pushing apart non-neighbors
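The per-example training objective from the paper, with in-batch negatives, temperature $\tau$, and cosine similarity $\mathrm{sim}(\cdot,\cdot)$:

$$\ell_i = -\log \frac{e^{\mathrm{sim}(h_i, h_i^{+})/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i, h_j^{+})/\tau}}$$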

$N$ is the mini-batch size; the numerator is the positive pair, and the denominator sums over the negatives (it also contains the one positive term, which seems negligible).

Does the denominator include the numerator's term? Judging from the code below, yes.

A reference implementation of the loss:

https://www.jianshu.com/p/d73e499ec859

```python
import torch
import torch.nn.functional as F


def loss(self, y_pred, y_true, lamda=0.05):
    '''
    Given a query q1 and a ranked candidate list [d1, d2, d3, ..., dn] where d1 is the positive:
        loss = -log( exp(sim(q1, d1) / t) / sum_i exp(sim(q1, di) / t) ), i = 1, ..., n

    Batched example:
        queries      [q1, q2], candidates [[d11, d12, d13], [d21, d22, d23]]
        similarities [[sim(q1, d11), sim(q1, d12), sim(q1, d13)],
                      [sim(q2, d21), sim(q2, d22), sim(q2, d23)]]
        y_true marks the column of the positive candidate in each row, and
        loss = F.cross_entropy(similarities, y_true)

    ref: https://www.jianshu.com/p/d73e499ec859
    '''
    # y_pred arrives as flattened similarity scores; reshape to (num_queries, num_candidates)
    y_pred = y_pred.reshape(-1, y_true.shape[1])
    # temperature scaling: smaller lamda sharpens the softmax distribution
    y_pred = y_pred / lamda
    # y_true is one-hot per row; cross_entropy expects the index of the positive column
    y_true = torch.argmax(y_true, dim=1)
    return F.cross_entropy(y_pred, y_true)
```

2 Evaluation metrics for representations

Alignment: the expected distance between embeddings of the paired instances (the paired instances are the positives).

Uniformity: measures how well the embeddings are uniformly distributed.
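The paper defines these as follows, where $f$ is the embedding function, $p_{\text{pos}}$ the distribution of positive pairs, and $p_{\text{data}}$ the data distribution:

$$\ell_{\text{align}} \triangleq \mathbb{E}_{(x, x^{+}) \sim p_{\text{pos}}} \left\| f(x) - f(x^{+}) \right\|^{2}$$

$$\ell_{\text{uniform}} \triangleq \log \mathbb{E}_{x, y \overset{i.i.d.}{\sim} p_{\text{data}}} \, e^{-2 \left\| f(x) - f(y) \right\|^{2}}$$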

2.Model structure

2.1 Unsupervised

$x_i \rightarrow h_i^{z_i}, \quad x_i \rightarrow h_i^{z_i'}$

$z$ is a random dropout mask: the same sentence is fed through the encoder twice with different dropout masks, and the two resulting embeddings form a positive pair. The loss is:
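The unsupervised objective from the paper, with $z_i, z_i'$ the two dropout masks:

$$\ell_i = -\log \frac{e^{\mathrm{sim}(h_i^{z_i}, h_i^{z_i'})/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i^{z_i}, h_j^{z_j'})/\tau}}$$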

2.2 Supervised

Labeled data from a non-target task is brought in, e.g. NLI datasets: $(x_i, x_i^{+}, x_i^{-})$, where $x_i$ is the premise, and $x_i^{+}$ and $x_i^{-}$ are the entailment and contradiction hypotheses.

$(h_i, h_j^{+})$ are normal (in-batch) negatives, while $(h_i, h_j^{-})$ are hard negatives.
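The supervised objective from the paper, with the contradiction hypotheses added to the denominator as hard negatives:

$$\ell_i = -\log \frac{e^{\mathrm{sim}(h_i, h_i^{+})/\tau}}{\sum_{j=1}^{N} \left( e^{\mathrm{sim}(h_i, h_j^{+})/\tau} + e^{\mathrm{sim}(h_i, h_j^{-})/\tau} \right)}$$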

