Embedding based Product Retrieval in Taobao Search
https://arxiv.org/pdf/2106.09297.pdf
http://xtf615.com/2021/10/07/taobao-ebr/
1.INTRODUCTION
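The framework follows the mainstream multi-stage structure of search systems: matching/retrieval, pre-ranking (coarse ranking), ranking (fine ranking), and re-ranking.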
2.RELATED WORK
2.1 Deep Matching in Search
Deep matching models fall into two categories: representation-based learning and interaction-based learning.
Other than semantic and relevance matching, more complex factors/trade-offs, e.g., user personalization [2, 3, 10] and retrieval efficiency [5], need to be considered when applying deep models to a large-scale online retrieval system.
2.2 Deep Retrieval in Industry Search
Representation-based models with an ANN (approximate nearest neighbor) algorithm have become the mainstream way to deploy neural retrieval models efficiently in industry.
3 MODEL
The overall architecture is as follows:
3.1 Problem Formulation
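$U=\{u_1,\dots,u_u,\dots,u_N\}$ denotes the $N$ users, $Q=\{q_1,\dots,q_u,\dots,q_N\}$ denotes the $N$ queries corresponding to those users, and $I=\{i_1,\dots,i_M\}$ denotes the $M$ items. The historical behaviors of user $u$ are split by time into three parts:
1. real-time, before the current time step: $R^u=\{i^u_1,\dots,i^u_t,\dots,i^u_T\}$
2. short-term, before $R^u$ and within ten days: $S^u=\{i^u_1,\dots,i^u_t,\dots,i^u_T\}$
3. long-term, before $S^u$ and within one month: $L^u=\{i^u_1,\dots,i^u_t,\dots,i^u_T\}$
Here $T$ is the sequence length. The task can be defined as:
$$z = F(\phi(q_u, R^u, S^u, L^u), \varphi(i))$$
where $F(\cdot)$, $\phi(\cdot)$, and $\varphi(\cdot)$ denote the scoring function, the query and behaviors encoder, and the item encoder, respectively.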
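To make the formulation concrete, here is a minimal sketch of the retrieval step, assuming the scoring function is the inner product (as Section 3.4 states) and that both encoders have already produced fixed-size vectors. The names `top_k_items`, `user_vec`, and `item_vecs` are hypothetical, and the brute-force scan only stands in for the ANN index a production system would use.

```python
import heapq


def inner_product(u, v):
    """Scoring function F, assumed here to be a plain inner product."""
    return sum(a * b for a, b in zip(u, v))


def top_k_items(user_vec, item_vecs, k):
    """Score every item against the user vector; return the k best (item_id, score) pairs."""
    scored = ((item_id, inner_product(user_vec, vec)) for item_id, vec in item_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors: user_vec stands in for the user tower output,
# item_vecs for the item tower outputs.
user_vec = [1.0, 0.0, 2.0]
item_vecs = {
    "item_a": [1.0, 1.0, 1.0],
    "item_b": [0.0, 5.0, 0.0],
    "item_c": [2.0, 0.0, 2.0],
}
top2 = top_k_items(user_vec, item_vecs, k=2)
print(top2)
```

Because the two towers only meet in the final inner product, item vectors can be computed offline and indexed, while the user vector is computed per request.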
3.2 User Tower
3.2.1 Multi-Granular Semantic Unit
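This unit mines the semantics of the query. The raw input consists of the current query and the historical queries.
The paper does not explain why the unit is designed this way; it reads like a conclusion drawn from engineering experiments. One open question: why not use a deep language model such as BERT directly to mine the query semantics?
The query is represented as $q_u=\{w^u_1,\dots,w^u_n\}$, e.g. {红色, 连衣裙} ({red, dress}). Each word is represented as $w_u=\{c^u_1,\dots,c^u_m\}$, e.g. {红, 色} (the two characters of "红色"). The historical queries are represented as $q_{his}=\{q^u_1,\dots,q^u_k\}$, e.g. {绿色, 半身裙, 黄色, 长裙} ({green, skirt, yellow, long dress}). Here $w_n\in\mathbb{R}^{1\times d}$, $c_m\in\mathbb{R}^{1\times d}$, and $q_k\in\mathbb{R}^{1\times d}$.
$$
\begin{aligned}
q_{1\_gram} &= \mathrm{mean\_pooling}(c_1,\dots,c_m) \\
q_{2\_gram} &= \mathrm{mean\_pooling}(c_1c_2,\dots,c_{m-1}c_m) \\
q_{seg} &= \mathrm{mean\_pooling}(w_1,\dots,w_n) \\
q_{seg\_seq} &= \mathrm{mean\_pooling}(\mathrm{Trm}(w_1,\dots,w_n)) \\
q_{his\_seq} &= \mathrm{softmax}(q_{seg}\cdot(q_{his})^T)\,q_{his} \\
q_{mix} &= q_{1\_gram}+q_{2\_gram}+q_{seg}+q_{seg\_seq}+q_{his\_seq} \\
Q_{mgs} &= \mathrm{concat}(q_{1\_gram},q_{2\_gram},q_{seg},q_{seg\_seq},q_{his\_seq},q_{mix})
\end{aligned}
$$
where $\mathrm{Trm}$, $\mathrm{mean\_pooling}$, and $\mathrm{concat}$ denote the Transformer, average, and vertical concatenation operations, respectively.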
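The multi-granular semantic unit can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: character, bigram, word, and historical-query embeddings are assumed to be looked up already, and the `trm` argument defaults to an identity function as a placeholder for the Transformer encoder. All function and variable names are hypothetical.

```python
import math


def mean_pooling(vectors):
    """Average a list of d-dimensional vectors into a single vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, keys):
    """softmax(query . keys^T) . keys for a single query row."""
    weights = softmax([dot(query, k) for k in keys])
    return [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(len(query))]


def multi_granular_semantic_unit(chars, bigrams, words, history, trm=lambda seq: seq):
    """Return Q_mgs as six stacked rows (6 x d).

    trm is a placeholder for a Transformer encoder; the identity default
    is an assumption made only to keep the sketch dependency-free.
    """
    q_1gram = mean_pooling(chars)
    q_2gram = mean_pooling(bigrams)
    q_seg = mean_pooling(words)
    q_seg_seq = mean_pooling(trm(words))
    q_his_seq = attend(q_seg, history)
    parts = [q_1gram, q_2gram, q_seg, q_seg_seq, q_his_seq]
    q_mix = [sum(col) for col in zip(*parts)]
    return parts + [q_mix]  # vertical concatenation


# Toy embeddings with d = 2.
chars = [[1.0, 0.0], [0.0, 1.0]]
bigrams = [[0.5, 0.5]]
words = [[1.0, 1.0], [3.0, 1.0]]
history = [[1.0, 0.0], [0.0, 1.0]]
Q_mgs = multi_granular_semantic_unit(chars, bigrams, words, history)
print(len(Q_mgs), len(Q_mgs[0]))
```

Each row of `Q_mgs` is a different granularity of the same query, which is why the later attention steps can treat it as six parallel queries.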
3.2.2 User Behaviors Attention
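$$e^f_i = W_f\cdot x^f_i \in \mathbb{R}^{1\times d_f}$$
$$i^u_t = \mathrm{concat}(\{e^f_i \mid f\in F\})$$
where $W_f$ is the embedding matrix, $x^f_i$ is a one-hot vector, and $F$ is the set of side information (e.g., leaf category, first-level category, brand, and shop).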
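As a small illustration of the two formulas above, the sketch below builds an item representation from two side-information fields. The field names, table sizes, and values are made up, and storing each $W_f$ as a vocabulary-by-$d_f$ table (so the one-hot product is a row lookup) is an assumption about layout.

```python
def one_hot(index, size):
    return [1.0 if j == index else 0.0 for j in range(size)]


def embed(x, W):
    """Multiply a one-hot row x (1 x V) by an embedding table W (V x d_f)."""
    return [sum(x[v] * W[v][j] for v in range(len(W))) for j in range(len(W[0]))]


def item_embedding(feature_ids, tables):
    """Concatenate the per-field embeddings e_i^f over all fields f in F."""
    out = []
    for field, W in tables.items():
        out.extend(embed(one_hot(feature_ids[field], len(W)), W))
    return out


# Hypothetical embedding tables: brand has d_f = 2, shop has d_f = 1.
tables = {
    "brand": [[0.1, 0.2], [0.3, 0.4]],
    "shop": [[1.0], [2.0], [3.0]],
}
vec = item_embedding({"brand": 1, "shop": 2}, tables)
print(vec)
```

The one-hot product selects exactly one row of the table, so in practice it is implemented as an index lookup rather than a matrix multiplication.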
real-time sequences
User’s click_item
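$$
\begin{aligned}
R^u_{lstm} &= \mathrm{LSTM}(R^u) = \{h^u_1,\dots,h^u_t,\dots,h^u_T\} \\
R^u_{self\_att} &= \mathrm{multihead\_selfattention}(R^u_{lstm}) = \{h^u_1,\dots,h^u_t,\dots,h^u_T\} \\
R^u_{zero\_att} &= \{0,h^u_1,\dots,h^u_t,\dots,h^u_T\} \\
H_{real} &= \mathrm{softmax}(Q_{mgs}\cdot R^T_{zero\_att})\cdot R_{zero\_att}
\end{aligned}
$$
$R^u_{zero\_att}$ adds a zero vector at the first position of $R^u_{self\_att}$.
short-term sequences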
User’s click_item
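$$
\begin{aligned}
S^u_{self\_att} &= \mathrm{multihead\_selfattention}(S^u) = \{h^u_1,\dots,h^u_t,\dots,h^u_T\} \\
S^u_{zero\_att} &= \{0,h^u_1,\dots,h^u_t,\dots,h^u_T\} \\
H_{short} &= \mathrm{softmax}(Q_{mgs}\cdot S^T_{zero\_att})\cdot S_{zero\_att}
\end{aligned}
$$
long-term sequences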
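$L^u$ consists of four parts, $L^u_{item}$, $L^u_{shop}$, $L^u_{leaf}$, and $L^u_{brand}$. Each part covers three actions: click, buy, and collect.
$$
\begin{aligned}
L_{click\_item},\ L_{buy\_item},\ L_{collect\_item} &\rightarrow L_{item} \\
H_{a\_item} &= \mathrm{softmax}(Q_{mgs}\cdot L^T_{item})\cdot L_{item} \\
H_{long} &= H_{a\_item}+H_{a\_shop}+H_{a\_leaf}+H_{a\_brand}
\end{aligned}
$$
3.2.3 Fusion of Semantics and Personalization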
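The fusion step takes $H_{real}$, $H_{short}$, and $H_{long}$ from Section 3.2.2 as input. As a sketch of how the real-time and short-term outputs are produced, the function below applies query-guided attention over a behavior sequence with a zero vector prepended. The LSTM and multi-head self-attention layers are not implemented here; `H` is assumed to be their output, and `zero_attention` is a hypothetical name.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def zero_attention(Q, H):
    """Compute softmax(Q . Hz^T) . Hz, where Hz is H with a zero row prepended.

    Q: query rows (e.g. the six rows of Q_mgs), each of length d.
    H: encoded behavior sequence, T rows of length d.
    """
    d = len(H[0])
    Hz = [[0.0] * d] + H
    out = []
    for q in Q:
        weights = softmax([dot(q, h) for h in Hz])
        out.append([sum(w * h[j] for w, h in zip(weights, Hz)) for j in range(d)])
    return out


# A query orthogonal to the behavior splits its weight evenly with the zero row.
neutral = zero_attention([[0.0, 0.0]], [[2.0, 4.0]])
# A query that strongly disagrees with the behavior puts almost all weight on the zero row.
unrelated = zero_attention([[-10.0, -10.0]], [[1.0, 1.0]])
print(neutral, unrelated)
```

One plausible reading of the zero vector is that it gives the softmax somewhere to put its mass when none of the historical behaviors match the current query, so unrelated history contributes close to nothing.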
$$H_{qu} = \mathrm{Self\_Att}^{first}([[cls],Q_{mgs},H_{real},H_{short},H_{long}]) \in \mathbb{R}^{1\times d}$$
3.3 Item Tower
For the item tower, we experimentally use item ID and title to obtain the item representation $H_{item}$. Given the representation of item $i$'s ID, $e_i\in\mathbb{R}^{1\times d}$, and its title segmentation result $T_i=\{w^i_1,\dots,w^i_N\}$, $H_{item}$ is computed as:
$$H_{item} = e_i + \tanh\left(W_t\cdot\frac{\sum_{n=1}^{N} w^i_n}{N}\right)$$
where $W_t$ is the transformation matrix. We empirically find that applying LSTM [12] or Transformer [27] to capture the context of the title is not as effective as simple mean-pooling since the title is stacked by keywords and lacks grammatical structure.
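The item tower formula is small enough to write out directly. The sketch below assumes row vectors of length $d$ and a square $d\times d$ matrix for $W_t$; the function name `item_tower` and the toy values are illustrative only.

```python
import math


def item_tower(e_id, title_words, W_t):
    """H_item = e_id + tanh(mean(title_words) . W_t), using row-vector convention."""
    d = len(e_id)
    n = len(title_words)
    # Mean-pool the title word embeddings.
    pooled = [sum(w[j] for w in title_words) / n for j in range(d)]
    # Linear transformation by W_t (d x d).
    projected = [sum(pooled[k] * W_t[k][j] for k in range(d)) for j in range(d)]
    return [e_id[j] + math.tanh(projected[j]) for j in range(d)]


identity = [[1.0, 0.0], [0.0, 1.0]]
# Title words that cancel out leave only the ID embedding.
h_cancel = item_tower([1.0, 2.0], [[1.0, -1.0], [-1.0, 1.0]], identity)
# A single title word shifts the ID embedding by tanh of its projection.
h_single = item_tower([1.0, 2.0], [[0.5, 0.0]], identity)
print(h_cancel, h_single)
```

Since the title contribution passes through `tanh`, it is bounded to (-1, 1) per dimension and acts as a correction on top of the ID embedding.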
3.4 Loss Function
The softmax cross-entropy loss is adapted as the training objective:
$$\hat{y}(i^+|q_u) = \frac{\exp(F(q_u,i^+))}{\sum_{i'\in I}\exp(F(q_u,i'))}$$
$$L(\nabla) = -\sum_{i\in I} y_i\log(\hat{y}_i)$$
where $F$, $I$, $i^+$, and $q_u$ denote the inner product, the full item pool, the item tower's representation $H_{item}$, and the user tower's representation $H_{qu}$, respectively.
3.4.1 Smoothing Noisy Training Data
The softmax function with the temperature parameter $\tau$ is defined as follows:
$$\hat{y}(i^+|q_u) = \frac{\exp(F(q_u,i^+)/\tau)}{\sum_{i'\in I}\exp(F(q_u,i')/\tau)}$$
If $\tau\rightarrow 0$, the fitted distribution is close to a one-hot distribution. If $\tau\rightarrow\infty$, the fitted distribution is close to a uniform distribution.
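The two limits are easy to check numerically. The sketch below is a plain-Python softmax with temperature; the scores and the two values of `tau` are arbitrary illustrative choices.

```python
import math


def softmax_with_temperature(scores, tau):
    """Softmax over scores / tau, with max subtraction for numerical stability."""
    scaled = [s / tau for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(scores, 0.01)   # small tau: near one-hot
flat = softmax_with_temperature(scores, 1000.0)  # large tau: near uniform
print(sharp)
print(flat)
```

A larger $\tau$ therefore softens the target the model is asked to fit, which is the sense in which it smooths noisy click labels.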
3.4.2 Generating Relevance-improving Hard Negative Samples
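We first select the negative items $i^-$ that have the top-$N$ inner product scores with $q_u$ to form the hard sample set $I_{hard}$, and then mix them with the positive item:
$$I_{mix} = \alpha i^+ + (1-\alpha) I_{hard}$$
where $\alpha\in\mathbb{R}^{N\times 1}$ is sampled from the uniform distribution $U(a,b)$ ($0\le a<b\le 1$). The generated samples are added to the denominator of the softmax:
$$\hat{y}(i^+|q_u) = \frac{\exp(F(q_u,i^+)/\tau)}{\sum_{i'\in(I\cup I_{mix})}\exp(F(q_u,i')/\tau)}$$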
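A minimal sketch of the hard-negative generation and the resulting loss is given below, assuming inner-product scoring and one $\alpha$ per hard sample. The function names, the toy vectors, and the values chosen for $N$, $a$, $b$, and $\tau$ are all hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def select_hard_negatives(q, negatives, n):
    """Keep the n negatives with the highest inner product with q (I_hard)."""
    return sorted(negatives, key=lambda v: dot(q, v), reverse=True)[:n]


def mix_hard_negatives(pos, hard_negs, a, b, rng):
    """I_mix = alpha * i+ + (1 - alpha) * I_hard, with one alpha ~ U(a, b) per sample."""
    mixed = []
    for neg in hard_negs:
        alpha = rng.uniform(a, b)
        mixed.append([alpha * p + (1 - alpha) * x for p, x in zip(pos, neg)])
    return mixed


def softmax_loss(q, pos, candidates, tau):
    """-log of the softmax probability of pos among [pos] + candidates."""
    logits = [dot(q, pos) / tau] + [dot(q, c) / tau for c in candidates]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


rng = random.Random(0)
q = [1.0, 0.0]
pos = [1.0, 0.0]
negatives = [[0.8, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]

hard = select_hard_negatives(q, negatives, n=2)
mixed = mix_hard_negatives(pos, hard, a=0.4, b=0.6, rng=rng)
loss_plain = softmax_loss(q, pos, negatives, tau=0.5)
loss_mixed = softmax_loss(q, pos, negatives + mixed, tau=0.5)
print(loss_plain, loss_mixed)
```

Each mixed sample sits between the positive item and a hard negative in embedding space, so it scores higher than the original negative and makes the denominator harder to beat.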