Deep Interest Network for Click-Through Rate Prediction
1.DEEP INTEREST NETWORK
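1.1 Feature representation
The features can be written as $\mathbf{x}=[t_1^T,t_2^T,\ldots,t_M^T]^T$, where $t_i$ is the encoding vector of the $i$-th feature group: one-hot when the group takes a single value, multi-hot when it takes several. For example, the sample [weekday=Friday, gender=Female, visited_cate_ids={Bag,Book}, ad_cate_id=Book] is encoded as one one-hot vector each for weekday, gender and ad_cate_id, and one multi-hot vector for visited_cate_ids.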
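The encoding can be sketched in a few lines of plain Python. The vocabularies in `VOCABS` and the helper names `encode_group` and `encode_sample` are hypothetical and chosen only for illustration; real systems use vocabularies with millions of ids and sparse storage.

```python
# Hypothetical vocabularies for each feature group (illustrative only).
VOCABS = {
    "weekday": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    "gender": ["Male", "Female"],
    "visited_cate_ids": ["Bag", "Shoes", "Book", "Phone"],
    "ad_cate_id": ["Bag", "Shoes", "Book", "Phone"],
}


def encode_group(values, vocab):
    """Return a K_i-dimensional 0/1 vector t_i.

    One active value gives a one-hot vector, several give a multi-hot vector.
    """
    active = set(values)
    return [1 if token in active else 0 for token in vocab]


def encode_sample(sample):
    """Encode every group t_i, then concatenate into x = [t_1^T, ..., t_M^T]^T."""
    groups = [encode_group(sample[name], VOCABS[name]) for name in VOCABS]
    x = [bit for group in groups for bit in group]
    return groups, x


sample = {
    "weekday": ["Fri"],
    "gender": ["Female"],
    "visited_cate_ids": ["Bag", "Book"],  # multi-valued group
    "ad_cate_id": ["Book"],
}
groups, x = encode_sample(sample)
for name, t in zip(VOCABS, groups):
    print(name, t)
```

The only multi-hot group here is `visited_cate_ids`; every other group has exactly one active position.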
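1.2 Embedding layer
For $t_i \in \mathbb{R}^{K_i}$, the embedding dictionary is $W^i=[w_1^i,\ldots,w_j^i,\ldots,w_{K_i}^i] \in \mathbb{R}^{D\times K_i}$, where each column $w_j^i \in \mathbb{R}^{D}$ is a $D$-dimensional embedding vector. If $t_i$ is one-hot with its $j$-th element equal to 1, its embedding is the single vector $e_i=w_j^i$; if $t_i$ is multi-hot, its embedding is the list of vectors $w_j^i$ for all active positions $j$.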
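A minimal sketch of the lookup, assuming a toy dictionary with $D=3$ and $K_i=4$. The matrix values and the helper name `embed` are made up for illustration; in practice $W^i$ is a learned parameter.

```python
D = 3  # embedding dimension (illustrative)
K = 4  # vocabulary size K_i of this feature group (illustrative)

# W[j] stores column w_j of the D x K_i matrix W^i, kept column-by-column
# so that a lookup is a plain list index. Values are arbitrary placeholders.
W = [[float(10 * j + d) for d in range(D)] for j in range(K)]


def embed(t):
    """Return the embedding vectors w_j for every position j where t[j] == 1."""
    if len(t) != len(W):
        raise ValueError("encoding length must equal the vocabulary size K_i")
    return [W[j] for j, bit in enumerate(t) if bit == 1]


one_hot = [0, 0, 1, 0]
multi_hot = [1, 0, 1, 0]

e_single = embed(one_hot)   # list holding exactly one D-dimensional vector
e_list = embed(multi_hot)   # list holding one vector per active position
print(e_single)
print(e_list)
```

Because the multi-hot case yields a variable number of vectors, a later step has to reduce that list to a fixed-length vector before it can be fed to fully connected layers.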
1.3 Pooling layer and Concat layer
Two most commonly used pooling layers are sum pooling and average pooling, which apply element-wise sum/average operations to the list of embedding vectors.
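Both pooling operations and the subsequent concatenation can be sketched as follows. The helper names and the toy vectors are assumptions for illustration only.

```python
def sum_pooling(vectors):
    """Element-wise sum over a list of equal-length embedding vectors."""
    return [sum(column) for column in zip(*vectors)]


def average_pooling(vectors):
    """Element-wise average over a list of equal-length embedding vectors."""
    n = len(vectors)
    return [s / n for s in sum_pooling(vectors)]


def concat(*vectors):
    """Concatenate fixed-length vectors into a single flat vector."""
    return [value for vec in vectors for value in vec]


# Three behavior embeddings (variable-length list) and one ad embedding.
behavior_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ad_embedding = [0.5, -0.5]

user_vec_sum = sum_pooling(behavior_embeddings)
user_vec_avg = average_pooling(behavior_embeddings)
mlp_input = concat(user_vec_sum, ad_embedding)
print(user_vec_sum, user_vec_avg, mlp_input)
```

Whatever the number of behaviors, the pooled vector has the same length as a single embedding, so the concatenated input to the next layer has a fixed size.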
1.4 Activation unit
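DIN adds a local activation unit on top of the base model. Its job is to assign a different weight to each item in the user behavior features with respect to the candidate ad; everything else stays the same. The user representation becomes
$$v_U(A)=f(v_A,e_1,e_2,\ldots,e_H)=\sum_{j=1}^{H}a(e_j,v_A)\,e_j=\sum_{j=1}^{H}w_j e_j$$
where $e_1,\ldots,e_H$ are the embedding vectors of the user's $H$ behaviors, $v_A$ is the embedding vector of ad $A$, and $a(\cdot)$ is the activation unit in the figure above, whose output is the weight $w_j$. This is very similar to attention; in the paper's words: "Local activation unit of Eq.(3) shares similar ideas with attention methods which are developed in NMT task[1]."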
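The weighted sum pooling $\sum_j a(e_j, v_A)\,e_j$ can be sketched as below. Note the assumption: `activation_weight` is a plain dot product used as a stand-in for $a(\cdot)$, not the learned feed-forward network of the paper, and all vectors are toy values. The weights are deliberately left unnormalized (no softmax), in line with the paper relaxing the constraint that the weights sum to 1.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def activation_weight(e_j, v_a):
    """Stand-in for a(e_j, v_A): a dot product, NOT the paper's learned network."""
    return dot(e_j, v_a)


def din_pooling(behavior_embeddings, v_a):
    """Weighted sum pooling: v_U(A) = sum_j a(e_j, v_A) * e_j."""
    weights = [activation_weight(e_j, v_a) for e_j in behavior_embeddings]
    pooled = [0.0] * len(v_a)
    for w, e_j in zip(weights, behavior_embeddings):
        for d, value in enumerate(e_j):
            pooled[d] += w * value
    return weights, pooled


behaviors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# The same behavior list gives a different user vector for each candidate ad.
weights_a, v_u_a = din_pooling(behaviors, [1.0, 0.0])
weights_b, v_u_b = din_pooling(behaviors, [0.0, 2.0])
print(weights_a, v_u_a)
print(weights_b, v_u_b)
```

Unlike plain sum pooling, the user vector now depends on the candidate ad: behaviors that score higher against the ad contribute more to the pooled result.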
1.5 MLP
1.6 Loss
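The objective is the negative log-likelihood (cross-entropy) loss:
$$L=-\frac{1}{N}\sum_{(x,y)\in S}\bigl(y\log p(x)+(1-y)\log(1-p(x))\bigr)$$
where $S$ is the training set of size $N$, $x$ is the network input, $y\in\{0,1\}$ is the click label, and $p(x)$ is the predicted probability that $x$ is clicked.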
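A minimal sketch of the binary cross-entropy averaged over a batch. The function name, the `eps` clipping and the toy labels and probabilities are illustrative assumptions, not details taken from the paper.

```python
import math


def cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy: -1/N * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(labels, probs):
        # Clip p away from 0 and 1 so that log() never receives 0 (assumed safeguard).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


labels = [1, 0, 1]        # click labels y
probs = [0.9, 0.2, 0.6]   # predicted click probabilities p(x)
loss = cross_entropy(labels, probs)
print(loss)
```

A confident correct prediction such as $p=0.9$ for $y=1$ adds little to the loss, while an uncertain one such as $p=0.6$ adds considerably more.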
2.TRAINING TECHNIQUES
Practically, training industrial deep networks with large scale sparse input features is of great challenge. To address this, the paper introduces Mini-batch Aware Regularization and a Data Adaptive Activation Function (Dice); the details are not covered here.