Deep Interest Network for Click-Through Rate Prediction

1.DEEP INTEREST NETWORK

1.1 Feature representation

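The input features can be written as $\textbf{x}=[t_1^T,t_2^T,\dots,t_M^T]^T$, where $M$ is the number of feature groups and each $t_i$ is the one-hot encoding of the $i$-th feature group (multi-hot when the group holds several values, such as a list of visited items).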
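A minimal sketch of this encoding, assuming three hypothetical feature groups (`gender`, `visited_cate_ids`, `ad_cate_id`) with made-up vocabularies; none of these names or values come from the notes above. A group with a single value becomes a one-hot vector, a group with several values becomes multi-hot, and $\textbf{x}$ is the concatenation of all groups.

```python
def encode_group(vocab, values):
    """Return the K_i-dimensional 0/1 vector t_i for one feature group."""
    t = [0] * len(vocab)
    for v in values:
        t[vocab.index(v)] = 1
    return t

# Hypothetical vocabularies, one per feature group (assumption for illustration).
vocabs = {
    "gender": ["male", "female"],
    "visited_cate_ids": ["bag", "book", "shoes", "phone"],
    "ad_cate_id": ["bag", "book", "shoes", "phone"],
}

# One hypothetical sample: each group lists the values that are present.
sample = {
    "gender": ["female"],                 # one value  -> one-hot
    "visited_cate_ids": ["bag", "book"],  # two values -> multi-hot
    "ad_cate_id": ["book"],               # one value  -> one-hot
}

# t_1, ..., t_M, then x as their concatenation.
groups = [encode_group(vocabs[name], sample[name]) for name in vocabs]
x = [bit for t in groups for bit in t]
print(groups)
print(x)
```

The length of `x` is the sum of the vocabulary sizes $K_1+\dots+K_M$, which is why real inputs of this kind are extremely sparse.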

1.2 Embedding layer

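For the $i$-th feature group $t_i \in \mathbb{R}^{K_i}$, let $W^i=[w_1^i,\dots,w_j^i,\dots,w_{K_i}^i] \in \mathbb{R}^{D\times K_i}$ be its embedding dictionary, where each $w_j^i \in \mathbb{R}^{D}$ is a $D$-dimensional embedding vector. Embedding is a table lookup: if $t_i$ is one-hot with its $j$-th element set to 1, its embedding is the single vector $w_j^i$; if $t_i$ is multi-hot, its embedding is the list of vectors selected by the non-zero positions.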
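The lookup can be sketched as follows. This is an illustration under assumptions: the dictionary `W` holds made-up values with $D=2$ and $K_i=4$, and it is stored as a list of columns so that `W[j]` plays the role of $w_j^i$ (with 0-based `j`).

```python
# Hypothetical embedding dictionary W^i, stored column-wise:
# W[j] is the D-dimensional vector for the j-th id (D = 2, K_i = 4).
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

def embed(t, W):
    """Return the embedding vectors selected by the non-zero entries of t."""
    return [W[j] for j, bit in enumerate(t) if bit == 1]

one_hot = [0, 1, 0, 0]    # a single id   -> a single embedding vector
multi_hot = [1, 1, 0, 0]  # several ids   -> a list of embedding vectors

print(embed(one_hot, W))
print(embed(multi_hot, W))
```

A multi-hot group yields a variable number of vectors, which is what the pooling layer has to reduce to a fixed length.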

1.3 Pooling layer and Concat layer

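The two most commonly used pooling layers are sum pooling and average pooling, which apply an element-wise sum or average to the list of embedding vectors. Pooling turns the variable-length list of a multi-hot group into one fixed-length vector; the vectors of all feature groups are then concatenated into the overall representation of the instance.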
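A minimal sketch of the two pooling operations and the concatenation step, using made-up vectors and assuming a non-empty list of embeddings. The helper names (`sum_pooling`, `average_pooling`, `concat`) are chosen for illustration.

```python
def sum_pooling(vectors):
    """Element-wise sum over a non-empty list of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]

def average_pooling(vectors):
    """Element-wise mean over a non-empty list of equal-length vectors."""
    return [s / len(vectors) for s in sum_pooling(vectors)]

def concat(*vectors):
    """Concatenate fixed-length vectors into one flat vector."""
    return [v for vec in vectors for v in vec]

# Hypothetical embeddings of three user behaviors (D = 2).
behaviors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ad_vec = [0.5, -0.5]  # hypothetical embedding of the candidate ad

user_sum = sum_pooling(behaviors)
user_avg = average_pooling(behaviors)
mlp_input = concat(user_sum, ad_vec)
print(user_sum, user_avg, mlp_input)
```

Whatever the number of behaviors, the pooled vector has length $D$, so the input to the following layers has a fixed size.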

1.4 Activation unit

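DIN adds a local activation unit on top of the base model. Its role is to give different weights to the different items in the user's behavior features; everything else stays the same. The user representation with respect to a candidate ad $A$ is

$$v_U(A)=f(v_A,e_1,e_2,\dots,e_H)=\sum_{j=1}^{H}a(e_j,v_A)\,e_j=\sum_{j=1}^{H}w_j e_j$$

where $e_1,\dots,e_H$ are the embedding vectors of the user's $H$ behaviors, $v_A$ is the embedding vector of ad $A$, and $a(\cdot)$ is the activation unit that outputs the weight $w_j$. It closely resembles attention; in the paper's words: "Local activation unit of Eq.(3) shares similar ideas with attention methods which are developed in NMT task[1]."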
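The weighting step can be sketched as a weighted sum pooling over the behavior embeddings. This is an illustration under assumptions: the paper implements $a(\cdot)$ as a small feed-forward network, whereas the `activation_weight` function below uses a plain dot product as a stand-in so that the example stays short, and all vectors are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def activation_weight(e_j, v_A):
    # Stand-in for a(e_j, v_A). ASSUMPTION: a dot product replaces the
    # feed-forward network used in the paper.
    return dot(e_j, v_A)

def din_user_vector(behaviors, v_A, a=activation_weight):
    """Weighted sum pooling: v_U(A) = sum_j a(e_j, v_A) * e_j."""
    weights = [a(e_j, v_A) for e_j in behaviors]
    # As described in the paper, the weights are used as-is,
    # without softmax normalization.
    v_U = [sum(w * e[d] for w, e in zip(weights, behaviors))
           for d in range(len(v_A))]
    return v_U, weights

# Hypothetical behavior embeddings and candidate-ad embedding (D = 2).
behaviors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v_A = [1.0, 0.0]

v_U, weights = din_user_vector(behaviors, v_A)
print(weights, v_U)
```

Compared with plain sum pooling, the behaviors related to the candidate ad contribute more to the user vector, so the representation changes from one ad to another.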

1.5 MLP

1.6 Loss

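The loss is the negative log-likelihood (cross-entropy):

$$L=-\frac{1}{N}\sum_{(\textbf{x},y)\in S}\left(y\log p(\textbf{x})+(1-y)\log(1-p(\textbf{x}))\right)$$

where $S$ is the training set of size $N$, $\textbf{x}$ is the network input, $y\in\{0,1\}$ is the click label, and $p(\textbf{x})$ is the predicted probability that $\textbf{x}$ is clicked.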
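A minimal sketch of the averaged binary cross-entropy, assuming labels in {0, 1} and predicted click probabilities `p`. The clipping constant `eps` is an addition for numerical safety and is not part of the formula itself; the sample values are made up.

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

# Hypothetical labels (1 = clicked) and predicted click probabilities.
labels = [1, 0, 1]
probs = [0.9, 0.2, 0.6]

loss = cross_entropy(labels, probs)
print(round(loss, 4))
```

Confident correct predictions contribute almost nothing to the loss, while confident wrong ones are penalized heavily.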

2.Training techniques

Training industrial deep networks with large-scale sparse input features is challenging in practice. The paper introduces two techniques for this, Mini-batch Aware Regularization and a Data Adaptive Activation Function (Dice); they are not covered here.

References

Original paper: https://arxiv.org/pdf/1706.06978.pdf

