2021-10-13

Deep Interest Network for Click-Through Rate Prediction

1.DEEP INTEREST NETWORK

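1.1 Feature representation

Features can be written as $\textbf{x}=[t_1^T,t_2^T,\dots,t_M^T]^T$, where $t_i$ is the one-hot encoding of the $i$-th feature group (multi-hot when the group takes several values at once, as with a user's behavior list) and $M$ is the number of feature groups.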
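As a minimal sketch of this encoding, the snippet below builds the 0/1 vector for two feature groups and concatenates them into $\textbf{x}$. The group names, vocabularies, and the helper `encode_group` are hypothetical and chosen only for illustration.

```python
def encode_group(vocab, active_values):
    """Return the 0/1 encoding vector t_i of one feature group."""
    active = set(active_values)
    return [1 if value in active else 0 for value in vocab]


# Hypothetical vocabularies, for illustration only.
weekday_vocab = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
cate_vocab = ["Bag", "Book", "Shoes"]

# One active value gives a one-hot vector.
t_weekday = encode_group(weekday_vocab, ["Fri"])
# Several active values give a multi-hot vector.
t_visited = encode_group(cate_vocab, ["Bag", "Book"])

# x is the concatenation of all group encodings.
x = t_weekday + t_visited
print(x)
```

The length of `x` is the sum of the vocabulary sizes $K_i$, which is why such inputs are very sparse in practice.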

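1.2 Embedding layer

For $t_i \in \mathbb{R}^{K_i}$, the embedding dictionary of the $i$-th feature group is $W^i=[w_1^i,\dots,w_j^i,\dots,w_{K_i}^i] \in \mathbb{R}^{D\times K_i}$, where each column $w_j^i \in \mathbb{R}^D$ is a $D$-dimensional embedding vector. If $t_i$ is one-hot with its $j$-th element set to 1, its embedding is the single vector $\textbf{e}_i=w_j^i$. If $t_i$ is multi-hot, it yields a list of embedding vectors $\textbf{e}_{i_1},\textbf{e}_{i_2},...,\textbf{e}_{i_k}$.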
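The lookup can be sketched as selecting columns of $W^i$. This is a toy illustration in plain Python, assuming made-up values for `W` and a hypothetical helper `embed`; a real model would use a framework's embedding table instead.

```python
def embed(W, t):
    """Return the columns of W selected by the nonzero entries of t.

    W is a D x K matrix stored as a list of D rows; t is a 0/1 vector of length K.
    """
    D = len(W)
    return [[W[d][j] for d in range(D)] for j, bit in enumerate(t) if bit == 1]


# Toy embedding dictionary with D = 2 and K_i = 3 (values are made up).
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]

one_hot = [0, 1, 0]
multi_hot = [1, 0, 1]

e_single = embed(W, one_hot)    # one embedding vector
e_list = embed(W, multi_hot)    # a list of embedding vectors
print(e_single, e_list)
```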

1.3 Pooling layer and Concat layer

$\textbf{e}_i=pooling(\textbf{e}_{i_1},\textbf{e}_{i_2},...,\textbf{e}_{i_k})$

The two most commonly used pooling layers are sum pooling and average pooling, which apply element-wise sum/average operations to the list of embedding vectors.
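A small sketch of both pooling operations over a variable-length list of embedding vectors; the function names and the sample `behaviors` values are illustrative assumptions.

```python
def sum_pooling(vectors):
    """Element-wise sum over a list of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]


def average_pooling(vectors):
    """Element-wise mean over a list of equal-length vectors."""
    n = len(vectors)
    return [total / n for total in sum_pooling(vectors)]


# Three behavior embeddings with D = 2 (made-up values).
behaviors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

pooled_sum = sum_pooling(behaviors)
pooled_avg = average_pooling(behaviors)
print(pooled_sum, pooled_avg)
```

Either way the output has a fixed length $D$ regardless of how many behaviors the user has, which is what lets it feed the concat layer and the MLP.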

1.4 Activation unit

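DIN adds a local activation unit on top of the base model. Its role is to assign a different weight to each item in the user behavior features; everything else stays the same. The formula is: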

$\mathcal{V}_{U}(A)=f(\mathcal{V}_{A},\textbf{e}_1,\textbf{e}_2,...,\textbf{e}_H)=\sum_{j=1}^Ha(\textbf{e}_j,\mathcal{V}_{A})\textbf{e}_j=\sum_{j=1}^H\textbf{w}_j\textbf{e}_j$

Here $a(\cdot)$ is the activation unit, which closely resembles attention. In the paper's words: "Local activation unit of Eq.(3) shares similar ideas with attention methods which are developed in NMT task[1]."
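The weighted sum pooling above can be sketched as follows. Note the loud assumption: in DIN $a(\cdot)$ is a small learned feed-forward network, whereas here `activation_weight` is a plain dot product used only as a stand-in so that the example runs. All names and values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def activation_weight(e_j, v_a):
    """Stand-in for a(e_j, v_A). ASSUMPTION: a dot product, not the learned network."""
    return dot(e_j, v_a)


def user_interest(behaviors, v_a, a=activation_weight):
    """Compute v_U(A) = sum_j a(e_j, v_A) * e_j."""
    v_u = [0.0] * len(v_a)
    for e_j in behaviors:
        w_j = a(e_j, v_a)  # scalar weight of this behavior w.r.t. the candidate ad
        for d in range(len(v_u)):
            v_u[d] += w_j * e_j[d]
    return v_u


# Two behavior embeddings and one candidate-ad embedding (made-up values).
behaviors = [[1.0, 0.0], [0.0, 1.0]]
v_a = [2.0, 0.5]

v_u = user_interest(behaviors, v_a)
print(v_u)
```

Because the weights depend on $\mathcal{V}_{A}$, the same behavior list produces a different user representation for each candidate ad, unlike the fixed sum/average pooling of the base model.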

1.5 MLP

1.6 Loss

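The cross-entropy (negative log-likelihood) loss is:

$L=-\frac{1}{N}\sum_{(\textbf{x},y) \in \textbf{S}}(y\log p(\textbf{x})+(1-y)\log(1-p(\textbf{x})))$

where $\textbf{S}$ is the training set of size $N$, $y \in \{0,1\}$ is the click label, and $p(\textbf{x})$ is the predicted click probability.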
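A direct, minimal transcription of this loss into Python, assuming the predictions are already probabilities strictly between 0 and 1. The function name `log_loss` and the sample values are illustrative.

```python
import math


def log_loss(samples):
    """Average negative log-likelihood over (p, y) pairs.

    p is the predicted click probability in (0, 1); y is the 0/1 label.
    """
    total = 0.0
    for p, y in samples:
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(samples)


# Two made-up samples: a clicked ad predicted at 0.9, a non-click predicted at 0.2.
samples = [(0.9, 1), (0.2, 0)]
loss = log_loss(samples)
print(loss)
```

In practice frameworks compute this from logits for numerical stability; the version above is only meant to mirror the formula term by term.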

2.Training techniques

Practically, training industrial deep networks with large scale sparse input features is of great challenge. The paper introduces Mini-batch Aware Regularization and a Data Adaptive Activation Function to address this; they are not covered in detail here.

References

Original paper: https://arxiv.org/pdf/1706.06978.pdf
