Factorization Machines

Original paper: https://cseweb.ucsd.edu/classes/fa17/cse291-b/reading/Rendle2010FM.pdf

https://zhuanlan.zhihu.com/p/50426292

FMs are a new model class that combines the advantages of Support Vector Machines (SVMs) with factorization models.

I. INTRODUCTION

In total, the advantages of our proposed FM are:
1) FMs allow parameter estimation under very sparse data where SVMs fail.
2) FMs have linear complexity, can be optimized in the primal and do not rely on support vectors like SVMs.
3) FMs are a general predictor that can work with any real-valued feature vector.

II. PREDICTION UNDER SPARSITY

III. FACTORIZATION MACHINES (FM)

A. Factorization Machine Model

1) Model:

$x_i$ denotes the $i$-th feature. A major problem with a fully parametrized pairwise model is that user interaction data is usually very sparse, which makes estimating each $w_{i,j}$ unreliable. For example, suppose we want to estimate the interaction parameter $w_{A,ST}$ between Alice (A) and Star Trek (ST): since no training instance has both $x_A$ and $x_{ST}$ nonzero, the estimate will simply be $w_{A,ST}=0$. FMs therefore apply the idea of matrix factorization:
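The 2-way FM model from the paper replaces each interaction weight $w_{i,j}$ with the inner product of two $k$-dimensional factor vectors, so that interactions share parameters and can be estimated even when a particular pair never co-occurs:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j,\qquad \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f}\, v_{j,f}$$

Here $w_0$ is the global bias, $w_i$ are the linear weights, and $\mathbf{v}_i \in \mathbb{R}^k$ is the factor vector for feature $i$.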

2) Improving efficiency:

Computing $\hat{y}(x)$ directly from the formula above takes $O(kn^2)$ time, since every pairwise interaction must be evaluated. A simple algebraic rewrite reduces this to $O(kn)$, as the following derivation shows:
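The key observation from the paper is that the sum over pairs $i<j$ is half of the full symmetric sum minus the diagonal, which lets the factor dimensions be summed independently:

$$
\begin{aligned}
\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j\rangle\, x_i x_j
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j\rangle\, x_i x_j
 - \frac{1}{2}\sum_{i=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_i\rangle\, x_i x_i \\
&= \frac{1}{2}\sum_{f=1}^{k}\left(\left(\sum_{i=1}^{n} v_{i,f}\, x_i\right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2}\, x_i^{2}\right)
\end{aligned}
$$

Each of the $k$ inner sums costs $O(n)$, giving $O(kn)$ overall; under sparsity only the nonzero $x_i$ need to be summed.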

B. Factorization Machines as Predictors

FM can be applied to a variety of prediction tasks, among them regression, binary classification, and ranking.

C. Learning Factorization Machines

The model parameters of FMs can be learned efficiently by gradient descent methods, e.g. stochastic gradient descent (SGD). The gradient of the FM model is:
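From the paper, the partial derivatives of $\hat{y}(x)$ with respect to each parameter are:

$$
\frac{\partial \hat{y}(x)}{\partial \theta} =
\begin{cases}
1, & \text{if } \theta = w_0 \\
x_i, & \text{if } \theta = w_i \\
x_i \displaystyle\sum_{j=1}^{n} v_{j,f}\, x_j - v_{i,f}\, x_i^{2}, & \text{if } \theta = v_{i,f}
\end{cases}
$$

The sum $\sum_{j} v_{j,f}\, x_j$ is independent of $i$, so it can be precomputed once per instance, making each gradient step $O(kn)$ as well.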

D. d-way Factorization Machine

The 2-way FM described so far can easily be generalized to a d-way FM:
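The $d$-way model equation from the paper extends the pairwise term with interaction terms of every order $l \le d$, each factorized with its own factor matrix $V^{(l)} \in \mathbb{R}^{n \times k_l}$:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=2}^{d}\ \sum_{i_1=1}^{n}\cdots\sum_{i_l=i_{l-1}+1}^{n} \left(\prod_{j=1}^{l} x_{i_j}\right)\left(\sum_{f=1}^{k_l}\prod_{j=1}^{l} v_{i_j,f}^{(l)}\right)$$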

Directly computing this expression costs $O(k_d\, n^d)$, but a transformation analogous to the 2-way case reduces it to $O(k_d\, n)$.

E. Summary

FMs model all possible interactions between values in the feature vector $x$ using factorized interactions instead of full parametrized ones. This has two main advantages:

1) The interactions between values can be estimated even under high sparsity. Especially, it is possible to generalize to unobserved interactions.
2) The number of parameters as well as the time for prediction and learning is linear.
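The two prediction routes above (naive pairwise sum vs. the linear-time reformulation) can be sketched in NumPy. This is a minimal illustration, not the paper's reference code; the function names and the dense-vector interface are my own choices:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM prediction in O(kn) via the reformulated pairwise term:
    y = w0 + <w, x> + 1/2 * sum_f [ (sum_i V[i,f]*x_i)^2 - sum_i V[i,f]^2 * x_i^2 ]
    x: (n,) feature vector, w: (n,) linear weights, V: (n, k) factor matrix.
    """
    linear = w0 + x @ w
    s = V.T @ x                  # (k,): sum_i v_{i,f} x_i for each factor f
    s2 = (V ** 2).T @ (x ** 2)   # (k,): sum_i v_{i,f}^2 x_i^2
    return linear + 0.5 * np.sum(s ** 2 - s2)

def fm_predict_naive(x, w0, w, V):
    """Reference O(kn^2) version: explicit sum over all pairs i < j."""
    n = len(x)
    y = w0 + x @ w
    for i in range(n):
        for j in range(i + 1, n):
            y += (V[i] @ V[j]) * x[i] * x[j]  # <v_i, v_j> x_i x_j
    return y
```

On any input the two versions agree up to floating-point error, which is a quick sanity check for the derivation.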

IV. FMS VS. SVMS

V. FMS VS. OTHER FACTORIZATION MODELS

