Factorization Machines

Original paper: https://cseweb.ucsd.edu/classes/fa17/cse291-b/reading/Rendle2010FM.pdf

https://zhuanlan.zhihu.com/p/50426292

FMs are a new model class that combines the advantages of Support Vector Machines (SVMs) with factorization models.

I. INTRODUCTION

In total, the advantages of our proposed FM are:
1) FMs allow parameter estimation under very sparse data where SVMs fail.
2) FMs have linear complexity, can be optimized in the primal and do not rely on support vectors like SVMs.
3) FMs are a general predictor that can work with any real-valued feature vector.

II. PREDICTION UNDER SPARSITY

III. FACTORIZATION MACHINES (FM)

A. Factorization Machine Model

1) Model:

$x_i$ denotes the $i$-th feature. A major problem with a fully parametrized pairwise model is that user interaction data is usually very sparse, which makes estimating each $w_{i,j}$ unreliable. For example, suppose we want to estimate the interaction parameter $w_{A,ST}$ between Alice (A) and Star Trek (ST): since no training instance has both $x_A$ and $x_{ST}$ nonzero, the estimate will simply be $w_{A,ST}=0$. FMs therefore apply the idea of matrix factorization:
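The 2-way FM model from the paper replaces each interaction weight $w_{i,j}$ with the inner product of two $k$-dimensional factor vectors, so that interactions share parameters and can be estimated even when a particular pair never co-occurs:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j,\qquad \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f}\, v_{j,f}$$

Here $w_0$ is the global bias, $w_i$ are the linear weights, and $\mathbf{v}_i \in \mathbb{R}^k$ is the factor vector for feature $i$.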

2) Improving efficiency:

Computing $\hat{y}(x)$ directly from the formula above takes $O(kn^2)$ time, since every pairwise interaction must be evaluated. A simple algebraic rewrite reduces this to $O(kn)$, as the following derivation shows:
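The key observation from the paper is that the sum over pairs $i<j$ is half of the full symmetric sum minus the diagonal, which lets the factor dimensions be summed independently:

$$
\begin{aligned}
\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j\rangle\, x_i x_j
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j\rangle\, x_i x_j
 - \frac{1}{2}\sum_{i=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_i\rangle\, x_i x_i \\
&= \frac{1}{2}\sum_{f=1}^{k}\left(\left(\sum_{i=1}^{n} v_{i,f}\, x_i\right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2}\, x_i^{2}\right)
\end{aligned}
$$

Each of the $k$ inner sums costs $O(n)$, giving $O(kn)$ overall; under sparsity only the nonzero $x_i$ need to be summed.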

B. Factorization Machines as Predictors

FM can be applied to a variety of prediction tasks, among them regression, binary classification, and ranking.

C. Learning Factorization Machines

The model parameters of FMs can be learned efficiently by gradient descent methods, e.g. stochastic gradient descent (SGD). The gradient of the FM model is:
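From the paper, the partial derivatives of $\hat{y}(x)$ with respect to each parameter are:

$$
\frac{\partial \hat{y}(x)}{\partial \theta} =
\begin{cases}
1, & \text{if } \theta = w_0 \\
x_i, & \text{if } \theta = w_i \\
x_i \displaystyle\sum_{j=1}^{n} v_{j,f}\, x_j - v_{i,f}\, x_i^{2}, & \text{if } \theta = v_{i,f}
\end{cases}
$$

The sum $\sum_{j} v_{j,f}\, x_j$ is independent of $i$, so it can be precomputed once per instance, making each gradient step $O(kn)$ as well.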

D. d-way Factorization Machine

The 2-way FM described so far can easily be generalized to a d-way FM:
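The $d$-way model equation from the paper extends the pairwise term with interaction terms of every order $l \le d$, each factorized with its own factor matrix $V^{(l)} \in \mathbb{R}^{n \times k_l}$:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=2}^{d}\ \sum_{i_1=1}^{n}\cdots\sum_{i_l=i_{l-1}+1}^{n} \left(\prod_{j=1}^{l} x_{i_j}\right)\left(\sum_{f=1}^{k_l}\prod_{j=1}^{l} v_{i_j,f}^{(l)}\right)$$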

Directly computing this expression costs $O(k_d\, n^d)$, but a transformation analogous to the 2-way case reduces it to $O(k_d\, n)$.

E. Summary

FMs model all possible interactions between values in the feature vector $x$ using factorized interactions instead of full parametrized ones. This has two main advantages:

1) The interactions between values can be estimated even under high sparsity. Especially, it is possible to generalize to unobserved interactions.
2) The number of parameters as well as the time for prediction and learning is linear.
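The two prediction routes above (naive pairwise sum vs. the linear-time reformulation) can be sketched in NumPy. This is a minimal illustration, not the paper's reference code; the function names and the dense-vector interface are my own choices:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM prediction in O(kn) via the reformulated pairwise term:
    y = w0 + <w, x> + 1/2 * sum_f [ (sum_i V[i,f]*x_i)^2 - sum_i V[i,f]^2 * x_i^2 ]
    x: (n,) feature vector, w: (n,) linear weights, V: (n, k) factor matrix.
    """
    linear = w0 + x @ w
    s = V.T @ x                  # (k,): sum_i v_{i,f} x_i for each factor f
    s2 = (V ** 2).T @ (x ** 2)   # (k,): sum_i v_{i,f}^2 x_i^2
    return linear + 0.5 * np.sum(s ** 2 - s2)

def fm_predict_naive(x, w0, w, V):
    """Reference O(kn^2) version: explicit sum over all pairs i < j."""
    n = len(x)
    y = w0 + x @ w
    for i in range(n):
        for j in range(i + 1, n):
            y += (V[i] @ V[j]) * x[i] * x[j]  # <v_i, v_j> x_i x_j
    return y
```

On any input the two versions agree up to floating-point error, which is a quick sanity check for the derivation.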

IV. FMS VS. SVMS

V. FMS VS. OTHER FACTORIZATION MODELS

