A Deep Look into Neural Ranking Models for Information Retrieval

https://par.nsf.gov/servlets/purl/10277191

3 A Unified Model Formulation

A generalized LTR problem is then to find the optimal ranking function $f^*$ by minimizing a loss function over some labeled dataset.
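The objective was not copied into these notes; presumably it takes the standard empirical-risk form, summing the loss over the labeled (query, candidate set, label set) triples:

$$f^{*} = \arg\min_{f} \sum_{i} \mathcal{L}\bigl(f;\, s_i,\, t_i,\, y_i\bigr)$$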

Here $f$ is the ranking function, $s$ is a query, $t$ is the candidate set, and $y$ is the label set, where labels represent relevance grades.

Without loss of generality, the ranking function $f$ can be further abstracted by the following unified formulation:
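(Equation reconstructed here to match the decomposition described just below.)

$$f(s, t) = g\bigl(\psi(s),\, \phi(t),\, \eta(s, t)\bigr)$$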

ψ and ϕ are representation functions which extract features from s and t respectively, η is the interaction function which extracts features from the (s, t) pair, and g is the evaluation function which computes the relevance score based on these feature representations.

4. Model Architecture

4.1. Symmetric vs. Asymmetric Architectures

Symmetric Architecture: The inputs s and t are assumed to be homogeneous, so that a symmetric network structure can be applied over the inputs.

Asymmetric Architecture: The inputs s and t are assumed to be heterogeneous, so that asymmetric network structures should be applied over the inputs.

4.2. Representation-focused vs. Interaction-focused Architectures

Representation-focused Architecture: The underlying assumption of this type of architecture is that relevance depends on the compositional meaning of the input texts. Therefore, models in this category usually define complex representation functions ϕ and ψ (i.e., deep neural networks), but no interaction function η.

Interaction-focused Architecture: The underlying assumption of this type of architecture is that relevance is in essence about the relation between the input texts, so it would be more effective to learn directly from interactions rather than from individual representations. Models in this category thus define the interaction function η rather than the representation functions ϕ and ψ.
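To make the contrast concrete, here is a minimal PyTorch sketch of the two flavors under the unified formulation of Section 3 (the class names, toy encoders, dimensions and pooling are my own illustration, not taken from the survey):

```python
import torch
import torch.nn as nn

class RepresentationFocused(nn.Module):
    """Relevance from separately encoded texts: g(psi(s), phi(t)), no interaction function eta."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.psi = nn.EmbeddingBag(vocab_size, dim)  # representation function for the query s
        self.phi = nn.EmbeddingBag(vocab_size, dim)  # representation function for the document t

    def forward(self, s_ids: torch.Tensor, t_ids: torch.Tensor) -> torch.Tensor:
        # g: cosine similarity between the two text representations.
        return nn.functional.cosine_similarity(self.psi(s_ids), self.phi(t_ids))

class InteractionFocused(nn.Module):
    """Relevance from query-document interactions: g(eta(s, t)); psi/phi are just a shared embedding."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.g = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, s_ids: torch.Tensor, t_ids: torch.Tensor) -> torch.Tensor:
        # eta: word-by-word similarity matrix between query and document terms.
        sim = torch.einsum('bqd,btd->bqt', self.emb(s_ids), self.emb(t_ids))
        # g: pool the interaction matrix and map it to a relevance score.
        pooled = sim.max(dim=-1).values.mean(dim=-1, keepdim=True)
        return self.g(pooled).squeeze(-1)
```

Both classes take padded token-id tensors of shape (batch, length) and return one relevance score per (s, t) pair; real models replace the toy encoders and pooling with much richer ones (e.g., DSSM on the representation side, DRMM or MatchPyramid on the interaction side).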

Hybrid Architecture: In order to take advantage of both representation-focused and interaction-focused architectures, a natural way is to adopt a hybrid architecture for feature learning. There are two major hybrid strategies to integrate the two architectures, namely the combined strategy and the coupled strategy.

4.3. Single-granularity vs. Multi-granularity Architecture

Single-granularity Architecture: The underlying assumption of the single-granularity architecture is that relevance can be evaluated based on the high-level features extracted by ϕ, ψ and η from the single-form text inputs.

Multi-granularity Architecture: The underlying assumption of the multi-granularity architecture is that relevance estimation requires multiple granularities of features, either from different levels of feature abstraction or based on different types of language units of the inputs.

5. Model Learning

5.1. Learning objective

Similar to other LTR algorithms, the learning objective of neural ranking models can be broadly categorized into three groups: pointwise, pairwise, and listwise.

5.1.1. Pointwise Ranking Objective

1 Loss

The idea of pointwise ranking objectives is to simplify a ranking problem to a set of classification or regression problems

a. Cross Entropy

For example, one of the most popular pointwise loss functions used in neural ranking models is Cross Entropy:
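(The formula is not reproduced in the notes; for binary labels $y_{s,t}\in\{0,1\}$ and a model output $f(s,t)\in(0,1)$, it is presumably the standard binary cross entropy summed over query–document pairs.)

$$\mathcal{L} = -\sum_{(s,t)} \Bigl[\, y_{s,t}\,\log f(s,t) + \bigl(1-y_{s,t}\bigr)\,\log\bigl(1-f(s,t)\bigr) \Bigr]$$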

b. Mean Squared Error

There are other pointwise loss functions such as Mean Squared Error for numerical labels, but they are more commonly used in recommendation tasks.
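For completeness, a sketch of the pointwise MSE loss for numerical (graded) labels:

$$\mathcal{L} = \sum_{(s,t)} \bigl(y_{s,t} - f(s,t)\bigr)^{2}$$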

2 Pros and cons

a. Advantages

First, it is simple and easy to scale. Second, the outputs have real meanings and practical value. For instance, in sponsored search, a model learned with a cross entropy loss and clickthrough rates can directly predict the probability of user clicks on search ads, which in some application scenarios is more important than producing a good result list.

b. Disadvantages

Less effective: because pointwise loss functions consider no document preference or order information, they do not guarantee to produce the best ranking list even when the model loss reaches its global minimum.

5.1.2. Pairwise Ranking Objective

1 Loss

Pairwise ranking objectives focus on optimizing the relative preferences between documents rather than their labels.

a. Hinge loss
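The formula did not make it into the notes; the pairwise hinge loss typically used (with $t^{+}$ the more relevant and $t^{-}$ the less relevant document for query $s$) is:

$$\mathcal{L}(f;\, s, t^{+}, t^{-}) = \sum \max\bigl(0,\; 1 - f(s, t^{+}) + f(s, t^{-})\bigr)$$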

b. Cross entropy

RankNet

2 Pros and cons

a. Advantages

Pairwise ranking objectives are effective in many tasks.

b. Disadvantages

Optimizing document preferences with pairwise methods does not always improve the final ranking metrics, for two reasons: (1) it is impossible to develop a ranking model that can correctly predict document preferences in all cases; and (2) in the computation of most existing ranking metrics, not all document pairs are equally important.

5.1.3. Listwise Ranking Objective

1 Loss

Listwise loss functions compute the ranking loss over each query together with its whole candidate document list.

a. ListMLE

https://blog.csdn.net/qq_36478718/article/details/122598406
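For reference (the link above walks through the derivation), ListMLE maximizes the Plackett–Luce likelihood of the ground-truth permutation $\pi_y$ over the $n$ candidates; roughly:

$$\mathcal{L}(f;\, s, \pi_y) = -\log P(\pi_y \mid f) = -\sum_{j=1}^{n} \log \frac{\exp\bigl(f(s,\, t_{\pi_y(j)})\bigr)}{\sum_{k=j}^{n} \exp\bigl(f(s,\, t_{\pi_y(k)})\bigr)}$$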

b. Attention Rank function

https://arxiv.org/abs/1804.05936

c. Softmax-based listwise loss

https://arxiv.org/pdf/1811.04415.pdf

2 Pros and cons

a. Advantages

Listwise ranking objectives are generally more effective than pairwise ranking objectives.

b. Disadvantages

Their high computational cost often limits their application; they are mainly suitable for the re-ranking phase over a small set of candidate documents.

5.1.4. Multi-task Learning Objective

The optimization of neural ranking models may include the learning of multiple ranking or non-ranking objectives at the same time.

5.2. Training Strategies

1 Supervised learning

2 Weakly supervised learning

3 Semi-supervised learning

6. Model Comparison

This part compares the effectiveness of common models in different applications.

1 Ad-hoc Retrieval

https://blog.csdn.net/qq_44092699/article/details/106335971

Ad-hoc information retrieval refers to the task of returning information resources related to a user query formulated in natural language.

2 QA

RankNet vs. ListNet

ListNet builds on RankNet: both employ the cross entropy function as the loss and gradient descent as the algorithm to train a neural network model, but ListNet uses document lists as instances while RankNet uses document pairs.

The ListNet paper investigates why the listwise method ListNet can outperform the pairwise methods RankNet, Ranking SVM, and RankBoost:

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2007-40.pdf

1. For the pairwise approach, the number of document pairs varies largely from query to query, which biases training toward queries with more document pairs.

2. The pairwise approach actually employs a 'pairwise' loss function, which might be too loose as an approximation of the ranking performance measures.

Learning to Rank: From Pairwise Approach to Listwise Approach (ListNet)

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2007-40.pdf

https://blog.csdn.net/Mr_tyting/article/details/80554849

The core idea is:

  1. Given two lists of scores (one from the model and one from human labels),
  2. we first calculate two permutation probability distributions from them (simplified to the top-one probability),
  3. and then calculate the distance between the two distributions (cross entropy) as the listwise loss function.

4. Probability Models

4.1. Permutation Probability

$\pi=(2,3,1)$ means that object 2 is ranked in the first position (object 3 second, object 1 third).

The above is the full top-n (permutation) form.

Since there are $n!$ permutations in total, computing the full permutation probability distribution is expensive.
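For reference, the permutation (top-n) probability has the form below, where $s_{\pi(j)}$ is the score of the object ranked at position $j$ and the transformation function is taken to be $\exp$:

$$P_s(\pi) = \prod_{j=1}^{n} \frac{\exp\bigl(s_{\pi(j)}\bigr)}{\sum_{k=j}^{n} \exp\bigl(s_{\pi(k)}\bigr)}$$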

4.2. Top One Probability

Top-k:

There are $N!/(N-k)!$ different top-k groups in total, which greatly reduces the computational complexity.

Top-1:

In this case there are only $n$ different outcomes.

Meaning of the probability distribution: for each object $j$, it gives the probability that $j$ is ranked in the first position.
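Written out (again taking the transformation function to be $\exp$), the top-one probability of object $j$ is just the softmax of the score list:

$$P_s(j) = \frac{\exp(s_j)}{\sum_{k=1}^{n} \exp(s_k)}$$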

5. Learning Method: ListNet

We employ a new learning method for optimizing the listwise loss function based on top one probability, with Neural Network as model and Gradient Descent as optimization algorithm. We refer to the method as ListNet.

$y^{(i)}$ is the ground-truth score list. One open question: where does $y^{(i)}$ come from? Presumably the ground-truth score list is obtained first (human-labeled), and the ranking is then derived from it; see https://zhuanlan.zhihu.com/p/66514492

Core steps

1. Labeling yields the ground-truth score list, and the model produces the predicted score list.

2. Apply softmax to both score lists to obtain the ground-truth and predicted probability distributions.

3. Use the cross entropy between the two distributions as the loss.
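A minimal sketch of these three steps in PyTorch, assuming a scoring model has already produced one score per candidate document for a query (the function name and toy tensors are just for illustration):

```python
import torch
import torch.nn.functional as F

def listnet_top1_loss(pred_scores: torch.Tensor, true_scores: torch.Tensor) -> torch.Tensor:
    """ListNet loss with the top-one approximation for a single query.

    pred_scores: model scores for the candidate documents, shape (n,)
    true_scores: human-labeled relevance scores for the same documents, shape (n,)
    """
    # Step 2: top-one probability distributions are the softmax of each score list.
    true_dist = F.softmax(true_scores, dim=-1)
    pred_log_dist = F.log_softmax(pred_scores, dim=-1)
    # Step 3: cross entropy between the ground-truth and predicted distributions.
    return -(true_dist * pred_log_dist).sum()

# Toy usage: three candidate documents for one query.
pred = torch.tensor([2.0, 0.5, 1.0], requires_grad=True)
true = torch.tensor([3.0, 1.0, 2.0])
loss = listnet_top1_loss(pred, true)
loss.backward()  # gradients flow back into the scoring model that produced pred
```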

Pairwise

Pairwise learning-to-rank methods fall into two broad categories.

The first category is based on a scoring function: through some special designs, the model relies on "sample pair" information to learn a score for each sample. Producing the final global ranking for these methods is simple: just sort all samples by their scores.

The second category is based on a preference function. The process has two stages. In the first stage, a machine-learning model learns the preference relation between two samples, e.g., f(x1, x2)=1 means sample x1 is preferred over x2 (x1 should be ranked before x2), and f(x1, x2)=-1 means x2 is preferred over x1 (x1 should be ranked after x2). The question then becomes: once the preference function f has been trained, how do we rank all samples so that the ranking agrees with f as much as possible? This problem is known in the literature as rank aggregation.

For details see https://www.zhihu.com/question/389068269

Other related references:

https://www.jianshu.com/p/235756fbf6b6

https://zhuanlan.zhihu.com/p/318300682

https://zhuanlan.zhihu.com/p/65224450

https://zhuanlan.zhihu.com/p/65756030

https://www.zhihu.com/question/389068269/answer/1180120736

From RankNet to LambdaRank to LambdaMART

1. RankNet

RankNet trains the model with a pairwise approach.

Loss derivation

Given two documents $U_i$ and $U_j$ under a specific query, with feature vectors $x_i$ and $x_j$, a forward pass of RankNet produces the scores $s_i=f(x_i)$ and $s_j=f(x_j)$. Let $U_i \rhd U_j$ denote that $U_i$ should be ranked before $U_j$. The probability that $U_i$ should be ranked before $U_j$ can then be modeled as:
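(The formula itself did not survive extraction; this is the standard RankNet form.)

$$P_{ij} = P(U_i \rhd U_j) = \frac{1}{1+e^{-\sigma(s_i - s_j)}}$$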

Define $S_{ij} \in \{0,\pm1\}$ to encode the relation between the labels of documents $i$ and $j$: $S_{ij}=1$ if document $i$ is labeled more relevant than document $j$, $S_{ij}=-1$ if it is labeled less relevant, and $S_{ij}=0$ if the two labels are equal.

Define $\overline{P}_{ij}=\frac{1}{2}(1+S_{ij})$ as the known (target) probability that $U_i$ should be ranked before $U_j$. The training objective can then be defined as the cross entropy between the target and modeled probabilities:
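(Formula reconstructed; substituting $P_{ij}$ and $\overline{P}_{ij}$ gives the second form.)

$$C = -\overline{P}_{ij}\log P_{ij} - \bigl(1-\overline{P}_{ij}\bigr)\log\bigl(1-P_{ij}\bigr) = \frac{1}{2}\bigl(1-S_{ij}\bigr)\,\sigma(s_i-s_j) + \log\Bigl(1+e^{-\sigma(s_i-s_j)}\Bigr)$$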

Note: $\sigma$ is a hyperparameter that controls the shape of the sigmoid.

RankNet speedup
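The speedup itself is not spelled out in these notes. Roughly, following the standard RankNet/LambdaRank derivation (so treat the exact form as reconstructed rather than quoted): the gradient with respect to the weights factorizes through a scalar $\lambda_{ij}$ per pair, and the per-pair terms can be accumulated into one $\lambda_i$ per document before back-propagating, turning many small per-pair updates into a single batched update:

$$\frac{\partial C}{\partial w_k} = \lambda_{ij}\left(\frac{\partial s_i}{\partial w_k} - \frac{\partial s_j}{\partial w_k}\right), \qquad \lambda_{ij} = \sigma\left(\frac{1}{2}\bigl(1-S_{ij}\bigr) - \frac{1}{1+e^{\sigma(s_i-s_j)}}\right), \qquad \lambda_i = \sum_{j:(i,j)\in I}\lambda_{ij} - \sum_{j:(j,i)\in I}\lambda_{ij}$$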

2. LambdaRank

RankNet's drawback is that it only considers the relative order within a pair and ignores where the two documents sit in the overall ranked list.

LambdaRank is essentially RankNet with a listwise metric injected into the gradients, so some people classify LambdaRank as a listwise method while others classify it as pairwise.

2.1 Limitations of RankNet

2.2 Definition of LambdaRank

The formula above can be further simplified by considering only pairs with $S_{ij}=1$ (this is possible because, without loss of generality, each labeled pair can be ordered so that $U_i$ is the more relevant document), in which case the loss reduces to $C = \log\bigl(1+e^{-\sigma(s_i-s_j)}\bigr)$.

Lambda, $\lambda$, is then exactly this gradient (the derivative of the loss with respect to the score).

To strengthen the influence of positions in the ranking, LambdaRank introduces an evaluation metric Z (such as NDCG) on top of the original gradient: the change $|\Delta Z_{ij}|$ in the metric caused by swapping the positions of the two documents is used as an additional factor:
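(Reconstructed from the standard LambdaRank definition, with the $S_{ij}=1$ simplification from above.)

$$\lambda_{ij} = \frac{\partial C(s_i - s_j)}{\partial s_i} = \frac{-\sigma}{1+e^{\sigma(s_i-s_j)}}\,\bigl|\Delta Z_{ij}\bigr|$$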

Working backwards from this gradient, one can recover an implied LambdaRank loss function:
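(Presumably, treating $|\Delta Z_{ij}|$ as a constant factor when differentiating.)

$$C = \log\Bigl(1+e^{-\sigma(s_i-s_j)}\Bigr)\,\bigl|\Delta Z_{ij}\bigr|$$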

3. LambdaMART

Usually regarded as listwise, though some classify it as pairwise.

LambdaMART = lambda ($\lambda$) + MART (GBDT)

$\lambda$ is the gradient, LambdaRank supplies the loss, and GBDT is the model.

Put plainly, LambdaMART uses GBDT to produce the scores $s$ that appear in LambdaRank; equivalently, LambdaRank serves as the loss of GBDT.

The GBDT and LambdaMART procedures differ mainly in the gradient step: in LambdaMART the pseudo-residuals are the per-document $\lambda_i$ values (and the leaf values use a Newton step).

GBDT

  1. Initialize: $f_0(x) = \mathop{\arg\min}\limits_\gamma \sum\limits_{i=1}^N L(y_i, \gamma)$
  2. For m = 1 to M:
    (a). Compute the negative gradient (pseudo-residuals): $\tilde{y}_i = -\frac{\partial L(y_i,f_{m-1}(x_i))}{\partial f_{m-1}(x_i)}, \qquad i = 1,2 \cdots N$
    (b). Fit a regression tree to the pseudo-residuals, giving terminal regions: $\left \{ R_{jm} \right\}_1^J = \mathop{\arg\min}\limits_{\left \{ R_{jm} \right\}_1^J}\sum\limits_{i=1}^N \left [\tilde{y}_i - h_m(x_i\,;\,\left \{R_{jm},b_{jm} \right\}_1^J) \right]^2$
    (c). Compute the optimal value for each leaf: $\gamma_{jm} = \mathop{\arg\min}\limits_\gamma \sum\limits_{x_i \in R_{jm}}L(y_i,f_{m-1}(x_i)+\gamma)$
    (d). Update the model: $f_m(x) = f_{m-1}(x) + \sum\limits_{j=1}^J \gamma_{jm}I(x \in R_{jm})$
  3. Output $f_M(x)$

LambdaMART:
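(The LambdaMART pseudo-code was not copied here; as a rough reconstruction of how it departs from the GBDT flow above, the pseudo-residuals become the per-document lambdas and the leaf values use a Newton step with second-order terms $w_i$.)

$$\tilde{y}_i = \lambda_i = \sum_{j:(i,j)\in I}\lambda_{ij} - \sum_{j:(j,i)\in I}\lambda_{ij}, \qquad w_i = \frac{\partial \lambda_i}{\partial s_i}, \qquad \gamma_{jm} = \frac{\sum_{x_i\in R_{jm}} \lambda_i}{\sum_{x_i\in R_{jm}} w_i}$$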

References

https://blog.csdn.net/laolu1573/article/details/87372514

https://liam.page/uploads/slides/lambdamart.pdf

https://blog.csdn.net/zpalyq110/article/details/79527653

https://zhuanlan.zhihu.com/p/86354141

https://www.cnblogs.com/genyuan/p/9788294.html

https://blog.csdn.net/huagong_adu/article/details/40710305

https://zhuanlan.zhihu.com/p/270608987

https://www.cnblogs.com/bentuwuying/p/6690836.html

paper原文: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf

https://www.jianshu.com/p/a78b3f52c221

https://blog.csdn.net/zhoujialin/article/details/46697409

https://blog.csdn.net/w28971023/article/details/45849659


