A Deep Look into Neural Ranking Models for Information Retrieval

https://par.nsf.gov/servlets/purl/10277191

3. A Unified Model Formulation

A generalized LTR problem is to find the optimal ranking function f* by minimizing a loss function over some labeled dataset:

f* = argmin_f Σ_(s, t, y) L(f; s, t, y)

Here f is the ranking function, s is the query, t is the set of candidate texts, and y is the label set, where labels represent relevance grades.

Without loss of generality, the ranking function f can be further abstracted by the following unified formulation:

f(s, t) = g(ψ(s), ϕ(t), η(s, t))

where ψ and ϕ are representation functions that extract features from s and t respectively, η is the interaction function that extracts features from the (s, t) pair, and g is the evaluation function that computes the relevance score based on the feature representations.
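
A minimal sketch of this formulation, assuming s and t are already embedded as vectors; the concrete ψ, ϕ, η and g below are toy placeholders rather than the functions of any particular model:

```python
import numpy as np

# Toy instantiation of f(s, t) = g(psi(s), phi(t), eta(s, t)).
# s and t are assumed to be pre-computed embedding vectors; psi/phi/eta/g
# below are placeholders, not the functions of any specific model.

def psi(s):
    """Representation function: extracts features from the query s."""
    return s / (np.linalg.norm(s) + 1e-9)

def phi(t):
    """Representation function: extracts features from the candidate t."""
    return t / (np.linalg.norm(t) + 1e-9)

def eta(s, t):
    """Interaction function: extracts features from the (s, t) pair."""
    return np.array([np.dot(s, t)])

def g(rep_s, rep_t, interaction):
    """Evaluation function: relevance score from the feature representations."""
    return float(np.dot(rep_s, rep_t) + interaction.sum())

def f(s, t):
    return g(psi(s), phi(t), eta(s, t))

score = f(np.random.rand(16), np.random.rand(16))
```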

4. Model Architecture

4.1. Symmetric vs. Asymmetric Architectures

Symmetric Architecture: The inputs s and t are assumed to be homogeneous, so a symmetric network structure can be applied over the inputs.

Asymmetric Architecture: The inputs s and t are assumed to be heterogeneous, so asymmetric network structures should be applied over the inputs.

4.2. Representation-focused vs. Interaction-focused Architectures

Representation-focused Architecture: The underlying assumption of this type of architecture is that relevance depends on the compositional meaning of the input texts. Therefore, models in this category usually define complex representation functions ϕ and ψ (i.e., deep neural networks) but no interaction function η.

Interaction-focused Architecture: The underlying assumption of this type of architecture is that relevance is in essence about the relation between the input texts, so it is more effective to learn directly from interactions rather than from individual representations. Models in this category thus define the interaction function η rather than the representation functions ϕ and ψ.

Hybrid Architecture: In order to take advantage of both representation-focused and interaction-focused architectures, a natural way is to adopt a hybrid architecture for feature learning. There are two major hybrid strategies to integrate the two architectures, namely the combined strategy and the coupled strategy.
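
The contrast between the two base families can be sketched as follows; the tiny PyTorch modules below are illustrative stand-ins (not any published model), one encoding s and t separately (ψ, ϕ, no η) and one building a word-by-word interaction matrix (η) before scoring:

```python
import torch
import torch.nn as nn

class RepresentationFocused(nn.Module):
    """Encode s and t independently (complex psi/phi, no eta), then compare."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.encode_s = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.encode_t = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))

    def forward(self, s, t):                 # s, t: [batch, dim] text embeddings
        return (self.encode_s(s) * self.encode_t(t)).sum(-1)

class InteractionFocused(nn.Module):
    """Build a word-by-word interaction matrix (eta) first, then score it."""
    def __init__(self, len_s, len_t):
        super().__init__()
        self.score = nn.Linear(len_s * len_t, 1)

    def forward(self, s_words, t_words):     # [batch, len, dim] token embeddings
        inter = torch.einsum('bld,bmd->blm', s_words, t_words)
        return self.score(inter.flatten(1)).squeeze(-1)

rep_model = RepresentationFocused(dim=32, hidden=64)
int_model = InteractionFocused(len_s=5, len_t=10)
print(rep_model(torch.randn(4, 32), torch.randn(4, 32)).shape)        # torch.Size([4])
print(int_model(torch.randn(4, 5, 32), torch.randn(4, 10, 32)).shape)  # torch.Size([4])
```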

4.3. Single-granularity vs. Multi-granularity Architecture

Single-granularity Architecture: The underlying assumption of the single-granularity architecture is that relevance can be evaluated based on the high-level features extracted by ϕ, ψ and η from the single-form text inputs.

Multi-granularity Architecture: The underlying assumption of the multi-granularity architecture is that relevance estimation requires multiple granularities of features, either from different levels of feature abstraction or based on different types of language units of the inputs.

5. Model Learning

5.1. Learning objective

Similar to other LTR algorithms, the learning objective of neural ranking models can be broadly categorized into three groups: pointwise, pairwise, and listwise.

5.1.1. Pointwise Ranking Objective

1 loss

The idea of pointwise ranking objectives is to simplify a ranking problem to a set of classification or regression problems

a. Cross Entropy

For example, one of the most popular pointwise loss functions used in neural ranking models is cross entropy, which for binary relevance labels takes the form

L(f; s, t, y) = − Σ [ y · log f(s, t) + (1 − y) · log(1 − f(s, t)) ]

where the sum runs over the labeled (s, t, y) examples and f(s, t) is interpreted as the predicted probability of relevance.

b. Mean Squared Error

There are other pointwise loss functions such as Mean Squared Error for numerical labels, but they are more commonly used in recommendation tasks.
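
A minimal sketch of these two pointwise losses, assuming binary relevance labels for cross entropy and graded numerical labels for MSE (the tensors below are made-up toy data):

```python
import torch
import torch.nn.functional as F

# scores: raw model outputs f(s, t) for individual (query, document) pairs
scores = torch.tensor([2.1, -0.3, 0.7])

# Pointwise cross entropy with binary relevance labels (e.g. click / no click)
binary_labels = torch.tensor([1.0, 0.0, 1.0])
ce_loss = F.binary_cross_entropy_with_logits(scores, binary_labels)

# Pointwise mean squared error with graded numerical labels
graded_labels = torch.tensor([2.0, 0.0, 1.0])
mse_loss = F.mse_loss(scores, graded_labels)

print(ce_loss.item(), mse_loss.item())
```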

2 Advantages and disadvantages

a. Advantages

First, it is simple and easy to scale. Second, the outputs have real meaning and value in practice. For instance, in sponsored search, a model learned with a cross entropy loss on click-through-rate labels can directly predict the probability of user clicks on search ads, which in some application scenarios is more important than producing a good result list.

b. Disadvantages

Less effective for ranking: because pointwise loss functions consider no document preference or order information, they are not guaranteed to produce the best ranking list even when the model loss reaches its global minimum.

5.1.2. Pairwise Ranking Objective

1 loss

Pairwise ranking objectives focus on optimizing the relative preferences between documents rather than their labels.

a. Hinge loss

b. Cross entropy (e.g., RankNet)

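A sketch of both pairwise losses on toy score pairs; score_pos and score_neg are hypothetical model outputs for the preferred and the less relevant document of each pair:

```python
import torch
import torch.nn.functional as F

# Hypothetical model scores for the preferred (t+) and less relevant (t-)
# document of each training pair.
score_pos = torch.tensor([1.8, 0.4, 2.2])
score_neg = torch.tensor([0.9, 0.6, 1.0])

# Pairwise hinge loss: max(0, margin - (f(s, t+) - f(s, t-)))
margin = 1.0
hinge_loss = torch.clamp(margin - (score_pos - score_neg), min=0.0).mean()

# RankNet-style pairwise cross entropy on the score difference:
# P(t+ ranked above t-) = sigmoid(f(s, t+) - f(s, t-)), target probability 1
ranknet_loss = F.binary_cross_entropy_with_logits(
    score_pos - score_neg, torch.ones_like(score_pos))

print(hinge_loss.item(), ranknet_loss.item())
```
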
2 Advantages and disadvantages

a. Advantages

Pairwise ranking objectives are effective in many tasks.

b. Disadvantages

Pairwise methods do not always lead to improvements in the final ranking metrics, for two reasons: (1) it is impossible to develop a ranking model that correctly predicts document preferences in all cases; and (2) in the computation of most existing ranking metrics, not all document pairs are equally important.

5.1.3. Listwise Ranking Objective

1 loss

Listwise loss functions compute the ranking loss over each query and its whole candidate document list together.

a. ListMLE

https://blog.csdn.net/qq_36478718/article/details/122598406

b. Attention Rank function

https://arxiv.org/abs/1804.05936

c. Softmax-based listwise loss

https://arxiv.org/pdf/1811.04415.pdf
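
A sketch of two listwise losses for a single query's candidate list: a softmax cross-entropy over the list (using normalized graded labels as the target distribution) and a ListMLE-style negative log-likelihood of the label-sorted permutation; the scores and labels are toy values:

```python
import torch
import torch.nn.functional as F

# Model scores and graded relevance labels for one query's candidate list
scores = torch.tensor([0.3, 2.0, -1.1, 0.8])
labels = torch.tensor([1.0, 3.0, 0.0, 2.0])

# Softmax-based listwise loss: cross entropy between the normalized label
# distribution and the softmax over the predicted scores
target_dist = labels / labels.sum()
softmax_listwise = -(target_dist * F.log_softmax(scores, dim=0)).sum()

# ListMLE: negative log-likelihood of the permutation sorted by the labels,
# log P = sum_i [ s_(i) - log sum_{j >= i} exp(s_(j)) ]
order = torch.argsort(labels, descending=True)
s_sorted = scores[order]
suffix_lse = torch.logcumsumexp(s_sorted.flip(0), dim=0).flip(0)
listmle = -(s_sorted - suffix_lse).sum()

print(softmax_listwise.item(), listmle.item())
```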

2 Advantages and disadvantages

a. Advantages

Listwise ranking objectives are generally more effective than pairwise ranking objectives.

b. Disadvantages

Their high computational cost often limits their applications; they are mainly suitable for the re-ranking phase over a small set of candidate documents.

5.1.4. Multi-task Learning Objective

The optimization of neural ranking models may include the learning of multiple ranking or non-ranking objectives at the same time.
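
A sketch of how such a joint objective might be assembled, here a RankNet-style pairwise ranking loss plus a hypothetical auxiliary classification loss (e.g., query intent) on a shared encoder; the weighting alpha and all names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def multi_task_loss(rank_scores, rank_labels, aux_logits, aux_labels, alpha=0.5):
    """Weighted sum of a pairwise ranking loss and an auxiliary classification loss."""
    # Ranking objective: RankNet-style loss on (preferred, other) score differences
    ranking = F.binary_cross_entropy_with_logits(
        rank_scores[:, 0] - rank_scores[:, 1], rank_labels)
    # Non-ranking objective, e.g. query intent classification on a shared encoder
    auxiliary = F.cross_entropy(aux_logits, aux_labels)
    return ranking + alpha * auxiliary

# Toy usage with random data
rank_scores = torch.randn(8, 2)      # scores for (preferred, other) document pairs
rank_labels = torch.ones(8)          # the preferred document is always labelled 1
aux_logits = torch.randn(8, 3)       # 3 hypothetical query-intent classes
aux_labels = torch.randint(0, 3, (8,))
loss = multi_task_loss(rank_scores, rank_labels, aux_logits, aux_labels)
print(loss.item())
```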

5.2. Training Strategies

1 Supervised learning

2 Weakly supervised learning

3 Semi-supervised learning

6. Model Comparison

This part compares the effectiveness of common models on different applications.

1 Ad-hoc Retrieval

https://blog.csdn.net/qq_44092699/article/details/106335971

Ad-hoc information retrieval refers to the task of returning information resources related to a user query formulated in natural language.

2 QA

Question answering (QA) aims to answer questions posed in natural language; neural ranking models are typically applied to the subtask of ranking candidate answer passages or sentences for a given question.
