MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

Summary

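Neural embedding models have become a fundamental component of modern information retrieval (IR) pipelines. These models produce a single embedding $x \in \mathbb{R}^d$ per data point, which enables fast retrieval through highly optimized maximum inner product search (MIPS) algorithms. More recently, beginning with the landmark ColBERT paper, multi-vector models have delivered markedly better performance on IR tasks. However, using these models for IR is computationally expensive because multi-vector retrieval and scoring are more complex. In this paper, we introduce MUVERA (MUlti-VEctor Retrieval Algorithm), a retrieval mechanism that reduces multi-vector similarity search to single-vector similarity search. This makes it possible to use off-the-shelf MIPS solvers for multi-vector retrieval. MUVERA asymmetrically generates Fixed Dimensional Encodings (FDEs) of queries and documents: vectors whose inner product approximates multi-vector similarity. We prove that FDEs give high-quality $\epsilon$-approximations, providing the first single-vector proxy for multi-vector similarity with theoretical guarantees. Empirically, we find that FDEs achieve the same recall as prior state-of-the-art heuristics while retrieving 2-5 times fewer candidates. Compared with prior state-of-the-art implementations, MUVERA achieves consistently good end-to-end recall and latency across a diverse set of BEIR retrieval datasets, with on average 10% higher recall and 90% lower latency.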
Problem addressed

MUVERA aims to reduce the computational cost of using multi-vector models for information retrieval (IR) tasks, while maintaining high performance.
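
To make the cost concrete: ColBERT-style multi-vector models score a query against a document by taking, for each query vector, its maximum inner product over the document's vectors, and then summing those maxima. The sketch below is a minimal pure-Python illustration of that scoring rule. The function name `chamfer_similarity` and the toy vectors are assumptions made for illustration and do not come from the paper's code.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def chamfer_similarity(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query vectors, of the best inner product with any document vector."""
    return sum(max(dot(q, p) for p in doc_vecs) for q in query_vecs)


# Toy data (hypothetical): 2 query vectors and 3 document vectors in 2 dimensions.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, -1.0]]

score = chamfer_similarity(query, doc)
print(score)  # 1.0 (best match for the first query vector) + 0.5 (second) = 1.5
```

Scoring one pair this way takes one inner product per (query vector, document vector) combination, so evaluating it against every document in a large corpus is far more expensive than a single inner product per document.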
Key idea

MUVERA generates Fixed Dimensional Encodings (FDEs) of queries and documents, which are single vectors that approximate multi-vector similarity. This allows for the use of off-the-shelf MIPS solvers for multi-vector retrieval, reducing computational complexity.
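
This summary does not spell out how an FDE is built, so the following is only a simplified sketch under stated assumptions: space is partitioned with random hyperplanes (SimHash), the query encoding sums the query vectors that fall in each cell, the document encoding averages the document vectors in each cell, and the per-cell blocks are concatenated. All function names and parameters (`make_hyperplanes`, `num_bits`, and so on) are hypothetical, and refinements such as handling empty document cells, dimension reduction, and repeated partitions are deliberately left out.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_hyperplanes(dim: int, num_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian hyperplanes; num_bits of them define 2**num_bits cells."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash_bucket(vec: Vector, hyperplanes: Sequence[Vector]) -> int:
    """Cell index: one sign bit per hyperplane."""
    bucket = 0
    for h in hyperplanes:
        bucket = (bucket << 1) | (1 if dot(vec, h) > 0 else 0)
    return bucket


def encode(vecs: Sequence[Vector], hyperplanes: Sequence[Vector], average: bool) -> List[float]:
    """Concatenate one dim-sized block per cell (sum or mean of the vectors in it)."""
    dim = len(vecs[0])
    num_buckets = 1 << len(hyperplanes)
    sums = [[0.0] * dim for _ in range(num_buckets)]
    counts = [0] * num_buckets
    for v in vecs:
        b = simhash_bucket(v, hyperplanes)
        counts[b] += 1
        for i, x in enumerate(v):
            sums[b][i] += x
    fde: List[float] = []
    for b in range(num_buckets):
        # Empty cells stay all-zero in this sketch (an assumption, not the paper's rule).
        scale = 1.0 / counts[b] if average and counts[b] else 1.0
        fde.extend(x * scale for x in sums[b])
    return fde


def query_fde(vecs: Sequence[Vector], hyperplanes: Sequence[Vector]) -> List[float]:
    return encode(vecs, hyperplanes, average=False)  # queries: sum per cell


def doc_fde(vecs: Sequence[Vector], hyperplanes: Sequence[Vector]) -> List[float]:
    return encode(vecs, hyperplanes, average=True)  # documents: mean per cell


# Toy usage: 2-dimensional vectors, 2 hyperplanes, so 4 cells and 8-dimensional encodings.
planes = make_hyperplanes(dim=2, num_bits=2, seed=0)
q_fde = query_fde([[1.0, 0.0], [0.0, 1.0]], planes)
d_fde = doc_fde([[0.5, 0.5], [1.0, 0.0], [0.0, -1.0]], planes)
approx = dot(q_fde, d_fde)  # single inner product standing in for the multi-vector score
```

The asymmetry (sum for queries, mean for documents) is what lets a single inner product mimic "each query vector meets a representative of the nearby document vectors". Because both encodings have a fixed length, they can be handed to any ordinary MIPS index.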
Other highlights

FDEs achieve the same recall as prior state-of-the-art heuristics while retrieving 2-5 times fewer candidates. Compared with prior state-of-the-art implementations, MUVERA achieves consistently good end-to-end recall and latency across a diverse set of BEIR retrieval datasets, with on average 10% higher recall and 90% lower latency. The paper also proves that FDEs are high-quality $\epsilon$-approximations of multi-vector similarity, making them the first single-vector proxy with theoretical guarantees.
Related work

Prior work in this field includes the ColBERT paper, which introduced multi-vector models for IR tasks, and other heuristics for reducing the computational cost of multi-vector retrieval.
