- Abstract: Complex dialogue systems typically use retrieved evidence to ground factual responses. Such Retrieval-Augmented Generation (RAG) systems retrieve from massive heterogeneous data stores that are usually architected as multiple indexes or APIs rather than a single monolithic source. For a given query, relevant evidence must be retrieved from one, or a small number, of possible retrieval sources. Complex queries may even require multi-step retrieval. For example, a conversational agent on a retail site answering a customer question about past orders must first retrieve the appropriate customer order and then retrieve evidence related to the ordered products. Most RAG agents handle such chain-of-thought (CoT) tasks by interleaving reasoning and retrieval steps, but each reasoning step adds directly to the system's latency. For large models (>100B parameters), this latency cost is significant, on the order of several seconds. Multi-agent systems can route a query to a single agent associated with a retrieval source, but this means a (small) classification model determines the performance of a large language model. In this work we present REAPER, a reasoning-based, LLM-powered planner that generates retrieval plans in conversational systems. We show significant latency gains over agent-based systems and, compared to classification-based planning, the ability to scale easily to new and unseen use cases. Although our method can be applied to any RAG system, we demonstrate our results in the context of Rufus, Amazon's conversational shopping assistant.
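The latency argument above can be made concrete with a back-of-envelope model. The numbers and function names below are illustrative assumptions, not figures from the paper: we simply assume each sequential call to a large LLM dominates the cost of a retrieval call.

```python
# Hypothetical latency model (all constants are assumptions for illustration).
T_LLM = 2.0       # assumed seconds per reasoning call to a >100B-parameter model
T_RETRIEVE = 0.2  # assumed seconds per retrieval call

def interleaved_latency(n_steps: int) -> float:
    """Classic CoT agent: reason -> retrieve -> reason -> retrieve -> ...
    One sequential LLM call per retrieval step."""
    return n_steps * (T_LLM + T_RETRIEVE)

def planned_latency(n_steps: int) -> float:
    """Plan-first approach: a single LLM call emits the whole retrieval
    plan, then the retrievals run without further reasoning calls."""
    return T_LLM + n_steps * T_RETRIEVE

# For a 3-step query (e.g. order lookup, then product evidence, then policy),
# the plan-first approach avoids two of the three expensive LLM calls.
saved = interleaved_latency(3) - planned_latency(3)
```

Under these assumptions, each retrieval step beyond the first saves one full LLM call, which is where the multi-second gap between interleaved CoT and plan-first execution comes from.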
-
- Problem addressed: RAG systems suffer high latency from interleaving reasoning and retrieval steps, while classification-based planning can cap the performance of a large language model. This paper proposes a planner, REAPER, that generates retrieval plans in conversational systems, aiming to reduce latency and improve scalability.
- Key idea: REAPER is an LLM-based planner that generates a complete retrieval plan through reasoning up front, rather than interleaving retrieval and reasoning steps. This reduces latency compared to agent-based systems and scales more easily to new use cases than classification-based planning.
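A minimal sketch of the plan-first idea, with the LLM call stubbed out. All tool names, the plan's JSON schema, and the `$step.field` placeholder convention are hypothetical illustrations, not the paper's actual interface:

```python
import json

# Hypothetical retrieval tools (names are illustrative, not from the paper).
def search_orders(customer_id, keywords):
    return [{"order_id": "A1", "product_id": "P7", "keywords": keywords}]

def search_product_info(product_id, question):
    return [{"product_id": product_id, "evidence": f"evidence about {question}"}]

TOOLS = {"search_orders": search_orders, "search_product_info": search_product_info}

def plan_with_llm(query):
    # Stand-in for a single LLM call that reasons over the query and emits
    # the *entire* multi-step retrieval plan at once. Later steps reference
    # earlier results via placeholders like "$0.product_id".
    return json.dumps([
        {"tool": "search_orders",
         "args": {"customer_id": "C42", "keywords": "running shoes"}},
        {"tool": "search_product_info",
         "args": {"product_id": "$0.product_id", "question": "return policy"}},
    ])

def resolve(value, results):
    # Fill a "$i.field" placeholder with that field from step i's first result.
    if isinstance(value, str) and value.startswith("$"):
        step, field = value[1:].split(".")
        return results[int(step)][0][field]
    return value

def execute_plan(plan_json):
    # Run the plan steps in order; no further LLM calls are needed.
    results = []
    for step in json.loads(plan_json):
        args = {k: resolve(v, results) for k, v in step["args"].items()}
        results.append(TOOLS[step["tool"]](**args))
    return results

evidence = execute_plan(plan_with_llm("What is the return policy for the shoes I ordered?"))
```

The design point is that only `plan_with_llm` touches the large model; executing the plan is pure retrieval, so a two-step query costs one LLM call instead of two or three.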
- Other highlights: The paper presents a novel approach to generating retrieval plans in conversational systems that improves the performance of large language models in RAG settings. REAPER shows significant latency gains over both agent-based systems and classification-based planning. Experiments were conducted on Rufus, Amazon's conversational shopping assistant, and the authors also demonstrate REAPER's ability to handle new and unseen use cases. However, the paper does not provide open-source code.
- Related work includes prior research on RAG systems, such as DensePhrases and REALM, as well as other conversational agents such as Google's Meena and Facebook's Blender.

