ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems

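November 16, 2023
  • Abstract
    Evaluating retrieval-augmented generation (RAG) systems has traditionally relied on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG Evaluation System, which evaluates RAG systems along three dimensions: context relevance, answer faithfulness, and answer relevance. Using synthetic training data, ARES finetunes lightweight LM judges to assess the quality of individual RAG components. To mitigate potential prediction errors, ARES uses a small set of human-annotated datapoints for prediction-powered inference (PPI). Across six knowledge-intensive tasks in KILT and SuperGLUE, ARES accurately evaluates RAG systems while using only a few hundred human annotations during evaluation. Furthermore, ARES judges remain effective across domain shifts, staying accurate even after the type of queries and/or documents used in the evaluated RAG systems is changed. We make our datasets and code available for replication and deployment at https://github.com/stanford-futuredata/ARES.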
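    The abstract says ARES finetunes its judges on synthetic training data. The sketch below shows one plausible way to turn synthetic (query, passage) pairs into labeled examples for a binary context-relevance judge. It assumes the query-generation step (an LLM prompted with a corpus passage) has already run. The `JudgeExample` schema, the `build_context_relevance_examples` helper, and the random-negative sampling strategy are hypothetical illustrations, not the ARES codebase's API or its exact negative-sampling method.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeExample:
    """One training example for a binary context-relevance judge (hypothetical schema)."""
    query: str
    passage: str
    label: int  # 1 = passage is relevant to the query, 0 = not relevant


def build_context_relevance_examples(
    generated: list[tuple[str, str]], seed: int = 0
) -> list[JudgeExample]:
    """Turn (synthetic_query, source_passage) pairs into labeled judge examples.

    Positives pair each synthetic query with the passage it was generated from.
    Negatives pair the same query with a different passage sampled at random
    from the pool (an assumed strategy, not necessarily the one ARES uses).
    """
    if len(generated) < 2:
        raise ValueError("need at least two passages to sample negatives")
    rng = random.Random(seed)  # seeded so the training set is reproducible
    passages = [p for _, p in generated]
    examples: list[JudgeExample] = []
    for i, (query, passage) in enumerate(generated):
        examples.append(JudgeExample(query, passage, 1))
        # Pick any passage index other than this query's own source passage.
        j = rng.choice([k for k in range(len(passages)) if k != i])
        examples.append(JudgeExample(query, passages[j], 0))
    return examples


if __name__ == "__main__":
    pairs = [
        ("When was the lab founded?", "The lab was founded in 2001."),
        ("Where is the lab located?", "The lab is located in Palo Alto."),
        ("Who directs the lab?", "The lab is directed by a faculty committee."),
    ]
    for ex in build_context_relevance_examples(pairs, seed=1):
        print(ex.label, ex.query, "|", ex.passage)
```

    Each query yields one positive and one negative, so the resulting set is balanced by construction. This sketch assumes the passages are distinct; a real pipeline would also want harder negatives (for example, topically similar passages) so the judge cannot rely on surface cues alone.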
  • Problem Addressed
    Evaluating retrieval-augmented generation (RAG) systems has traditionally depended on hand annotations, which are costly to collect. ARES replaces most of this manual effort with an automated system that uses synthetic training data plus a small set of human-annotated datapoints for prediction-powered inference (PPI).
  • Key Idea
    The key idea of ARES is to finetune lightweight LM judges to assess the quality of individual RAG components, and to use PPI to mitigate potential prediction errors. ARES judges remain effective across domain shifts, proving accurate even after changing the type of queries and/or documents used in the evaluated RAG systems.
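    The key idea pairs the finetuned judges with prediction-powered inference (PPI). Below is a minimal sketch of the standard PPI estimator for a mean, such as the fraction of a RAG system's outputs that are faithful. It assumes binary (0/1) judge scores and a normal-approximation confidence interval. The judge's average error on the human-annotated points (the "rectifier") is subtracted from its average score on the unannotated points. The function name `ppi_mean_ci` is hypothetical, and ARES's own implementation may differ in its details.

```python
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(
    judge_unlabeled: list[float],
    judge_labeled: list[float],
    human_labels: list[float],
    alpha: float = 0.05,
) -> tuple[float, float, float]:
    """Prediction-powered estimate of a mean score with a (1 - alpha) CI.

    judge_unlabeled: judge scores on the large unannotated set.
    judge_labeled / human_labels: judge scores and human labels on the small
    annotated set, aligned index by index.
    Returns (estimate, lower bound, upper bound).
    """
    if len(judge_labeled) != len(human_labels):
        raise ValueError("labeled predictions and human labels must align")
    n, big_n = len(human_labels), len(judge_unlabeled)
    if n < 2 or big_n < 2:
        raise ValueError("need at least two points in each set")
    # Rectifier: the judge's average error measured on human-annotated points.
    residuals = [f - y for f, y in zip(judge_labeled, human_labels)]
    rectifier = fmean(residuals)
    # Debiased estimate: judge mean on unannotated data minus its measured bias.
    estimate = fmean(judge_unlabeled) - rectifier
    # Standard error combines judge-score variance and rectifier variance.
    se = (variance(judge_unlabeled) / big_n + variance(residuals) / n) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return estimate, estimate - z * se, estimate + z * se


if __name__ == "__main__":
    # The judge says "faithful" on every unannotated output, but humans
    # disagree with it on one of the four annotated outputs.
    est, lo, hi = ppi_mean_ci([1, 1, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1])
    print(f"estimate={est:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

    With a judge that agrees perfectly with humans, the rectifier is zero and the estimate reduces to the judge's own mean. Scoring more unannotated outputs shrinks the first variance term, while the second term depends only on the number of human annotations, which is why a few hundred annotations can be enough.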
  • Other Highlights
    The experiments show that ARES accurately evaluates RAG systems across six knowledge-intensive tasks from KILT and SuperGLUE while using only a few hundred human annotations during evaluation. The datasets and code for replication and deployment are available on GitHub. ARES offers a promising automated approach to RAG evaluation, and its combination of synthetic training data and PPI could potentially be extended to other evaluation tasks in NLP.
  • Related Work
    Related work includes earlier research on evaluating RAG systems with hand annotations, as well as automated evaluation methods for other NLP tasks. Relevant papers include 'Evaluating the Factual Consistency of Abstractive Text Summarization' by Kryściński et al., 'BLEURT: Learning Robust Metrics for Text Generation' by Sellam et al., and 'BERTScore: Evaluating Text Generation with BERT' by Zhang et al.