Large Language Models Can Self-Correct with Minimal Effort

May 23, 2024
  • Overview
    Intrinsic self-correction is a method that instructs large language models (LLMs) to verify and correct their responses without external feedback; unfortunately, prior work concluded that LLMs cannot yet self-correct their reasoning. This paper finds that a simple yet effective verification method can unleash the inherent capabilities of LLMs: mask a key condition in the question, append the current response to construct a verification question, and predict the masked condition to verify the response. The condition can be an entity in an open-domain question or a numeric value in an arithmetic problem, and identifying it requires minimal effort (via prompting). Building on this, the paper proposes an iterative verify-then-correct framework named ProCo that progressively identifies and corrects (probably) false responses. Experiments on three reasoning tasks show that, on average, ProCo with GPT-3.5-Turbo as the backend LLM improves over Self-Correct by $+6.8$ exact match on four open-domain question answering datasets, $+14.1$ accuracy on three arithmetic reasoning datasets, and $+9.6$ accuracy on one commonsense reasoning dataset.
  • Problem addressed
    How to enable LLMs to verify and correct their own responses with minimal effort and without external feedback, across open-domain question answering, arithmetic reasoning, and commonsense reasoning tasks, given that intrinsic self-correction has previously been reported to fail on reasoning.
  • Key idea
    The paper proposes ProCo, an iterative verify-then-correct framework that unleashes the inherent self-correction ability of LLMs on reasoning tasks. In each round, a key condition in the question (an entity for open-domain questions, a number for arithmetic problems) is masked, the current response is appended to form a verification question, and the model predicts the masked condition; if the prediction does not recover the original condition, the response is treated as probably false and corrected in the next round. A minimal illustrative sketch of this loop is given after the list below.
  • Other highlights
    ProCo is shown to outperform the intrinsic self-correction baseline on three reasoning tasks, achieving higher accuracy and exact-match scores. The experiments cover four open-domain question answering datasets, three arithmetic reasoning datasets, and one commonsense reasoning dataset, all using GPT-3.5-Turbo as the backend LLM. The paper also discusses the limitations of the framework and suggests directions for future research.
  • Related work
    Related research includes studies of intrinsic self-correction in LLMs and other approaches to improving their reasoning abilities, such as prompt engineering and fine-tuning on task-specific data. Papers cited as relevant include 'Intrinsic Social Evaluation of Generative Models for Commonsense Reasoning' and 'GPT Understands, Too'.
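To make the verify-then-correct loop concrete, here is a minimal sketch in Python. It is not the authors' released code: the `ask_llm` helper, the prompt wording, and the number-matching logic are assumptions for illustration, using an arithmetic word problem where the key condition is a number identified in advance.

```python
import re

def ask_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to an LLM (e.g., GPT-3.5-Turbo)
    and return its text response. Replace with a real API call."""
    raise NotImplementedError

def extract_number(text: str) -> str:
    """Pull the last number out of a model response ('' if none is found)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else ""

def verify_then_correct(question: str, key_condition: str, max_rounds: int = 3) -> str:
    """Illustrative masking-based verify-then-correct loop in the spirit of ProCo.

    `key_condition` is a key number appearing in `question`, identified
    beforehand (e.g., via a separate prompt). Each round:
      1. mask the key condition and append the current answer to build a
         verification question;
      2. ask the model to re-predict the masked condition;
      3. if the prediction recovers the original condition, accept the answer;
         otherwise treat it as probably false and ask the model to revise.
    """
    answer = extract_number(ask_llm(f"Q: {question}\nA: Let's think step by step."))
    for _ in range(max_rounds):
        masked = question.replace(key_condition, "X", 1)
        verification_q = (
            f"{masked}\nSuppose the answer to the question above is {answer}. "
            f"What is the value of X?"
        )
        predicted = extract_number(ask_llm(verification_q))
        if predicted == key_condition:
            return answer  # masked condition recovered: response verified
        # Verification failed: correct the (probably) false response and retry.
        answer = extract_number(ask_llm(
            f"Q: {question}\nA previous answer, {answer}, may be wrong. "
            f"Please solve the problem again step by step."
        ))
    return answer
```

For open-domain questions, the masked condition would be an entity rather than a number, and the final check would compare entity strings instead of numeric values.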