- Introduction: Intrinsic self-correction is a method that instructs large language models (LLMs) to verify and correct their responses without external feedback. Unfortunately, prior work concluded that LLMs cannot yet self-correct their own reasoning. We find that a simple yet effective verification method can unleash the inherent capabilities of LLMs: mask a key condition in the question, append the current response to construct a verification question, and predict the masked condition to verify the response. The condition can be an entity in an open-domain question or a numeric value in an arithmetic problem, and requires minimal effort (via prompting) to identify. We propose an iterative verify-then-correct framework, named ProCo, that progressively identifies and corrects (probably) false responses. We conduct experiments on three reasoning tasks. On average, ProCo with GPT-3.5-Turbo as the backend LLM yields $+6.8$ exact match on four open-domain question answering datasets, $+14.1$ accuracy on three arithmetic reasoning datasets, and $+9.6$ accuracy on one commonsense reasoning dataset, compared to self-correction.
- Problem addressed: enabling LLMs to verify and correct their own responses without external feedback, as tackled by ProCo (Progressive Verification and Correction of Large Language Models) for Common Sense Reasoning and Arithmetic Word Problems.
- Key idea: The paper proposes an iterative verify-then-correct framework named ProCo, which uses a simple yet effective verification method to unleash the inherent reasoning capabilities of large language models (LLMs). The framework masks a key condition in the question, appends the current response to construct a verification question, and asks the model to predict the masked condition; if the prediction matches the original condition, the response is considered verified, otherwise it is corrected (see the sketch after this list).
- Other highlights: The ProCo framework is shown to outperform the intrinsic self-correction method on three reasoning tasks, achieving higher accuracy and exact-match scores. Experiments were conducted on four open-domain question answering datasets, three arithmetic reasoning datasets, and one commonsense reasoning dataset, using GPT-3.5-Turbo as the backend LLM. The paper also discusses the limitations of the framework and suggests directions for future research.
- Related research in this field includes studies on intrinsic self-correction methods for LLMs and other approaches to improving their reasoning abilities, such as prompt engineering and fine-tuning with task-specific data. Some relevant papers include 'Intrinsic Social Evaluation of Generative Models for Commonsense Reasoning' and 'GPT Understands, Too'.
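The following Python sketch illustrates the verify-then-correct loop described in the key idea above. It is a minimal illustration, not the paper's implementation: the `llm` callable stands for any chat-completion backend (e.g., GPT-3.5-Turbo), and the prompt wording, helper names, and exact-match check on the masked condition are simplifying assumptions.

```python
# Minimal sketch of a ProCo-style verify-then-correct loop (illustrative only).
# `llm` is any function that maps a prompt string to a model response string,
# e.g. a thin wrapper around a chat-completion API.

from typing import Callable


def build_verification_question(question: str, key_condition: str, answer: str) -> str:
    """Mask the key condition and append the current answer to form a verification question."""
    masked = question.replace(key_condition, "X", 1)  # mask one key condition
    return f"{masked} Suppose the answer to this question is {answer}. What is the value of X?"


def proco_loop(question: str, key_condition: str,
               llm: Callable[[str], str], max_iters: int = 3) -> str:
    """Progressively verify and correct an initial answer."""
    answer = llm(f"Answer the question. Question: {question}")
    for _ in range(max_iters):
        predicted = llm(build_verification_question(question, key_condition, answer))
        if predicted.strip() == key_condition:
            return answer  # the masked condition was recovered, so keep the answer
        # Otherwise treat the answer as probably false and ask for a corrected one,
        # telling the model which answer to avoid.
        answer = llm(f"Answer the question. Question: {question} (The answer is not {answer}.)")
    return answer


# Example usage with a toy arithmetic word problem; the key condition "5" is the
# numeric value that would be identified via a separate prompting step.
if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:  # stand-in backend for demonstration
        return "35" if "What is the value of X" not in prompt else "5"

    q = "Tom has 5 boxes with 7 apples each. How many apples does he have in total?"
    print(proco_loop(q, key_condition="5", llm=fake_llm))
```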

