- Introduction: Intrinsic self-correction is a method that instructs large language models (LLMs) to verify and correct their responses without external feedback. Unfortunately, prior work concluded that LLMs cannot yet self-correct their own reasoning. We find that a simple yet effective verification method can unleash the inherent capabilities of LLMs: mask a key condition in the question, append the current response to construct a verification question, and predict the masked condition to verify the response. The condition can be an entity in an open-domain question or a numeric value in an arithmetic problem, and requires minimal effort (via prompting) to identify. We propose an iterative verify-then-correct framework, named ProCo, that progressively identifies and corrects (probably) false responses. We conduct experiments on three reasoning tasks. On average, ProCo with GPT-3.5-Turbo as the backend LLM yields $+6.8$ exact match on four open-domain question answering datasets, $+14.1$ accuracy on three arithmetic reasoning datasets, and $+9.6$ accuracy on one commonsense reasoning dataset, compared to self-correction.
- Problem addressed: enabling LLMs to verify and correct their own responses without external feedback, as tackled by ProCo (Progressive Verification and Correction of Large Language Models) for Common Sense Reasoning and Arithmetic Word Problems.
- Key idea: The paper proposes an iterative verify-then-correct framework named ProCo, which uses a simple yet effective verification method to unleash the inherent reasoning capabilities of large language models (LLMs). The framework masks a key condition in the question, appends the current response to construct a verification question, and asks the model to predict the masked condition; if the prediction matches the original condition, the response is considered verified, otherwise it is corrected (see the sketch after this list).
- Other highlights: The ProCo framework is shown to outperform the intrinsic self-correction method on three reasoning tasks, achieving higher accuracy and exact-match scores. Experiments were conducted on four open-domain question answering datasets, three arithmetic reasoning datasets, and one commonsense reasoning dataset, using GPT-3.5-Turbo as the backend LLM. The paper also discusses the limitations of the framework and suggests directions for future research.
- Related research in this field includes studies on intrinsic self-correction methods for LLMs and other approaches to improving their reasoning abilities, such as prompt engineering and fine-tuning with task-specific data. Some relevant papers include 'Intrinsic Social Evaluation of Generative Models for Commonsense Reasoning' and 'GPT Understands, Too'.
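The following Python sketch illustrates the verify-then-correct loop described in the key idea above. It is a minimal illustration, not the paper's implementation: the `llm` callable stands for any chat-completion backend (e.g., GPT-3.5-Turbo), and the prompt wording, helper names, and exact-match check on the masked condition are simplifying assumptions.

```python
# Minimal sketch of a ProCo-style verify-then-correct loop (illustrative only).
# `llm` is any function that maps a prompt string to a model response string,
# e.g. a thin wrapper around a chat-completion API.

from typing import Callable


def build_verification_question(question: str, key_condition: str, answer: str) -> str:
    """Mask the key condition and append the current answer to form a verification question."""
    masked = question.replace(key_condition, "X", 1)  # mask one key condition
    return f"{masked} Suppose the answer to this question is {answer}. What is the value of X?"


def proco_loop(question: str, key_condition: str,
               llm: Callable[[str], str], max_iters: int = 3) -> str:
    """Progressively verify and correct an initial answer."""
    answer = llm(f"Answer the question. Question: {question}")
    for _ in range(max_iters):
        predicted = llm(build_verification_question(question, key_condition, answer))
        if predicted.strip() == key_condition:
            return answer  # the masked condition was recovered, so keep the answer
        # Otherwise treat the answer as probably false and ask for a corrected one,
        # telling the model which answer to avoid.
        answer = llm(f"Answer the question. Question: {question} (The answer is not {answer}.)")
    return answer


# Example usage with a toy arithmetic word problem; the key condition "5" is the
# numeric value that would be identified via a separate prompting step.
if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:  # stand-in backend for demonstration
        return "35" if "What is the value of X" not in prompt else "5"

    q = "Tom has 5 boxes with 7 apples each. How many apples does he have in total?"
    print(proco_loop(q, key_condition="5", llm=fake_llm))
```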

