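- Introduction: Researchers have made significant progress in automating the software development process over the past decades. Recent advances in large language models (LLMs) have significantly affected that process: developers can now use LLM-based programming assistants to automate coding. Software engineering, however, involves program improvement as well as coding, specifically for software maintenance (e.g. fixing bugs) and software evolution (e.g. adding features). In this paper, we propose an automated approach to resolving GitHub issues in order to achieve program improvement. In our approach, AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately producing a program modification or patch. Unlike recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (the abstract syntax tree) rather than treating a software project as a mere collection of files. Our code search exploits program structure in the form of classes and methods to improve the LLM's understanding of the issue's root cause, and retrieves context effectively through iterative search. Spectrum-based fault localization using tests sharpens the context further whenever a test suite is available. Experiments on SWE-bench-lite, which consists of 300 real GitHub issues, show increased efficacy in resolving GitHub issues (22-23% on SWE-bench-lite). On the full SWE-bench, which comprises 2294 GitHub issues, AutoCodeRover resolved around 16% of issues, which is higher than the efficacy of the recently reported AI software engineer Devin from Cognition Labs, while taking comparable time. We posit that our workflow enables autonomous software engineering, where, in the future, auto-generated code from LLMs can be autonomously improved.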
- Problem addressed: AutoCodeRover: Autonomous Program Improvement Using Large Language Models and Sophisticated Code Search
- Key idea: The paper proposes an automated approach to resolving GitHub issues that combines large language models (LLMs) with sophisticated code search. The approach, called AutoCodeRover, works on a program representation (the abstract syntax tree) and exploits program structure, in the form of classes and methods, to improve the LLM's understanding of an issue's root cause. When a test suite is available, spectrum-based fault localization using tests sharpens the retrieved context further. The authors posit that this workflow enables autonomous software engineering, in which code generated by LLMs can itself be improved autonomously.
- Other highlights: Experiments cover both SWE-bench-lite (300 real GitHub issues) and the full SWE-bench (2294 GitHub issues). AutoCodeRover resolved 22-23% of the issues on SWE-bench-lite and around 16% on the full SWE-bench. The latter is higher than the efficacy of the recently reported AI software engineer Devin from Cognition Labs, at a comparable time cost. The approach is more software engineering oriented than recent LLM agent approaches: it works on a program representation rather than viewing a software project as a mere collection of files.
- Related work: recent progress in large language models (LLMs) and their impact on the development process, as well as earlier studies on automated software engineering and program repair. Related papers include 'CodeBERT: A Pre-Trained Model for Programming and Natural Languages' and 'DeepFix: Fixing Common C Language Errors by Deep Learning'.
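
The two ingredients named in the key idea, structure-aware code search over an abstract syntax tree and spectrum-based fault localization, can be illustrated with a minimal Python sketch. This is not AutoCodeRover's implementation: the function names `find_methods` and `ochiai_scores` are hypothetical, and the Ochiai formula is assumed here as one common suspiciousness metric, since the summary does not say which metric is used.

```python
import ast
import math


def find_methods(source: str, class_name: str, method_name: str) -> list[str]:
    """Return the source text of every method `method_name` defined in class `class_name`.

    Hypothetical helper: it searches by program structure (class/method)
    instead of by file path or raw text match.
    """
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                is_function = isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                if is_function and item.name == method_name:
                    segment = ast.get_source_segment(source, item)
                    if segment is not None:
                        hits.append(segment)
    return hits


def ochiai_scores(coverage: dict[str, set[str]], failed: set[str]) -> dict[str, float]:
    """Score each covered program element with the Ochiai metric (an assumption).

    coverage maps a test name to the set of elements (e.g. method names) it executes.
    failed is the set of failing test names.
    score(e) = ef / sqrt(total_failed * (ef + ep)), where ef and ep count the
    failing and passing tests that execute e.
    """
    total_failed = len(failed)
    elements = set().union(*coverage.values())
    scores = {}
    for element in elements:
        ef = sum(1 for test, cov in coverage.items() if element in cov and test in failed)
        ep = sum(1 for test, cov in coverage.items() if element in cov and test not in failed)
        denominator = math.sqrt(total_failed * (ef + ep))
        scores[element] = ef / denominator if denominator else 0.0
    return scores
```

In a workflow of this kind, the highest-scoring elements from `ochiai_scores` would suggest which classes and methods to look up first with `find_methods`, so that only the relevant method bodies, rather than whole files, are passed to the LLM as context.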