Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

Abstract

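Despite recent progress made by large language models in code generation, they still struggle with programs that must meet complex requirements. Recent work uses plan-and-solve decomposition to reduce complexity and leverages self-tests to refine the generated program. However, planning deeply nested requirements in advance can be challenging, and the tests need to be accurate for self-improvement to work. To this end, we propose FunCoder, a code generation framework that combines a divide-and-conquer strategy with functional consensus. Specifically, FunCoder recursively branches off sub-functions as smaller goals during code generation, represented by a tree hierarchy. These sub-functions are then composed to attain more complex objectives. In addition, we select functions by a consensus formed from similarities in program behavior, which mitigates error propagation. FunCoder outperforms state-of-the-art methods by +9.8% on average on HumanEval, MBPP, xCodeEval and MATH with GPT-3.5 and GPT-4. Moreover, our method demonstrates superiority on smaller models: with FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% and achieves 97.7% of GPT-4's performance on HumanEval. Further analysis reveals that the proposed dynamic function decomposition is capable of handling complex requirements, and that functional consensus prevails over self-testing in correctness evaluation.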
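Problem addressed

Large language models still struggle to generate programs that meet complex requirements: planning a full decomposition in advance is hard, and self-tests only help when the tests themselves are accurate. FunCoder is a code generation framework that addresses this with divide-and-conquer and functional consensus.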
Key idea

FunCoder combines a divide-and-conquer strategy with functional consensus. During code generation it recursively branches off sub-functions as smaller goals, represented by a tree hierarchy, and then composes these sub-functions to attain more complex objectives. Among candidate functions, it selects one by a consensus formed from similarities in program behavior, which mitigates error propagation.
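The tree-shaped decomposition can be illustrated with a minimal sketch. This is not FunCoder's implementation: the `Goal` class, the `divide` and `conquer` helpers, and the canned `draft` function that stands in for a language model call are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Goal:
    """One node of the tree: a function to implement plus its sub-goals."""
    name: str
    code: str = ""
    children: List["Goal"] = field(default_factory=list)


# Hypothetical stand-in for a model call: given a goal name, return
# (draft code, names of the sub-functions that the draft relies on).
Drafter = Callable[[str], Tuple[str, List[str]]]


def divide(name: str, draft: Drafter, depth: int = 0, max_depth: int = 4) -> Goal:
    """Draft a function, then recursively branch off its sub-functions."""
    code, subgoals = draft(name)
    node = Goal(name=name, code=code)
    if depth < max_depth:  # assumed depth cap; deeper sub-goals are dropped
        node.children = [divide(s, draft, depth + 1, max_depth) for s in subgoals]
    return node


def conquer(node: Goal) -> str:
    """Post-order traversal: emit children first, then the parent that uses them."""
    parts = [conquer(child) for child in node.children]
    parts.append(node.code)
    return "\n\n".join(parts)


# Canned drafts replace a real model so the sketch runs on its own.
CANNED = {
    "sum_of_even_squares": (
        "def sum_of_even_squares(xs):\n"
        "    return sum(square(x) for x in xs if is_even(x))",
        ["is_even", "square"],
    ),
    "is_even": ("def is_even(n):\n    return n % 2 == 0", []),
    "square": ("def square(n):\n    return n * n", []),
}


def canned_draft(name: str) -> Tuple[str, List[str]]:
    return CANNED[name]


tree = divide("sum_of_even_squares", canned_draft)
program = conquer(tree)

namespace: dict = {}
exec(program, namespace)  # run the composed program in an isolated namespace
result = namespace["sum_of_even_squares"]([1, 2, 3, 4])
```

In this toy run the root goal branches into two leaf goals, and the composed program evaluates the root function on `[1, 2, 3, 4]`. A real system would replace `canned_draft` with a model call and would likely regenerate the parent once its children are fixed; that step is omitted here.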
Other highlights

FunCoder outperforms state-of-the-art methods by +9.8% on average in HumanEval, MBPP, xCodeEval and MATH with GPT-3.5 and GPT-4. Moreover, our method demonstrates superiority on smaller models: With FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% and achieves 97.7% of GPT-4's performance on HumanEval. The proposed dynamic function decomposition is capable of handling complex requirements, and the functional consensus prevails over self-testing in correctness evaluation.
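As a rough illustration of selecting a function by agreement in program behavior rather than by self-written tests, the sketch below runs several candidate implementations on shared inputs and picks the one whose outputs agree most with the others. The scoring rule (count of matching outputs, with failed runs never counted as agreement) and the names `run_safely` and `pick_by_consensus` are assumptions made for this example, not the paper's exact formulation.

```python
from typing import Any, Callable, List, Sequence, Tuple


def run_safely(fn: Callable[[Any], Any], x: Any) -> Tuple[str, Any]:
    """Run one candidate on one input, capturing exceptions as a result."""
    try:
        return ("ok", fn(x))
    except Exception as exc:
        return ("error", type(exc).__name__)


def agreement(a: Sequence[Tuple[str, Any]], b: Sequence[Tuple[str, Any]]) -> int:
    """Count inputs on which both candidates succeed with equal outputs."""
    return sum(1 for u, v in zip(a, b) if u[0] == "ok" and u == v)


def pick_by_consensus(
    candidates: Sequence[Callable[[Any], Any]], inputs: Sequence[Any]
) -> Tuple[int, List[int]]:
    """Return (index of the chosen candidate, per-candidate agreement scores)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    outputs = [[run_safely(f, x) for x in inputs] for f in candidates]
    n = len(candidates)
    # Each candidate's score is its total agreement with every other candidate.
    scores = [
        sum(agreement(outputs[i], outputs[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    best = max(range(n), key=scores.__getitem__)  # ties go to the earliest index
    return best, scores


# Three hypothetical candidates for "absolute value"; the last one is buggy.
candidates = [
    lambda x: x if x >= 0 else -x,
    lambda x: abs(x),
    lambda x: x,
]
best, scores = pick_by_consensus(candidates, inputs=[-2, 0, 3])
```

Here the two correct candidates agree with each other on every input, so they outscore the buggy one, which only matches them on non-negative inputs. No reference outputs or assertions are needed, which is the property that distinguishes this kind of selection from self-testing.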
Related work

Recent work uses plan-and-solve decomposition to reduce complexity and leverages self-tests to refine the generated program. Other related research covers large language models for code generation.
