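- Abstract: Despite recent progress by large language models in code generation, they still struggle to produce programs that meet complex requirements. Recent work uses plan-and-solve decomposition to reduce complexity and leverages self-tests to refine the generated program. However, planning deeply nested requirements in advance can be challenging, and the tests need to be accurate for self-improvement to work. To this end, we propose FunCoder, a code generation framework that combines a divide-and-conquer strategy with functional consensus. Specifically, FunCoder recursively branches off sub-functions as smaller goals during code generation, represented by a tree hierarchy. These sub-functions are then composed to attain more complex objectives. In addition, we select functions through a consensus formed by identifying similarities in program behavior, which mitigates error propagation. FunCoder outperforms state-of-the-art methods by +9.8% on average on HumanEval, MBPP, xCodeEval and MATH with GPT-3.5 and GPT-4. Moreover, our method demonstrates superiority on smaller models: with FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% and achieves 97.7% of GPT-4's performance on HumanEval. Further analysis shows that the proposed dynamic function decomposition can handle complex requirements, and that functional consensus prevails over self-testing in correctness evaluation.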
- Problem addressed: FunCoder: A Code Generation Framework with Functional Consensus
- Key idea: FunCoder combines a divide-and-conquer strategy with functional consensus. During code generation it recursively branches off sub-functions as smaller goals, represented by a tree hierarchy. These sub-functions are then composed to attain more complex objectives.
- Other highlights: FunCoder outperforms state-of-the-art methods by +9.8% on average on HumanEval, MBPP, xCodeEval and MATH with GPT-3.5 and GPT-4. Moreover, the method demonstrates superiority on smaller models: with FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% and achieves 97.7% of GPT-4's performance on HumanEval. The proposed dynamic function decomposition can handle complex requirements, and functional consensus prevails over self-testing in correctness evaluation.
- Recent work uses plan-and-solve decomposition to reduce complexity and leverages self-tests to refine the generated program. Other related research covers large language models for code generation.
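
The functional consensus idea summarized above (choosing a function by how closely its behavior matches that of the other candidates) can be illustrated with a minimal sketch. This is one possible reading and not the authors' implementation: the names `behavior`, `similarity`, and `select_by_consensus`, the exact-match similarity measure, and the hand-written candidates and inputs are all assumptions made for illustration. In a real pipeline the candidates and inputs would presumably be sampled from a language model.

```python
from typing import Any, Callable, List, Sequence, Tuple


def behavior(fn: Callable[..., Any], inputs: Sequence[Tuple]) -> List[str]:
    """Run fn on every input and record either its output or the error type."""
    observed = []
    for args in inputs:
        try:
            observed.append(repr(fn(*args)))
        except Exception as exc:  # a crash is also a kind of behavior
            observed.append(f"error:{type(exc).__name__}")
    return observed


def similarity(a: List[str], b: List[str]) -> float:
    """Fraction of inputs on which two candidates behaved identically (assumed metric)."""
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_by_consensus(candidates: Sequence[Callable[..., Any]],
                        inputs: Sequence[Tuple]) -> Callable[..., Any]:
    """Return the candidate whose behavior agrees most with all other candidates."""
    behaviors = [behavior(fn, inputs) for fn in candidates]
    scores = [
        sum(similarity(b, other) for j, other in enumerate(behaviors) if j != i)
        for i, b in enumerate(behaviors)
    ]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]


# Hypothetical candidates for "return the absolute value of x".
def cand_a(x):
    return x if x >= 0 else -x


def cand_b(x):
    return abs(x)


def cand_c(x):  # buggy: forgets to negate negative inputs
    return x


sample_inputs = [(-2,), (0,), (3,)]
chosen = select_by_consensus([cand_a, cand_b, cand_c], sample_inputs)
print(chosen.__name__)
```

No test oracle is needed here: the buggy candidate loses only because it disagrees with the majority on one input. The same property means the approach can pick a wrong function when most candidates share the same mistake.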