From today's 爱可可AI前沿 paper recommendations.
[CL] Repository-Level Prompt Generation for Large Language Models of Code
D Shrivastava, H Larochelle, D Tarlow
[Google Research]
Highlights:
- Proposes the Repo-Level Prompt Generator (RLPG), a framework that learns to generate example-specific prompts without requiring any access to the LLM's weights;
- RLPG injects domain knowledge into the prompt design process through a set of repo-level prompt proposals that draw on both the structure of the repository and relevant context from all of its files (a sketch of the idea follows this list);
- On single-line code autocompletion, an oracle constructed from the proposed prompt proposals yields up to a 36% relative improvement over Codex.
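To make the "prompt proposal" idea concrete, here is a minimal Python sketch, not the paper's implementation: the function names, the Java import heuristic, and the character-based context budget are all illustrative assumptions. Each proposal extracts context from one repository source (here, the files imported by the current file) and prepends it to the code preceding the target hole.

# Minimal sketch of a repo-level "prompt proposal" (hypothetical names, not
# the paper's code): each proposal pulls context from one repository source,
# e.g. the files imported by the current file, and prepends it to the code
# preceding the target hole (the line to be completed).
from pathlib import Path

def context_from_imports(repo_root: Path, file_text: str) -> str:
    """Collect the text of files imported by the current (Java) file."""
    chunks = []
    for line in file_text.splitlines():
        if line.startswith("import "):
            # crude mapping from "import a.b.C;" to "a/b/C.java"
            rel = line.removeprefix("import ").strip().rstrip(";").replace(".", "/")
            candidate = repo_root / (rel + ".java")
            if candidate.exists():
                chunks.append(candidate.read_text())
    return "\n".join(chunks)

def build_prompt(proposal_context: str, prefix: str, budget: int = 4096) -> str:
    """Prompt = proposal context + code before the hole, truncated from the
    left. A character budget stands in for the LLM's real token limit."""
    return (proposal_context + "\n" + prefix)[-budget:]

Other proposals in the same spirit could draw from the parent class file, sibling files, or files in the same directory; the point is that each proposal is a fixed, structure-aware context source, and the learned component only has to choose among them.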
https://arxiv.org/abs/2206.12839
Abstract:
With the success of large language models (LLMs) of code and their use as code assistants (e.g. Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge in the prompt design process become important. In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g. imports, parent class files). Our technique doesn't require any access to the weights of the LLM, making it applicable in cases where we only have black-box access to the LLM. We conduct experiments on the task of single-line code-autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a remarkably high relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we can achieve significant performance gains over Codex and other baselines.
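The 36% figure is an oracle upper bound: a hole counts as solved if any one of the proposal-built prompts leads the black-box LLM to produce the correct line. A hypothetical sketch of that evaluation follows, with complete_line standing in for a Codex-style completion call; it is an assumed interface, not an API from the paper.

# Hypothetical sketch of the oracle used to upper-bound proposal quality:
# a hole counts as solved if ANY proposal's prompt yields the target line.
from typing import Callable

def oracle_success(prompts: list[str], target_line: str,
                   complete_line: Callable[[str], str]) -> bool:
    # complete_line() is an assumed black-box LLM call (e.g. Codex-style).
    return any(complete_line(p).strip() == target_line.strip()
               for p in prompts)

Per the abstract, the paper then trains a model to predict which proposal to use for each example, achieving significant gains over Codex and other baselines.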