LLM Augmented LLMs: Expanding Capabilities through Composition

Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar
January 4, 2024
  • Introduction
    Foundation models with billions of parameters, trained on large corpora of data, have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, augmenting them or imparting new skills is challenging and expensive. On the other hand, owing to their adaptation abilities, several new instances of such models are being trained for new domains and tasks. In this work, we study how to efficiently and practically compose existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM (Composition to Augment Language Models), which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) it scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) existing model weights are kept intact, thereby preserving existing capabilities, and (iii) it applies to diverse domains and settings. We show that augmenting PaLM2-S with a smaller model trained on low-resource languages yields an absolute improvement of up to 13% on tasks such as translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model on code generation and explanation tasks, on par with fully fine-tuned counterparts.
  • Problem Addressed
    How to efficiently and practically augment existing foundation LLMs with new capabilities by composing them with more specific models, without retraining them or modifying their weights. The proposed method is CALM (Composition to Augment Language Models).
  • Key Idea
    CALM composes an existing anchor foundation model with a smaller, more specific model by introducing learnable cross-attention between their representations. This scales up language models on new tasks by 're-using' existing models with only a few additional parameters and data, while both sets of model weights stay frozen, so existing capabilities are preserved (see the sketch after this list).
  • Other Highlights
    CALM applies across diverse domains and settings. Augmenting PaLM2-S with a smaller model trained on low-resource languages yields an absolute improvement of up to 13% on tasks such as translation into English and arithmetic reasoning for low-resource languages. Augmenting PaLM2-S with a code-specific model yields a relative improvement of 40% over the base model on code generation and explanation tasks, on par with fully fine-tuned counterparts.
  • Related Work
    Recent related work includes adapting pre-trained language models such as GPT-3 and T5 to new tasks through transfer learning and fine-tuning.
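To make the key idea concrete, here is a minimal PyTorch sketch of one composition block in which an anchor-model layer cross-attends to an augmenting-model layer; only the projection and cross-attention parameters would be trained, while both base models stay frozen. The class name `CompositionLayer`, the dimensions, the number of heads, and the residual wiring are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of CALM-style cross-attention composition.
# Assumption: names, dimensions, and layer selection are illustrative only.
import torch
import torch.nn as nn

class CompositionLayer(nn.Module):
    """Learnable block that lets one anchor-model layer attend to one
    augmenting-model layer. Only these parameters are trained; the
    weights of both underlying models remain frozen."""
    def __init__(self, d_anchor: int, d_aug: int, n_heads: int = 8):
        super().__init__()
        # Project augmenting-model states into the anchor model's width.
        self.proj = nn.Linear(d_aug, d_anchor)
        # Cross-attention: anchor states are queries, projected
        # augmenting states serve as keys and values.
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=d_anchor, num_heads=n_heads, batch_first=True
        )

    def forward(self, h_anchor: torch.Tensor, h_aug: torch.Tensor) -> torch.Tensor:
        kv = self.proj(h_aug)                            # (B, T_aug, d_anchor)
        attn_out, _ = self.cross_attn(h_anchor, kv, kv)  # (B, T_anchor, d_anchor)
        return h_anchor + attn_out                       # residual composition

# Toy usage: fuse hidden states from a small augmenting model (width 512)
# into a larger anchor model (width 1024).
if __name__ == "__main__":
    B, T, d_anchor, d_aug = 2, 16, 1024, 512
    layer = CompositionLayer(d_anchor, d_aug)
    h_anchor = torch.randn(B, T, d_anchor)   # anchor layer output
    h_aug = torch.randn(B, T, d_aug)         # augmenting layer output
    fused = layer(h_anchor, h_aug)
    print(fused.shape)  # torch.Size([2, 16, 1024])
```

In this reading, a small number of such blocks connect selected layers of the two frozen models, and only the blocks' parameters are updated on the composition data, which is what keeps the approach cheap relative to full fine-tuning.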