From today's 爱可可 AI frontier picks
[CL] OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
S Iyer, X V Lin, R Pasunuru, T Mihaylov, D Simig...
[Meta AI]
Highlights:
- Collects a large instruction-tuning benchmark of 2,000 NLP tasks drawn from 8 existing dataset collections;
- Establishes multi-faceted trade-offs and best practices for instruction tuning, including the choice of fine-tuning objective (see the sketch after this list);
- Trains and releases OPT-IML 30B and 175B, instruction-tuned models based on OPT.
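To make the objective decision concrete, here is a minimal, hypothetical PyTorch sketch of one common choice in instruction tuning (which the abstract lists among the ablated decisions): next-token cross-entropy computed over answer tokens only, with instruction tokens masked out of the loss. All names are illustrative, not from the paper's codebase.

import torch
import torch.nn.functional as F

def instruction_tuning_loss(logits, input_ids, target_mask):
    # logits: (batch, seq, vocab); input_ids: (batch, seq)
    # target_mask: 1 on answer tokens, 0 on instruction/input tokens
    shift_logits = logits[:, :-1, :]      # position t predicts token t+1
    shift_labels = input_ids[:, 1:].clone()
    shift_mask = target_mask[:, 1:]
    shift_labels[shift_mask == 0] = -100  # drop instruction positions from the loss
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

# Toy usage: pretend the last 3 of 8 tokens are the answer span.
B, T, V = 2, 8, 100
logits = torch.randn(B, T, V)
input_ids = torch.randint(0, V, (B, T))
target_mask = torch.zeros(B, T, dtype=torch.long)
target_mask[:, 5:] = 1
print(instruction_tuning_loss(logits, input_ids, target_mask))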
Abstract:
Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks but is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.
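Since the checkpoints are released, a minimal zero-shot inference sketch with Hugging Face transformers follows. The model id below is an assumption (a small publicly released OPT-IML variant); the 30B and 175B checkpoints would use the same API but need substantially more memory.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-iml-1.3b"  # assumed checkpoint id; swap in the 30B id if resources allow
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Instruction-tuned models take a natural-language task description as the prompt.
prompt = ("Classify the sentiment of this review as positive or negative.\n"
          "Review: The plot was thin, but the acting carried it.\n"
          "Sentiment:")
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))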