From today's 爱可可 AI frontier recommendations
[CL] Large Language Models Are Reasoning Teachers
N Ho, L Schmid, S Yun
[KAIST]
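Key points:
1. Proposes Fine-tune-CoT, which uses a very large language model to generate reasoning samples and teaches smaller models by fine-tuning on them;
2. Fine-tune-CoT enables substantial reasoning capability in small models, outperforming previous prompt-based baselines;
3. Fine-tune-CoT is a task-agnostic method for obtaining reasoning performance in small models.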
Abstract:
Language models (LMs) have demonstrated remarkable performance on downstream tasks, using in-context exemplars or human instructions. Recent works have shown that chain-of-thought (CoT) prompting can elicit models to solve complex reasoning tasks, step-by-step. However, the efficacy of prompt-based CoT methods is restricted to very large LMs such as GPT-3 (175B), thus limiting deployability. In this paper, we revisit the fine-tuning approach to enable complex reasoning in smaller LMs, optimized to efficiently perform a specific task. We propose Fine-tune-CoT, a method that leverages the capabilities of very large LMs to generate reasoning samples and teach smaller models via fine-tuning. We evaluate our method on publicly available LMs across a wide range of complex tasks and model sizes. We find that Fine-tune-CoT enables substantial reasoning capability in small models, whereas previous prompt-based baselines exhibit near-random performance. Student models can even outperform the teacher in some tasks while reducing model size requirements by several orders of magnitude. We conduct extensive ablations and sample studies to understand the reasoning capabilities of student models. We also identify several important nuances that have been overlooked in concurrent fine-tuning works on CoT and address them in our analysis.
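The data-preparation step described in the abstract (a very large teacher LM generates reasoning samples, which then become fine-tuning data for a smaller student) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_teacher` is a hypothetical stand-in for a call to a large LM, the `###` and `END` delimiters are invented for the example, and discarding samples whose final answer disagrees with the reference is an assumption that the summary above does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Example:
    question: str
    answer: str  # reference answer from the task dataset


# A teacher maps a question to (step-by-step rationale, final answer).
Teacher = Callable[[str], Tuple[str, str]]


def build_finetune_samples(
    examples: List[Example],
    teacher: Teacher,
    samples_per_question: int = 1,
) -> List[Dict[str, str]]:
    """Turn teacher-generated rationales into prompt/completion pairs."""
    records: List[Dict[str, str]] = []
    for ex in examples:
        for _ in range(samples_per_question):
            rationale, predicted = teacher(ex.question)
            # Assumption: keep only rationales that reach the reference answer.
            if predicted.strip() != ex.answer.strip():
                continue
            records.append(
                {
                    # Hypothetical delimiters marking prompt and completion ends.
                    "prompt": f"{ex.question} ###",
                    "completion": f" {rationale} Answer: {predicted} END",
                }
            )
    return records


def toy_teacher(question: str) -> Tuple[str, str]:
    """Hypothetical stand-in for a very large LM prompted to reason step by step."""
    canned = {
        "What is 2 + 3?": ("2 plus 3 equals 5.", "5"),
        "What is 4 * 6?": ("4 times 6 is 26.", "26"),  # deliberately wrong
    }
    return canned[question]


dataset = [Example("What is 2 + 3?", "5"), Example("What is 4 * 6?", "24")]
samples = build_finetune_samples(dataset, toy_teacher)
```

The resulting `samples` list would then be passed to whatever fine-tuning pipeline the student model uses; in this toy run, the second question is dropped because the stand-in teacher's answer does not match the reference.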
Paper link: https://arxiv.org/abs/2212.10071