From today's 爱可可 AI frontier picks.

[LG] Editing Models with Task Arithmetic

G Ilharco, M T Ribeiro, M Wortsman, S Gururangan, L Schmidt, H Hajishirzi, A Farhadi
[University of Washington & Microsoft Research]

Model Editing Based on Task Vector Arithmetic

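Overview: The paper proposes a new way to edit pre-trained models using task vectors. A task vector is obtained by subtracting the pre-trained model's weights from the weights of the same model after fine-tuning on a task. Arithmetic on task vectors, such as negation and addition, lets users modify and combine the behaviors of pre-trained models, and can also produce new models that perform better on multiple tasks or on tasks linked by an analogy relationship. The approach is efficient and easy to use, and it edits models without adding any inference cost.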
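The operations summarized above can be written compactly. This is only a sketch: the symbols $\theta_{\mathrm{pre}}$, $\theta_{\mathrm{ft}}^{t}$, $\tau_t$ and the scaling coefficient $\lambda$ are notation assumed here for illustration and do not appear in this post; setting $\lambda = 1$ gives plain addition or negation.

```latex
\begin{align*}
\tau_t &= \theta_{\mathrm{ft}}^{t} - \theta_{\mathrm{pre}}
  && \text{(task vector for task } t\text{)} \\
\theta_{\mathrm{new}} &= \theta_{\mathrm{pre}} - \lambda\,\tau_t
  && \text{(negation: lower performance on task } t\text{)} \\
\theta_{\mathrm{new}} &= \theta_{\mathrm{pre}} + \lambda \sum_{i} \tau_i
  && \text{(addition: improve several tasks at once)} \\
\tau_D &\approx \tau_C + (\tau_B - \tau_A)
  && \text{(analogy: ``A is to B as C is to D'')}
\end{align*}
```

Because each edit is just an element-wise operation on the weights, the edited model has the same architecture and size as the original, which is why no extra inference cost is incurred.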

Abstract: Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems. In this work, we propose a new paradigm for steering the behavior of neural networks, centered around task vectors. A task vector specifies a direction in the weight space of a pre-trained model, such that movement in that direction improves performance on the task. We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task. We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition, and the behavior of the resulting model is steered accordingly. Negating a task vector decreases performance on the target task, with little change in model behavior on control tasks. Moreover, adding task vectors together can improve performance on multiple tasks at once. Finally, when tasks are linked by an analogy relationship of the form "A is to B as C is to D", combining task vectors from three of the tasks can improve performance on the fourth, even when no data from the fourth task is used for training. Overall, our experiments with several models, modalities and tasks show that task arithmetic is a simple, efficient and effective way of editing models.
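To make the arithmetic concrete, here is a minimal Python sketch using plain dictionaries of NumPy arrays in place of real model checkpoints. All function names, the `scale` parameter, and the toy weights are hypothetical and chosen for illustration; this is not the authors' code, and it only demonstrates the weight-space bookkeeping, not any effect on task accuracy.

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights, per tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def negate(tau):
    """Flip the direction of a task vector."""
    return {k: -v for k, v in tau.items()}

def add(*taus):
    """Element-wise sum of several task vectors."""
    return {k: sum(t[k] for t in taus) for k in taus[0]}

def analogy(tau_a, tau_b, tau_c):
    """Estimate a vector for D from 'A is to B as C is to D': C + (B - A)."""
    return {k: tau_c[k] + (tau_b[k] - tau_a[k]) for k in tau_c}

def apply_vector(pretrained, tau, scale=1.0):
    """Move the pre-trained weights along tau; `scale` is an assumed knob."""
    return {k: pretrained[k] + scale * tau[k] for k in pretrained}

# Toy "checkpoints" (made-up numbers, two parameter tensors each).
theta_pre = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
theta_ft_a = {"w": np.array([1.5, 2.0]), "b": np.array([0.0])}
theta_ft_b = {"w": np.array([1.0, 3.0]), "b": np.array([0.5])}

tau_a = task_vector(theta_pre, theta_ft_a)
tau_b = task_vector(theta_pre, theta_ft_b)

# Negation: step away from task A.
forget_a = apply_vector(theta_pre, negate(tau_a))

# Addition: combine tasks A and B in one set of weights.
multi = apply_vector(theta_pre, add(tau_a, tau_b))
```

With real models the same dictionary operations would be applied to every tensor in the checkpoint, and the scale would typically be tuned on held-out data rather than fixed at 1.0.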

Paper link: https://arxiv.org/abs/2212.04089
