- Introduction: This paper addresses the high computational and memory cost of fine-tuning large-scale pretrained models. As one of the most popular parameter-efficient fine-tuning (PEFT) methods, LoRA offers a cost-effective alternative by fine-tuning an auxiliary low-rank model with far fewer parameters. Although LoRA significantly reduces the computational and memory requirements of each iteration, extensive empirical evidence shows that it converges considerably more slowly than full fine-tuning, which ultimately increases total compute and often hurts test performance. The paper takes a close look at LoRA's initialization and shows that careful initialization alone, without changing the architecture or the training algorithm, can substantially improve both efficiency and performance. In particular, it introduces a new initialization method, LoRA-GA (Low-Rank Adaptation with Gradient Approximation), which aligns the gradient of the low-rank matrix product with the gradient of full fine-tuning at the first step. Extensive experiments show that LoRA-GA converges at a rate comparable to full fine-tuning (and thus much faster than vanilla LoRA and various recent variants) while reaching comparable or even better performance. For example, on a subset of the GLUE benchmark with T5-Base, LoRA-GA outperforms LoRA by 5.69% on average. On larger models such as Llama 2-7B, LoRA-GA improves performance by 0.34, 11.52%, and 5.05% on MT-bench, GSM8K, and HumanEval, respectively. It also converges 2-4x faster than vanilla LoRA, validating its effectiveness in accelerating convergence and improving model performance. Code is available at https://github.com/Outsider565/LoRA-GA.
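To make the central claim concrete, a short derivation of the first-step gradient alignment is sketched below. The notation (W for the frozen weight, B and A for the low-rank factors, η for the LoRA scaling, λ for the learning rate, G for the full fine-tuning gradient) is ours, and the exact scaling and choice of singular directions in the paper may differ.

```latex
% Sketch of the first-step alignment argument (our notation; the paper's exact
% scaling and choice of singular directions may differ).
% Adapted layer: y = (W + \eta B A)x, with B \in \mathbb{R}^{m \times r},
% A \in \mathbb{R}^{r \times n}, and G = \partial\mathcal{L}/\partial W the
% full fine-tuning gradient estimated on a sampled batch.
\begin{align*}
\frac{\partial \mathcal{L}}{\partial B} = \eta\, G A^{\top},
\qquad
\frac{\partial \mathcal{L}}{\partial A} = \eta\, B^{\top} G,
\qquad
\Delta(\eta B A) \approx -\lambda\,\eta^{2}\bigl(G A^{\top} A + B B^{\top} G\bigr)
\end{align*}
% after one SGD step with learning rate \lambda. With the SVD G = U S V^{\top},
% initializing A_0 = V_{[:, :r]}^{\top} and B_0 = U_{[:, r:2r]} turns the
% bracketed term into the best rank-2r approximation of G, so the first LoRA
% update tracks the full fine-tuning update as closely as a rank-2r step can.
```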
- Problem: LoRA, a popular Parameter-Efficient Fine-Tuning (PEFT) method, converges more slowly than full fine-tuning, leading to higher overall compute and often worse test performance. The paper investigates LoRA's initialization method to improve its efficiency and performance.
- Key idea: The paper introduces a novel initialization method, LoRA-GA, which aligns the gradient of the low-rank matrix product with that of full fine-tuning at the first step. LoRA-GA achieves a convergence rate comparable to full fine-tuning while attaining comparable or even better performance.
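A minimal PyTorch sketch of such a gradient-approximation initialization for a single linear layer is shown below. It only illustrates the idea (estimate the full-parameter gradient on a small batch, take its SVD, seed A and B from singular directions, and fold the now non-zero product back into W so the initial output is unchanged); the helper name `lora_ga_init_sketch`, the plain `alpha / rank` scaling, and the particular split of singular directions are simplifying assumptions, so refer to the official repository above for the exact procedure.

```python
# Minimal sketch of gradient-approximation initialization for one linear layer.
# Illustrative only; the exact scaling and singular-direction assignment used
# by LoRA-GA are in the official repository.
import torch


def lora_ga_init_sketch(weight: torch.Tensor, grad: torch.Tensor,
                        rank: int, alpha: float = 16.0):
    """weight: (d_out, d_in) frozen pretrained weight W.
    grad:   (d_out, d_in) gradient of the loss w.r.t. W, estimated with the
            frozen model on a small sampled batch.
    Requires 2 * rank <= min(d_out, d_in).
    Returns (A, B, W_adj) so the adapted layer computes (W_adj + scale * B @ A) x.
    """
    scale = alpha / rank
    # SVD of the full-parameter gradient.
    U, S, Vh = torch.linalg.svd(grad.float(), full_matrices=False)
    # Top-r right singular vectors seed A, the next r left singular vectors
    # seed B, so the first-step change of B @ A approximates the best
    # rank-2r approximation of the full fine-tuning gradient.
    A = Vh[:rank, :].to(weight.dtype)            # (r, d_in)
    B = U[:, rank:2 * rank].to(weight.dtype)     # (d_out, r)
    # B @ A is no longer zero at initialization, so subtract its contribution
    # from W to keep the adapted model's initial outputs identical to the
    # pretrained model's.
    W_adj = weight - scale * (B @ A)
    return A, B, W_adj
```

In practice, `grad` would be obtained by averaging `weight.grad` over a few forward/backward passes of the frozen model on sampled training data, and the returned `A` and `B` become the trainable adapter parameters.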
- Other highlights: Extensive experiments across models and datasets show that LoRA-GA outperforms vanilla LoRA and recent improvements. Over vanilla LoRA on Llama 2-7B, it improves results by 0.34 on MT-bench, 11.52% on GSM8K, and 5.05% on HumanEval, and it converges 2-4x faster, validating its effectiveness in accelerating convergence and enhancing model performance. The code is open-sourced on GitHub.
- Related work: Recent related studies include various Parameter-Efficient Fine-Tuning (PEFT) methods such as AdaFill, PET, and SVD-Finetune.