Z. Allen-Zhu, Y. Li
[Meta AI & CMU]
Understanding Ensemble, Knowledge Distillation, and Self-Distillation in Deep Learning
Key points:
- An ensemble of independently trained neural networks can improve deep learning test accuracy on data with certain structures (see the sketch after this list);
- The superior performance of the ensemble can be distilled into a single model;
- Self-distillation is performing an implicit "ensemble + knowledge distillation";
- Knowledge distillation does not work for random feature mappings in deep learning.
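As a rough illustration of the first point, here is a minimal PyTorch-style sketch of the ensemble considered in the paper: several copies of the same architecture, trained with the same algorithm on the same data and differing only in the random seed, whose output logits are simply averaged at test time. The helpers `make_model`, `train_fn`, and `seeds` are hypothetical placeholders, not part of the paper.

```python
import torch

def train_ensemble(make_model, train_fn, seeds):
    """Train several copies of the same architecture that differ only in the seed."""
    models = []
    for seed in seeds:
        torch.manual_seed(seed)   # the only difference between ensemble members
        model = make_model()      # same architecture every time
        train_fn(model)           # same algorithm, same training data
        models.append(model)
    return models

@torch.no_grad()
def ensemble_logits(models, x):
    # The ensemble prediction is just the average of the members' output logits.
    return torch.stack([model(x) for model in models]).mean(dim=0)
```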
One-sentence summary:
Ensemble and knowledge distillation in deep learning work by a different mechanism than in traditional learning theory; it is special structure in the data that allows an ensemble of neural networks to improve test accuracy.
Paper: https://openreview.net/forum?id=Uuf2q9TfXGA
We formally study how ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the ensemble is simply an average of the outputs of a few independently trained neural networks with the same architecture, trained using the same algorithm on the same data set, and they only differ by the random seeds used in the initialization. We show that ensemble/knowledge distillation in deep learning works very differently from traditional learning theory (such as boosting or NTKs). We develop a theory showing that when data has a structure we refer to as “multi-view”, then ensemble of independently trained neural networks can provably improve test accuracy, and such superior test accuracy can also be provably distilled into a single model. Our result sheds light on how ensemble works in deep learning in a way that is completely different from traditional theorems, and how the “dark knowledge” is hidden in the outputs of the ensemble and can be used in distillation.
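The distillation step mentioned in the abstract can be sketched with a generic knowledge-distillation loss in PyTorch: a single student model is trained to match the softened output distribution of the ensemble teacher in addition to the hard labels. The temperature `T`, weight `alpha`, and the use of `ensemble_logits` from the sketch above as the teacher are illustrative assumptions, not the paper's exact training recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: match the teacher's softened output distribution,
    # which is where the "dark knowledge" lives.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In self-distillation the same kind of loss is used, except the teacher is a single previously trained copy of the same architecture rather than an explicit ensemble; the paper's claim is that this already behaves like an implicit "ensemble + knowledge distillation".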