分享

No More Adam: Learning Rate Scaling at Initialization is All You Need

热度