
Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning
