How Transformers Learn to Plan via Multi-Token Prediction