Generative Verifiers: Reward Modeling as Next-Token Prediction
