
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
