
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
