MathScale: Scaling Instruction Tuning for Mathematical Reasoning

March 5, 2024
  • Abstract
    Large language models (LLMs) have demonstrated remarkable problem-solving abilities, yet their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method for creating high-quality mathematical reasoning data with frontier LLMs (e.g., GPT-3.5). Inspired by the cognitive mechanisms of human mathematical learning, it first extracts topics and knowledge points from seed math questions, then builds a concept graph, which is subsequently used to generate new math questions (a schematic sketch of this pipeline follows the list below). MathScale scales effectively along the size axis of the generated math dataset. As a result, we create MathScaleQA, a mathematical reasoning dataset containing two million math question-answer pairs. To evaluate the mathematical reasoning abilities of LLMs comprehensively, we construct MwpBench, a benchmark of Math Word Problems consisting of ten datasets (including GSM8K and MATH) that cover K-12, college, and competition-level math problems. We apply MathScaleQA to fine-tune open-source LLMs (e.g., LLaMA-2 and Mistral), significantly improving their mathematical reasoning capabilities. Evaluated on MwpBench, MathScale-7B achieves state-of-the-art performance across all datasets, surpassing its best peers of equivalent size by 42.9% in micro average accuracy and 43.7% in macro average accuracy.
  • Problem Addressed
    How to generate high-quality math word problem data at scale with LLMs, given that current models' proficiency in mathematical reasoning remains limited.
  • Key Idea
    The paper proposes MathScale, a method that creates high-quality mathematical reasoning data with frontier LLMs by extracting topics and knowledge points from seed math questions and building a concept graph, which is then used to generate new math questions.
  • Other Highlights
    MathScale is used to create MathScaleQA, a dataset of two million math question-answer pairs. The paper also constructs MwpBench, a benchmark of Math Word Problems, and fine-tunes open-source LLMs on MathScaleQA, significantly improving their mathematical reasoning. MathScale-7B achieves state-of-the-art performance across all MwpBench datasets, surpassing its best peers of equivalent size by 42.9% in micro average accuracy and by 43.7% in macro average accuracy (see the micro/macro averaging sketch after this list).
  • Related Work
    Recent related studies in this field include 'Solving Math Word Problems with Large Language Models' and 'MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms'.
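  • Pipeline Sketch
    A minimal Python sketch of the three-step pipeline described above: extract topics and knowledge points from seed questions, build a concept graph, and sample from it to prompt a question generator. The toy annotations, the random-walk sampling, and all function names are illustrative assumptions, not the paper's released code or exact prompts.

    import random
    from collections import defaultdict
    from itertools import combinations

    # Toy stand-in for Step 1 (topic / knowledge-point extraction), which
    # MathScale performs by prompting a frontier LLM over seed questions.
    # These annotations are invented for illustration.
    SEED_ANNOTATIONS = [
        ("If 3x + 5 = 20, find x.",
         ["algebra"], ["linear equations", "solving for a variable"]),
        ("A right triangle has legs 3 and 4. Find the hypotenuse.",
         ["geometry"], ["Pythagorean theorem", "right triangles"]),
        ("Solve x^2 - 5x + 6 = 0.",
         ["algebra"], ["quadratic equations", "factoring"]),
    ]

    def build_concept_graph(annotations):
        """Step 2: connect concepts (topics and knowledge points) that
        co-occur in the same seed question into an undirected graph."""
        graph = defaultdict(set)
        for _question, topics, kps in annotations:
            for a, b in combinations(topics + kps, 2):
                graph[a].add(b)
                graph[b].add(a)
        return graph

    def sample_concepts(graph, start, walk_len=2):
        """Pick a small set of related concepts with a short random walk
        (the paper's exact sampling scheme is assumed, not reproduced)."""
        node, picked = start, [start]
        for _ in range(walk_len):
            neighbors = sorted(graph[node] - set(picked))
            if not neighbors:
                break
            node = random.choice(neighbors)
            picked.append(node)
        return picked

    def make_generation_prompt(concepts):
        """Step 3: compose the prompt handed back to the LLM to generate
        a new question-answer pair covering the sampled concepts."""
        return ("Write a new math word problem with a step-by-step solution "
                f"that exercises these concepts: {', '.join(concepts)}.")

    graph = build_concept_graph(SEED_ANNOTATIONS)
    print(make_generation_prompt(sample_concepts(graph, "algebra")))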
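  • Micro vs. Macro Average Accuracy
    The results above are reported as both micro and macro average accuracy over the MwpBench datasets. A short sketch of the difference, with invented per-dataset counts (not the paper's numbers): micro averaging pools all questions across datasets, while macro averaging weights each dataset equally.

    # Invented (correct, total) counts per MwpBench dataset, purely to
    # illustrate the two aggregation schemes.
    results = {
        "GSM8K": (1100, 1319),
        "MATH": (1700, 5000),
        "CollegeMath": (800, 2000),
    }

    # Micro average: every question weighted equally across all datasets.
    micro = sum(c for c, _ in results.values()) / sum(t for _, t in results.values())

    # Macro average: every dataset weighted equally.
    macro = sum(c / t for c, t in results.values()) / len(results)

    print(f"micro avg: {micro:.3f}  macro avg: {macro:.3f}")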