MathScale: Scaling Instruction Tuning for Mathematical Reasoning

March 5, 2024
  • Abstract
    Large language models (LLMs) have demonstrated remarkable problem-solving abilities, yet their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method for creating high-quality mathematical reasoning data with frontier LLMs (e.g., GPT-3.5). Inspired by the cognitive mechanisms of human mathematical learning, it first extracts topics and knowledge points from seed math questions, then builds a concept graph, which is subsequently used to generate new math questions (a schematic sketch of this pipeline follows the list below). MathScale scales effectively along the size axis of the generated math dataset. As a result, we create MathScaleQA, a mathematical reasoning dataset containing two million math question-answer pairs. To evaluate the mathematical reasoning abilities of LLMs comprehensively, we construct MwpBench, a benchmark of Math Word Problems consisting of ten datasets (including GSM8K and MATH) that cover K-12, college, and competition-level math problems. We apply MathScaleQA to fine-tune open-source LLMs (e.g., LLaMA-2 and Mistral), significantly improving their mathematical reasoning capabilities. Evaluated on MwpBench, MathScale-7B achieves state-of-the-art performance across all datasets, surpassing its best peers of equivalent size by 42.9% in micro average accuracy and 43.7% in macro average accuracy.
  • Problem Addressed
    How to generate high-quality math word problem data at scale with LLMs, given that current models' proficiency in mathematical reasoning remains limited.
  • Key Idea
    The paper proposes MathScale, a method that creates high-quality mathematical reasoning data with frontier LLMs by extracting topics and knowledge points from seed math questions and building a concept graph, which is then used to generate new math questions.
  • Other Highlights
    MathScale is used to create MathScaleQA, a dataset of two million math question-answer pairs. The paper also constructs MwpBench, a benchmark of Math Word Problems, and fine-tunes open-source LLMs on MathScaleQA, significantly improving their mathematical reasoning. MathScale-7B achieves state-of-the-art performance across all MwpBench datasets, surpassing its best peers of equivalent size by 42.9% in micro average accuracy and by 43.7% in macro average accuracy (see the micro/macro averaging sketch after this list).
  • Related Work
    Recent related studies in this field include 'Solving Math Word Problems with Large Language Models' and 'MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms'.
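  • Pipeline Sketch
    A minimal Python sketch of the three-step pipeline described above: extract topics and knowledge points from seed questions, build a concept graph, and sample from it to prompt a question generator. The toy annotations, the random-walk sampling, and all function names are illustrative assumptions, not the paper's released code or exact prompts.

    import random
    from collections import defaultdict
    from itertools import combinations

    # Toy stand-in for Step 1 (topic / knowledge-point extraction), which
    # MathScale performs by prompting a frontier LLM over seed questions.
    # These annotations are invented for illustration.
    SEED_ANNOTATIONS = [
        ("If 3x + 5 = 20, find x.",
         ["algebra"], ["linear equations", "solving for a variable"]),
        ("A right triangle has legs 3 and 4. Find the hypotenuse.",
         ["geometry"], ["Pythagorean theorem", "right triangles"]),
        ("Solve x^2 - 5x + 6 = 0.",
         ["algebra"], ["quadratic equations", "factoring"]),
    ]

    def build_concept_graph(annotations):
        """Step 2: connect concepts (topics and knowledge points) that
        co-occur in the same seed question into an undirected graph."""
        graph = defaultdict(set)
        for _question, topics, kps in annotations:
            for a, b in combinations(topics + kps, 2):
                graph[a].add(b)
                graph[b].add(a)
        return graph

    def sample_concepts(graph, start, walk_len=2):
        """Pick a small set of related concepts with a short random walk
        (the paper's exact sampling scheme is assumed, not reproduced)."""
        node, picked = start, [start]
        for _ in range(walk_len):
            neighbors = sorted(graph[node] - set(picked))
            if not neighbors:
                break
            node = random.choice(neighbors)
            picked.append(node)
        return picked

    def make_generation_prompt(concepts):
        """Step 3: compose the prompt handed back to the LLM to generate
        a new question-answer pair covering the sampled concepts."""
        return ("Write a new math word problem with a step-by-step solution "
                f"that exercises these concepts: {', '.join(concepts)}.")

    graph = build_concept_graph(SEED_ANNOTATIONS)
    print(make_generation_prompt(sample_concepts(graph, "algebra")))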
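  • Micro vs. Macro Average Accuracy
    The results above are reported as both micro and macro average accuracy over the MwpBench datasets. A short sketch of the difference, with invented per-dataset counts (not the paper's numbers): micro averaging pools all questions across datasets, while macro averaging weights each dataset equally.

    # Invented (correct, total) counts per MwpBench dataset, purely to
    # illustrate the two aggregation schemes.
    results = {
        "GSM8K": (1100, 1319),
        "MATH": (1700, 5000),
        "CollegeMath": (800, 2000),
    }

    # Micro average: every question weighted equally across all datasets.
    micro = sum(c for c, _ in results.values()) / sum(t for _, t in results.values())

    # Macro average: every dataset weighted equally.
    macro = sum(c / t for c, t in results.values()) / len(results)

    print(f"micro avg: {micro:.3f}  macro avg: {macro:.3f}")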