PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

向作者提问

NEW

简介

我们介绍了PutnamBench，这是一个新的多语言基准，用于评估神经定理证明器解决竞赛数学问题的能力。PutnamBench由1697个手工构建的形式化描述组成，这些形式化描述源自北美顶尖的本科级数学竞赛William Lowell Putnam Mathematical Competition的640个定理。所有定理都有Lean 4和Isabelle的形式化描述；其中的一个重要子集也有Coq的形式化描述。证明这些定理需要重要的问题解决能力和广泛的本科数学课程教授的知识。我们使用PutnamBench评估了几种已建立的神经和符号定理证明器。这些方法只能解决少数PutnamBench问题，从而将该基准确定为神经定理证明研究的困难开放挑战。PutnamBench可在https://github.com/trishullab/PutnamBench上获取。
作者讲解

目前尚无作者解读视频，你可点击下方【许愿开讲】按钮，许愿作者开讲~
图表
解决问题

PutnamBench: A Multilingual Benchmark for Theorem Proving in Undergraduate Mathematics
关键思路

The paper introduces PutnamBench, a new benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. The benchmark consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, which requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses.
其它亮点

The paper evaluates several established neural and symbolic theorem-provers using PutnamBench, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available on Github. The paper also includes detailed descriptions of the formalizations in Lean 4, Isabelle, and Coq.
相关研究

Recent related work includes 'Neural Theorem Proving with Type Theory and Homotopy Type Theory' and 'Automated Reasoning in Higher-Order Logic Using the Tactic Language'.

许愿开讲

PDF

原文

点赞收藏

向作者提问

NEW

分享到Link

提问交流

提交问题，平台邀请作者，轻松获得权威解答～

向作者提问