- 简介我们介绍了PutnamBench,这是一个新的多语言基准,用于评估神经定理证明器解决竞赛数学问题的能力。PutnamBench由1697个手工构建的形式化描述组成,这些形式化描述源自北美顶尖的本科级数学竞赛William Lowell Putnam Mathematical Competition的640个定理。所有定理都有Lean 4和Isabelle的形式化描述;其中的一个重要子集也有Coq的形式化描述。证明这些定理需要重要的问题解决能力和广泛的本科数学课程教授的知识。我们使用PutnamBench评估了几种已建立的神经和符号定理证明器。这些方法只能解决少数PutnamBench问题,从而将该基准确定为神经定理证明研究的困难开放挑战。PutnamBench可在https://github.com/trishullab/PutnamBench上获取。
- 图表
- 解决问题PutnamBench: A Multilingual Benchmark for Theorem Proving in Undergraduate Mathematics
- 关键思路The paper introduces PutnamBench, a new benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. The benchmark consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, which requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses.
- 其它亮点The paper evaluates several established neural and symbolic theorem-provers using PutnamBench, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available on Github. The paper also includes detailed descriptions of the formalizations in Lean 4, Isabelle, and Coq.
- Recent related work includes 'Neural Theorem Proving with Type Theory and Homotopy Type Theory' and 'Automated Reasoning in Higher-Order Logic Using the Tactic Language'.
沙发等你来抢
去评论
评论
沙发等你来抢