System-1.x: Learning to Balance Fast and Slow Planning with Language Models

Abstract

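Language models can be used to solve long-horizon planning problems in two distinct modes: a fast "System-1" mode that directly generates plans without any explicit search or backtracking, and a slow "System-2" mode that plans step by step by explicitly searching over possible actions. Although System-2 is typically more effective, it is also more computationally expensive, making it infeasible for long plans or large action spaces. Moreover, planning with System-1 or System-2 in isolation ignores the user's end goals and provides no way to control the model's behavior. To this end, we propose the System-1.x Planner, a controllable planning framework with LLMs that generates hybrid plans and balances between the two planning modes based on the difficulty of the problem. System-1.x consists of three components: (i) a controller, (ii) a System-1 Planner, and (iii) a System-2 Planner. Guided by a user-specified hybridization factor (x) that governs the mixture between System-1 and System-2, the controller decomposes a problem into sub-goals and classifies each as easy or hard, to be solved by System-1 or System-2, respectively. We fine-tune all three components on top of a single base LLM, requiring only search traces as supervision. Experiments on two diverse planning tasks, Maze Navigation and Blocksworld, show that our System-1.x Planner outperforms a System-1 Planner, a System-2 Planner trained to approximate A* search, and a symbolic planner (A*). We demonstrate the following key properties of our planner: (1) controllability: increasing the hybridization factor (e.g., System-1.75 vs. System-1.5) performs more search, improving performance; (2) flexibility: by building a neuro-symbolic variant with a neural System-1 and a symbolic System-2, we can use existing symbolic methods; (3) generalizability: by being able to learn from different search algorithms, our method is robust to the choice of search algorithm.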

Problem Addressed

System-1.x Planner: Controllable and Generalizable Planning with LLMs

Key Idea

The paper proposes a controllable planning framework with LLMs that generates hybrid plans by balancing between fast 'System-1' and slow 'System-2' modes based on the difficulty of the problem. The framework consists of a controller, a System-1 Planner, and a System-2 Planner, all fine-tuned on a single base LLM.
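To make the controller's role concrete, here is a minimal sketch in Python. It is not the paper's implementation: in the paper the controller and both planners are fine-tuned LLMs, whereas here `assign_modes` and `hybrid_plan` are hypothetical helpers, the difficulty scores are assumed to be given, and the rule "send the hardest fraction x of sub-goals to System-2" is an illustrative assumption about how a hybridization factor could be applied.

```python
from typing import Callable, List, Sequence


def assign_modes(difficulties: Sequence[float], x: float) -> List[str]:
    """Label each sub-goal 'system1' (fast) or 'system2' (slow).

    Assumption for illustration: x in [0, 1] is the fraction of
    sub-goals routed to System-2, hardest first.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be in [0, 1]")
    n = len(difficulties)
    k = round(x * n)  # number of sub-goals handed to System-2
    # Indices ordered from hardest to easiest; ties keep input order.
    hardest = sorted(range(n), key=lambda i: (-difficulties[i], i))[:k]
    slow = set(hardest)
    return ["system2" if i in slow else "system1" for i in range(n)]


def hybrid_plan(
    subgoals: Sequence[str],
    difficulties: Sequence[float],
    x: float,
    system1: Callable[[str], List[str]],
    system2: Callable[[str], List[str]],
) -> List[str]:
    """Concatenate per-sub-goal plans from whichever planner is assigned."""
    modes = assign_modes(difficulties, x)
    plan: List[str] = []
    for subgoal, mode in zip(subgoals, modes):
        planner = system2 if mode == "system2" else system1
        plan.extend(planner(subgoal))
    return plan


if __name__ == "__main__":
    # Stand-in planners: each returns a one-step "plan" tagged by mode.
    fast = lambda sg: [f"fast:{sg}"]
    slow = lambda sg: [f"slow:{sg}"]
    print(hybrid_plan(["a", "b", "c", "d"], [0.1, 0.9, 0.4, 0.7], 0.5, fast, slow))
```

Under this assumed rule, x = 0 falls back to pure System-1 planning and x = 1 to pure System-2 planning, which mirrors the controllability property: a larger x routes more sub-goals through search.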

Other Highlights

On two diverse planning tasks, Maze Navigation and Blocksworld, the System-1.x Planner outperforms a System-1 Planner, a System-2 Planner trained to approximate A* search, and a symbolic planner (A*). The planner demonstrates controllability, flexibility, and generalizability: it can trade off search against speed through the hybridization factor, plug in a symbolic System-2, and learn from different search algorithms.
