The “era of experience” described by David Silver and Richard Sutton represents a fundamental paradigm shift in artificial intelligence development. This proposed new era follows what Silver characterizes as the “era of human data,” where systems like large language models are trained primarily on vast datasets of human-generated content. Prior to this, AI development progressed through earlier stages: from rule-based systems programmed with explicit logic, through the supervised learning era where models were trained on labeled examples, to the current paradigm where models consume enormous quantities of human-generated text, code, images, and other data. Each stage expanded capabilities but maintained fundamental limitations tied to human knowledge boundaries. The proposed transition now moves from systems trained on human data toward systems that learn autonomously through their own experiences, breaking beyond these inherent constraints. Agentic graph systems, particularly when integrated with reinforcement learning for strategic tool use, create the ideal architectural foundation for implementing this vision.

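Figure: A sketch chronology of the dominant AI paradigms. The y-axis indicates the share of the field's effort and computation devoted to reinforcement learning.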
At the core of Silver and Sutton’s thesis is the recognition that AI systems trained exclusively on human-generated data face an inherent ceiling: they cannot exceed the knowledge contained in their training data. This limitation manifests in current large language models, which, despite their impressive capabilities, fundamentally cannot discover truly novel approaches that extend beyond human conception. The proposed solution is a transition to systems that generate their own experiences through environmental interaction, learn from consequences rather than examples, and develop reasoning patterns not constrained by human cognition.
Agentic graph systems provide the structural foundation necessary for this experience-based learning approach. Unlike traditional vector-based architectures that flatten relationships and struggle with contextual understanding, graph-based systems organize knowledge as interconnected networks that mirror the relational nature of reality. This fundamental architectural distinction enables several capabilities essential for the era of experience.
Knowledge representation through graphs enables continuous evolution based on environmental feedback. The graph structure allows systems to modify their own knowledge based on experience, a capability sometimes described as “self-evolution”: the ability of AI systems to continuously enhance their own capabilities through structured, autonomous improvement cycles. This directly supports Silver’s vision of systems that learn beyond the limits of human knowledge by updating their understanding from environmental interactions rather than static training data.
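To make the idea of feedback-driven knowledge revision concrete, here is a minimal sketch in Python. It is not taken from any system cited in this article: the `KnowledgeGraph` class, the confidence scores on edges, and the simple moving-average update rule are all assumptions chosen for illustration.

```python
class KnowledgeGraph:
    """Toy knowledge graph whose edge confidences are revised by feedback.

    Hypothetical illustration: edges are (subject, relation, object) triples,
    each carrying a confidence in [0, 1].
    """

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.edges = {}  # (subject, relation, obj) -> confidence

    def assert_fact(self, subject, relation, obj, confidence=0.5):
        # Add the edge only if it is not already known.
        self.edges.setdefault((subject, relation, obj), confidence)

    def update_from_feedback(self, subject, relation, obj, outcome):
        """Move confidence toward the observed outcome (1.0 success, 0.0 failure)."""
        key = (subject, relation, obj)
        old = self.edges.get(key, 0.5)
        self.edges[key] = old + self.learning_rate * (outcome - old)
        return self.edges[key]

    def neighbors(self, subject, min_confidence=0.0):
        """Return (relation, object, confidence) triples leaving `subject`."""
        return sorted(
            (rel, obj, conf)
            for (subj, rel, obj), conf in self.edges.items()
            if subj == subject and conf >= min_confidence
        )


kg = KnowledgeGraph(learning_rate=0.5)
kg.assert_fact("sort_tool", "solves", "ordering_task")
kg.update_from_feedback("sort_tool", "solves", "ordering_task", 1.0)  # success
kg.update_from_feedback("sort_tool", "solves", "ordering_task", 0.0)  # failure
print(kg.neighbors("sort_tool"))
```

The point of the sketch is only that the graph itself is the thing being learned: each interaction with the environment nudges an edge, so the stored knowledge drifts toward what actually worked.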
Graph-based memory systems address what Silver identifies as a critical requirement for experience-based learning: the ability to maintain “streams of experience” rather than episodic interactions. Traditional systems struggle with contextual amnesia, losing coherent understanding across multiple interactions. In contrast, graph memory preserves experiential context through temporal and causal relationships, enabling long-term learning across extended timeframes. This capability allows systems to pursue distant goals and learn from extended experiences rather than optimizing for immediate responses — precisely what Silver identifies as necessary for systems that can develop sophisticated strategies over time.
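One way to picture a graph memory that keeps temporal and causal links is the sketch below. The `Event` and `ExperienceStream` names are hypothetical, and the design (an append-only event list plus explicit cause pointers) is an assumption made for illustration, not a description of any particular memory system.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    step: int                                    # temporal position in the stream
    description: str
    causes: list = field(default_factory=list)   # steps of earlier events


class ExperienceStream:
    """Append-only memory: temporal order is implicit, causal links are explicit."""

    def __init__(self):
        self.events = []

    def record(self, description, causes=()):
        for cause in causes:
            if not 0 <= cause < len(self.events):
                raise ValueError(f"unknown cause event: {cause}")
        event = Event(len(self.events), description, list(causes))
        self.events.append(event)
        return event.step

    def causal_history(self, step):
        """Return descriptions of every ancestor of `step`, oldest first."""
        seen, stack = set(), [step]
        while stack:
            for cause in self.events[stack.pop()].causes:
                if cause not in seen:
                    seen.add(cause)
                    stack.append(cause)
        return [self.events[i].description for i in sorted(seen)]


stream = ExperienceStream()
goal = stream.record("user sets long-term goal")
query = stream.record("agent queries database", causes=[goal])
stream.record("unrelated small talk")
report = stream.record("agent reports result", causes=[query])
print(stream.causal_history(report))
# → ['user sets long-term goal', 'agent queries database']
```

Because the causal chain is stored rather than re-derived from a flat transcript, the agent can recover why it is doing something many interactions later, while unrelated events stay out of the way.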
Dynamic inference graphs facilitate the development of non-human reasoning patterns through recursive decomposition and multi-path exploration. Where traditional reasoning approaches often imitate human thought processes, graph-based inference enables reasoning structures that adapt dynamically to problem complexity. This implements Silver’s insight that breaking beyond human capabilities requires developing reasoning mechanisms not constrained by human cognitive patterns. Just as AlphaZero developed chess strategies that appeared alien to human grandmasters, graph-based reasoning systems can discover novel approaches to complex problems by exploring multiple pathways simultaneously and recursively decomposing problems in ways humans might not naturally consider.
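Recursive decomposition with multi-path exploration can be sketched as a search over an AND-OR graph: each task may have several alternative decompositions, and each alternative is a list of subtasks that must all be solved. The task names and costs below are invented for illustration, and the exhaustive recursion is a stand-in for whatever search a real inference engine would use.

```python
# Hypothetical AND-OR graph: task -> list of alternative decompositions (OR),
# each alternative being a list of subtasks that must all be solved (AND).
DECOMPOSITIONS = {
    "answer_question": [["retrieve_facts", "summarize"], ["run_calculation"]],
    "retrieve_facts": [["query_graph"]],
}

# Hypothetical cost of solving a leaf task directly.
LEAF_COST = {"summarize": 2, "query_graph": 1, "run_calculation": 4}


def best_plan(task):
    """Return (total cost, ordered leaf steps) for the cheapest way to solve `task`."""
    if task not in DECOMPOSITIONS:
        return LEAF_COST[task], [task]
    candidates = []
    for alternative in DECOMPOSITIONS[task]:      # explore every path
        cost, steps = 0, []
        for subtask in alternative:               # recurse into each subtask
            sub_cost, sub_steps = best_plan(subtask)
            cost += sub_cost
            steps.extend(sub_steps)
        candidates.append((cost, steps))
    return min(candidates, key=lambda candidate: candidate[0])


print(best_plan("answer_question"))
# → (3, ['query_graph', 'summarize'])
```

In an experience-driven system the leaf costs would not be fixed constants; they would be estimated from outcomes, which is where the reinforcement learning component discussed later comes in.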
The tool orchestration component of agentic graph architecture, when enhanced with reinforcement learning methodologies like those in ReTool, creates systems that can learn optimal tool usage patterns based on outcomes rather than human examples. Traditional approaches typically rely on supervised learning from human demonstrations, inheriting human limitations and biases. ReTool’s outcome-driven approach instead enables models to discover when and how to use tools based on task completion success, implementing Silver’s principle of learning from consequences rather than examples.
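The difference between imitating demonstrations and learning from outcomes can be shown with a deliberately tiny example. The sketch below is not ReTool's training procedure; it is a contextual bandit with optimistic initial values, and the environment, task kinds, and reward numbers are all assumptions made up for illustration. The only signal the policy receives is the outcome reward.

```python
ACTIONS = ("use_tool", "answer_directly")


def outcome_reward(task_kind, action):
    """Hypothetical environment: reward depends only on how the task turned out."""
    if task_kind == "arithmetic":
        return 1.0 if action == "use_tool" else 0.0
    # For small talk the tool call still works but wastes effort.
    return 0.5 if action == "use_tool" else 1.0


class ToolPolicy:
    """Greedy policy over running-average rewards, optimistic about untried actions."""

    def __init__(self, optimistic_value=1.0):
        self.optimistic_value = optimistic_value
        self.totals = {}
        self.counts = {}

    def value(self, task_kind, action):
        key = (task_kind, action)
        if key not in self.counts:
            return self.optimistic_value  # untried actions look attractive
        return self.totals[key] / self.counts[key]

    def choose(self, task_kind):
        return max(ACTIONS, key=lambda action: self.value(task_kind, action))

    def learn(self, task_kind, action, reward):
        key = (task_kind, action)
        self.totals[key] = self.totals.get(key, 0.0) + reward
        self.counts[key] = self.counts.get(key, 0) + 1


def train(policy, task_kinds, episodes=20):
    for _ in range(episodes):
        for kind in task_kinds:
            action = policy.choose(kind)
            policy.learn(kind, action, outcome_reward(kind, action))
    return policy


policy = train(ToolPolicy(), ["arithmetic", "small_talk"])
print(policy.choose("arithmetic"), policy.choose("small_talk"))
# → use_tool answer_directly
```

No example ever tells the policy when a tool is appropriate; it settles on calling the tool for arithmetic and answering directly for small talk purely because those choices paid off.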

Code execution through interpreters provides a powerful form of environmental grounding — creating the “grounded rewards” Silver identifies as necessary for surpassing human capabilities. Unlike human feedback which evaluates outputs based on subjective judgment, code execution provides objectively verifiable results. This computational grounding creates a validation mechanism for testing hypotheses and verifying reasoning that isn’t limited by human evaluative capabilities. When combined with graph-based knowledge representation, this creates systems that can discover novel strategies and verify their effectiveness through real-world feedback rather than human approval.
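A grounded reward of this kind can be sketched in a few lines: run the candidate program and score it by what it actually computes. The function name `grounded_reward`, the `solve` entry point, and the test cases are assumptions for illustration. The sketch runs code in-process with `exec`, which is unsafe for untrusted code; a real system would need an isolated sandbox.

```python
def grounded_reward(candidate_source, test_cases, func_name="solve"):
    """Return the fraction of test cases the candidate passes (0.0 to 1.0).

    `test_cases` is a list of (args_tuple, expected_result) pairs.
    """
    if not test_cases:
        raise ValueError("at least one test case is required")
    namespace = {}
    try:
        # WARNING: exec is not a sandbox; shown only to keep the sketch short.
        exec(candidate_source, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that does not even load earns nothing
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error counts as a failed case
    return passed / len(test_cases)


cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
correct = "def solve(a, b):\n    return a + b\n"
buggy = "def solve(a, b):\n    return a * b\n"
print(grounded_reward(correct, cases))  # → 1.0
print(grounded_reward(buggy, cases))
```

The reward here involves no human rating at all: the interpreter is the judge, which is what makes the signal usable even where human evaluators could not tell a good answer from a bad one.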
The emergence of adaptive behaviors in ReTool, such as code self-correction, demonstrates exactly the kind of autonomous discovery Silver envisions in experience-based systems. These capabilities weren’t explicitly programmed but emerged through the interaction of reinforcement learning with environmental feedback. The agentic graph architecture provides the structural foundation for these emergent behaviors by creating a framework where knowledge, reasoning, planning, and tools can all evolve based on experience.
From an implementation perspective, a comprehensive cognitive architecture for agentic graph systems in the era of experience would integrate several complementary approaches. The knowledge management layer provided by graph-based retrieval systems preserves relational information and enables continuous knowledge integration. A specialized reasoning engine transforms language models from knowledge memorization to reasoning optimized for working with retrieved graph information. A metacognitive control layer implements mechanisms for strategic resource allocation and self-awareness. And ReTool’s step-wise reinforcement learning methodology enhances this framework by optimizing multi-step reasoning, teaching strategic tool utilization, developing process-focused learning, enabling cross-domain knowledge transfer, and generating synthetic training data.
The integration of these components creates a unified architecture that directly addresses all key requirements for experience-based AI. The combination of graph-based tool orchestration with outcome-driven learning enables systems to generate their own experiences through environmental interaction. Computational grounding through tool execution provides objective feedback not limited by human judgment. Graph-based inference combined with emergent strategic patterns creates reasoning approaches free from human cognitive constraints. Graph structures provide the framework for maintaining and evolving knowledge across extended experiences. And the directed acyclic graph structure for task planning enables the complex multi-step reasoning necessary for sophisticated strategy formation.
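The role of a directed acyclic graph in task planning can be illustrated with Python's standard-library `graphlib` module (Python 3.9+). The task names below are hypothetical; the sketch only shows how dependency edges determine a valid execution order and how a cyclic plan is rejected.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
PLAN = {
    "collect_data": set(),
    "run_experiment": {"collect_data"},
    "analyze_results": {"run_experiment"},
    "form_hypothesis": {"analyze_results", "collect_data"},
}


def plan_order(dependencies):
    """Return one execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def is_valid_plan(dependencies):
    """A plan is executable only if its dependency graph has no cycles."""
    try:
        plan_order(dependencies)
        return True
    except CycleError:
        return False


print(plan_order(PLAN))
print(is_valid_plan({"a": {"b"}, "b": {"a"}}))  # → False
```

Acyclicity is what guarantees that a multi-step plan can actually be executed: every step has its prerequisites finished before it starts, and independent branches can be scheduled in parallel.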
The significant efficiency gains observed in both approaches further strengthen their compatibility. ReTool’s 40% reduction in response length through strategic tool use parallels the dramatic efficiency improvements in graph-based systems, which reduce computational requirements by 75–99% through techniques like selective processing and deferred computation. Together, these approaches create systems that can reason effectively within practical resource constraints — addressing the engineering challenges Silver identifies for implementing agentic systems at scale.
Perhaps most importantly, this unified approach enables the development of what Silver calls “Move 37 moments” — instances where AI systems discover strategies that humans might never conceive. Just as AlphaGo’s famous move against Lee Sedol demonstrated a strategy beyond human convention, graph-based reasoning enhanced with reinforcement learning can explore solution spaces beyond human cognitive patterns. The graph structure provides the relational framework necessary for discovering novel connections between concepts, while reinforcement learning supplies the mechanism for evaluating and refining these discoveries based on real-world outcomes.
The implementation path toward this experience-based AI architecture begins with building the foundational graph structures for knowledge, memory, reasoning, and planning. This is followed by integrating reinforcement learning methodologies for strategic tool use and developing the cognitive framework components for metacognition and self-awareness. The system then trains through self-generated experiences with grounded feedback, continuously evolving its capabilities through interaction with its environment.
The integration of agentic graph systems with reinforcement learning for strategic tool use represents not merely an incremental improvement but a fundamental architectural breakthrough that addresses the limitations Silver identifies in current AI approaches. By providing both the structural foundation through graph architecture and the learning mechanism through reinforcement learning, this unified approach enables the transition from systems trained on human data to true agents that generate their own experiences and break beyond human knowledge limitations.
This architecture has profound implications across domains. In scientific research, systems could autonomously conduct experiments, analyze results, and develop new hypotheses without being constrained by current scientific consensus. In education, AI tutors could develop teaching strategies optimized for actual learning outcomes rather than following established pedagogical approaches. In healthcare, systems could discover novel treatment approaches by optimizing for measurable health improvements rather than following standard protocols. And in complex enterprise environments, AI systems could develop strategies for organizational optimization that transcend traditional management theories.
The unified architecture of agentic graph systems enhanced with reinforcement learning creates AI systems that can genuinely learn through their own experiences, discover novel strategies beyond human conception, and continuously evolve their capabilities. It provides the foundation for Silver and Sutton’s vision of the future of artificial intelligence — systems that aren’t limited to replicating human knowledge but can expand human understanding through autonomous discovery. As AI development progresses into this new era, this architectural approach offers not just a theoretical framework but a practical implementation path for building truly agentic systems that learn, reason, and discover in ways that complement and extend human capabilities.
References
https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf
https://retool-rl.github.io/
https://www.linkedin.com/feed/update/urn:li:activity:7305879366322843648/