From today's 爱可可 AI frontier recommendations

[CL] Memory Augmented Large Language Models are Computationally Universal

D Schuurmans
[Google Brain & University of Alberta]

Highlights:

  1. When augmented with an external memory, transformer-based large language models are computationally universal, meaning they can process arbitrarily large inputs and simulate any algorithm;
  2. The specific large language model used, Flan-U-PaLM 540B, can be combined with an associative read-write memory to exactly simulate the execution of a universal Turing machine;
  3. The construction relies solely on designing a stored instruction computer that connects the language model to the associative memory, with no modification of the language model weights (see the sketch after this list);
  4. The parsing step between the language model and the memory is handled by simple regular-expression matching, i.e., a finite automaton.
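
Points 3 and 4 amount to a simple fetch-execute loop. Below is a minimal runnable sketch of that idea, not the paper's actual construction: `language_model()` is a hypothetical stub standing in for the frozen Flan-U-PaLM 540B, the associative memory is a plain Python dict, and a single regular expression (a finite automaton) parses the model's output back into memory updates. The paper's real prompt formats and memory layout differ.

```python
import re

def language_model(prompt: str) -> str:
    """Hypothetical stub for the frozen Flan-U-PaLM 540B.

    In the paper the frozen model is prompted; here we return a
    fixed-format reply so the loop below actually runs.
    """
    return "result: state=q2, write=1, move=L"

# Associative read-write memory: a plain key/value store.
memory = {"state": "q1", "head": 0, "tape_0": "0"}

# The only glue between model and memory is a regular expression,
# i.e., a finite automaton, matching highlight 4 above.
PATTERN = re.compile(r"result: state=(\w+), write=(\w+), move=([LR])")

def step(memory: dict) -> bool:
    """One fetch-execute cycle of the stored instruction computer."""
    head = memory["head"]
    symbol = memory.get(f"tape_{head}", "0")          # fetch from memory
    reply = language_model(f"state={memory['state']} symbol={symbol}")
    match = PATTERN.search(reply)                     # regex parse
    if match is None:
        return False                                  # treat as halt
    state, write, move = match.groups()
    memory[f"tape_{head}"] = write                    # write back
    memory["head"] = head + (1 if move == "R" else -1)
    memory["state"] = state
    return True

step(memory)
print(memory)   # {'state': 'q2', 'head': -1, 'tape_0': '1'}
```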

One-sentence summary:
Proves that adding an external memory to a transformer-based large language model makes it computationally universal, demonstrated by simulating a universal Turing machine; this is achieved through a stored instruction computer that connects the language model to an associative memory, with no modification of the language model weights.

Abstract:

We show that transformer-based large language models are computationally universal when augmented with an external memory. Any deterministic language model that conditions on strings of bounded length is equivalent to a finite automaton, hence computationally limited. However, augmenting such models with a read-write memory creates the possibility of processing arbitrarily large inputs and, potentially, simulating any algorithm. We establish that an existing large language model, Flan-U-PaLM 540B, can be combined with an associative read-write memory to exactly simulate the execution of a universal Turing machine, U15,2. A key aspect of the finding is that it does not require any modification of the language model weights. Instead, the construction relies solely on designing a form of stored instruction computer that can subsequently be programmed with a specific set of prompts.
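
The abstract's key contrast is that a bounded-context model alone is equivalent to a finite automaton, while an external read-write tape can grow without bound. The toy machine below (an illustrative rule set, not the paper's U15,2) shows the shape of what each prompt-response cycle simulates: the "program" is just a lookup from (state, symbol) to an action, which the paper realizes as a specific set of prompts.

```python
from collections import defaultdict

# Illustrative transition table -- a stand-in for the "specific set of
# prompts" that programs the stored instruction computer. This is NOT
# the paper's U15,2 rule set, just a tiny example machine.
rules = {  # (state, symbol) -> (next state, symbol to write, head move)
    ("A", "0"): ("B", "1", +1),
    ("A", "1"): ("halt", "1", 0),
    ("B", "0"): ("A", "1", -1),
    ("B", "1"): ("A", "1", +1),
}

tape = defaultdict(lambda: "0")  # unbounded tape lives in external memory
state, head = "A", 0
while state != "halt":
    next_state, write, move = rules[(state, tape[head])]
    tape[head] = write
    head += move
    state = next_state

print(state, dict(tape))   # halt {0: '1', 1: '1'}
```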

Paper link: https://arxiv.org/abs/2301.04589