Princeton University | Looped Transformers as Programmable Computers

From today's 爱可可 AI frontier picks

[LG] Looped Transformers as Programmable Computers

A Giannou, S Rajput, J Sohn, K Lee, J D. Lee, D Papailiopoulos
[University of Wisconsin-Madison & Princeton University]

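Key points:

Proves that a Transformer network can be programmed to act as a general-purpose computer;
Uses encoder layers to emulate basic computing blocks such as lexicographic operations, non-linear functions, function calls, program counters, and conditional branches;
Maps iterative algorithms to programs that a Transformer network can execute, including a basic calculator, a basic linear algebra library, and a full in-context learning algorithm that uses backpropagation.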
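
As a rough illustration of how an attention layer with hand-set weights can serve as one of the computing blocks listed above, the sketch below uses softmax attention to read a single memory slot by address. It is not the paper's construction: the ±1 binary position codes, the temperature `lam`, and the names `position_code` and `attention_read` are all assumptions chosen for illustration.

```python
import math

def position_code(index, bits):
    """Encode an index as a vector of +1/-1 entries, one per bit (illustrative choice)."""
    return [1.0 if (index >> b) & 1 else -1.0 for b in range(bits)]

def attention_read(memory, address, bits=4, lam=20.0):
    """Read memory[address] with softmax attention and fixed, untrained weights.

    The query is the code of `address`, the keys are the codes of the slot
    positions, and the values are the slot contents. The matching slot scores
    `bits`, every other slot scores at most `bits - 2`, so a large temperature
    `lam` makes the softmax nearly one-hot on the addressed slot.
    """
    query = position_code(address, bits)
    scores = [
        lam * sum(q * k for q, k in zip(query, position_code(i, bits)))
        for i in range(len(memory))
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, memory))

memory = [3.0, -1.0, 7.0, 0.5, 2.0]
value = attention_read(memory, address=2)  # approximately 7.0
```

The read is only approximate: the other slots still receive a tiny weight, which shrinks exponentially as `lam` grows.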

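One-sentence summary:
A Transformer network can be programmed as a general-purpose computer by hard-coding specific weights and placing the network in a loop.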

Abstract:

We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including embedding edit operations, non-linear functions, function calls, program counters, and conditional branches. Using these building blocks, we emulate a small instruction-set computer. This allows us to map iterative algorithms to programs that can be executed by a looped, 13-layer transformer. We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation. Our work highlights the versatility of the attention mechanism, and demonstrates that even shallow transformers can execute full-fledged, general-purpose programs.
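
The loop-plus-punchcard idea can be sketched in plain Python. This is only an analogy, not the paper's construction: `step` stands in for the fixed-weight transformer, the state dictionary stands in for the input sequence, and the single SUBLEQ-style instruction (subtract, then branch if the result is at most zero) is an assumption, picked because it is a well-known one-instruction computer. All names here are hypothetical.

```python
HALT = -1  # illustrative convention: a program counter outside the program means "stop"

def step(state):
    """One pass of the fixed function: execute the instruction at the program counter.

    Each instruction (a, b, c) means: mem[b] -= mem[a]; jump to c if the
    result is <= 0, otherwise continue with the next instruction.
    """
    pc = state["pc"]
    program, mem = state["program"], list(state["mem"])
    if pc < 0 or pc >= len(program):
        return state  # halted: the state no longer changes
    a, b, c = program[pc]
    mem[b] -= mem[a]
    next_pc = c if mem[b] <= 0 else pc + 1
    return {"program": program, "mem": mem, "pc": next_pc}

def run(state, max_loops=1000):
    """Apply the same `step` repeatedly, feeding its output back in as input."""
    for _ in range(max_loops):
        new_state = step(state)
        if new_state == state:  # fixed point reached, nothing left to do
            break
        state = new_state
    return state

# Program: add mem[0] into mem[1], using mem[2] (initially 0) as scratch space.
program = [
    (0, 2, 1),     # scratch -= x, so scratch = -x; the next instruction is 1 either way
    (2, 1, 2),     # y -= scratch, so y = y + x; the next instruction is 2 either way
    (2, 2, HALT),  # scratch -= scratch gives 0, which is <= 0, so jump to HALT
]
final = run({"program": program, "mem": [5, 7, 0], "pc": 0})
# final["mem"] is [5, 12, 0]
```

The function `step` never changes between iterations; only the state does. That mirrors the abstract's claim that a fixed network becomes programmable once instructions and memory are both carried in its input.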

Paper link: https://arxiv.org/abs/2301.13196
