来自今天的爱可可AI前沿推介

[LG] Adaptive Computation with Elastic Input Sequence

F Xue, V Likhosherstov, A Arnab, N Houlsby, M Dehghani, Y You
[Google Brain]

弹性输入序列的自适应计算

要点:

  1. 提出 AdaTape,一种在神经网络中实现自适应计算的新策略;
  2. 用动态 tape token 实现动态计算和输入序列适应;
  3. 提出自适应 Tape 读取(ATR)算法,以生成动态序列内容和长度;
  4. 与标准 Transformer 和现有的自适应架构 Transformer 相比,在图像识别任务上的性能有所提高,有可能解决具有挑战性的任务。

一句话总结:
AdaTape 是神经网络一种新的自适应计算方法,用动态 tape token 和自适应 Tape 读取算法来生成输入序列,可提高图像识别任务的性能。

摘要:
在解决一个问题时,人在使用的信息类型、采取的流程以及接近和解决该问题所花费的时间方面具有自适应能力。然而,大多数标准的神经网络在不同的样本上具有相同的函数类型和固定的计算预算,而不论其性质和难度如何。自适应性是一个强大的范式,因为它不仅赋予了从业者与这些模型的下游使用有关的灵活性,而且还可以作为解决某些具有挑战性的问题类别的一个强大的归纳偏差。本文提出一种新策略——AdaTape,通过自适应 tape token 在神经网络中实现动态计算。通过给现有的架构配备动态读写 tape,AdaTape 可采用弹性输入序列。用从 tape bank 中获得的 tape token 自适应生成输入序列,这些 tape token 可以是可训练的,也可以从输入数据中生成。本文分析了获得动态序列内容和长度的挑战和要求,并提出了自适应 Tape 阅读器(ATR)算法来实现这两个目标。通过对图像识别任务的广泛实验,表明 AdaTape 可以在保持计算成本的情况下取得更好的性能。

When solving a problem, human beings have the adaptive ability in terms of the type of information they use, the procedure they take, and the amount of time they spend approaching and solving the problem. However, most standard neural networks have the same function type and fixed computation budget on different samples regardless of their nature and difficulty. Adaptivity is a powerful paradigm as it not only imbues practitioners with flexibility pertaining to the downstream usage of these models but can also serve as a powerful inductive bias for solving certain challenging classes of problems. In this work, we propose a new strategy, AdaTape, that enables dynamic computation in neural networks via adaptive tape tokens. AdaTape employs an elastic input sequence by equipping an existing architecture with a dynamic read-and-write tape. Specifically, we adaptively generate input sequences using tape tokens obtained from a tape bank that can either be trainable or generated from input data. We analyze the challenges and requirements to obtain dynamic sequence content and length, and propose the Adaptive Tape Reader (ATR) algorithm to achieve both objectives. Via extensive experiments on image recognition tasks, we show that AdaTape can achieve better performance while maintaining the computational cost.

论文链接:https://arxiv.org/abs/2301.13195
图片
图片
图片
图片

内容中包含的图片若涉及版权问题,请及时与我们联系删除