Approximate Thompson Sampling via Epistemic Neural Networks

I Osband, Z Wen, S M Asghari, V Dwaracherla, M Ibrahimi, X Lu, B Van Roy
[DeepMind]

Key points:

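  1. In complex environments modeled with neural networks, Thompson sampling (TS) can become computationally intractable.
  2. Epistemic neural networks (ENNs) are designed to produce accurate joint predictive distributions, which effective action selection requires.
  3. Computational experiments show that ENNs perform better on joint prediction and on decision problems, and that the epinet architecture matches or outperforms existing methods at lower computational cost.
  4. This work lays the groundwork for future research on effective ENN architectures for better decision-making in large deep learning systems.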

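One-sentence summary:
Epistemic neural networks (ENNs) can approximate Thompson sampling (TS) effectively, with lower computational cost and better joint predictive distributions.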

Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neural networks. Approximate posterior samples can produce effective actions, but only if they reasonably approximate joint predictive distributions of outputs across inputs. Notably, accuracy of marginal predictive distributions does not suffice. Epistemic neural networks (ENNs) are designed to produce accurate joint predictive distributions. We compare a range of ENNs through computational experiments that assess their performance in approximating TS across bandit and reinforcement learning environments. The results indicate that ENNs serve this purpose well and illustrate how the quality of joint predictive distributions drives performance. Further, we demonstrate that the epinet -- a small additive network that estimates uncertainty -- matches the performance of large ensembles at orders of magnitude lower computational cost. This enables effective application of TS with computation that scales gracefully to complex environments.
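For reference, the "sample from the posterior, then act greedily on the sample" loop that TS relies on can be sketched in the one setting where the posterior is trivially tractable: a Bernoulli bandit with independent Beta priors. This is only a toy illustration of exact TS, not the ENN-based approximation studied in the paper. The function name, the arm probabilities, and the step count are all assumptions made for the example.

```python
import random


def thompson_sampling(true_probs, num_steps, seed=0):
    """Exact Beta-Bernoulli Thompson sampling on a toy bandit (illustrative only)."""
    rng = random.Random(seed)
    num_arms = len(true_probs)
    # Beta(1, 1) prior on each arm's success probability.
    successes = [1] * num_arms
    failures = [1] * num_arms
    pulls = [0] * num_arms
    for _ in range(num_steps):
        # Draw one posterior sample per arm, then act greedily on the samples.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(num_arms)]
        arm = max(range(num_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the (hypothetical) true arm probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the chosen arm only.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Hypothetical three-armed bandit; the last arm is the best one.
pulls = thompson_sampling(true_probs=[0.2, 0.5, 0.8], num_steps=2000)
print(pulls)
```

With a neural network model there is no such closed-form posterior to sample from, which is the gap the abstract says approximate samplers such as ENNs are meant to fill.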

Paper link: https://arxiv.org/abs/2302.09205

