LG - Machine Learning  CV - Computer Vision  CL - Computation and Language  AS - Audio and Speech  RO - Robotics

Reposted from 爱可可生活

 

1、[CV] A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition

J Tian, D Yung, Y Hsu, Z Kira

[Georgia Institute of Technology & Samsung Research America]

A geometric perspective on neural calibration via sensitivity decomposition. It is well known that vision classification models suffer from poor calibration under data distribution shifts. This paper takes a geometric approach to the problem, proposing Geometric Sensitivity Decomposition (GSD), which decomposes the norm of a sample's feature embedding and its angular similarity to a target classifier into an instance-dependent and an instance-independent component. The instance-dependent component captures sensitive information about changes in the input, while the instance-independent component represents insensitive information that serves solely to minimize the loss on the training set. Inspired by this decomposition, the authors analytically derive a simple extension to current softmax-linear models that learns to disentangle the two components during training. Across several common vision models, the disentangled model outperforms other calibration methods on standard calibration metrics under out-of-distribution (OOD) and corrupted data, with markedly lower complexity, surpassing the current state of the art by a 30.8% relative improvement in Expected Calibration Error on corrupted CIFAR100.

It is well known that vision classification models suffer from poor calibration in the face of data distribution shifts. In this paper, we take a geometric approach to this problem. We propose Geometric Sensitivity Decomposition (GSD) which decomposes the norm of a sample feature embedding and the angular similarity to a target classifier into an instance-dependent and an instance-independent component. The instance-dependent component captures the sensitive information about changes in the input while the instance-independent component represents the insensitive information serving solely to minimize the loss on the training dataset. Inspired by the decomposition, we analytically derive a simple extension to current softmax-linear models, which learns to disentangle the two components during training. On several common vision models, the disentangled model outperforms other calibration methods on standard calibration metrics in the face of out-of-distribution (OOD) data and corruption with significantly less complexity. Specifically, we surpass the current state of the art by 30.8% relative improvement on corrupted CIFAR100 in Expected Calibration Error.
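To make the quantities in the abstract concrete, here is a minimal NumPy sketch (not the authors' code) of the standard Expected Calibration Error metric and of the norm/cosine factorization of a softmax-linear logit that GSD decomposes further; the function names, array shapes, and bin count are illustrative assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Standard ECE: bin predictions by top-class confidence and average the
    |accuracy - confidence| gap per bin, weighted by the fraction of samples."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

def logit_decomposition(features, weights):
    """Factor softmax-linear logits as ||f(x)|| * ||w_k|| * cos(theta_k).
    GSD further splits these factors into instance-dependent and
    instance-independent parts, which is not reproduced here."""
    feat_norm = np.linalg.norm(features, axis=1, keepdims=True)      # ||f(x)||, shape (N, 1)
    w_norm = np.linalg.norm(weights, axis=1)                         # ||w_k||,  shape (K,)
    cos_sim = (features @ weights.T) / (feat_norm * w_norm + 1e-12)  # cos(theta_k), shape (N, K)
    return feat_norm, cos_sim
```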

https://weibo.com/1402400261/L33yPDSWV

 

2、[AI] AI in Games: Techniques, Challenges and Opportunities

Q Yin, J Yang, W Ni, B Liang, K Huang

[Chinese Academy of Sciences & Tsinghua University]

A survey of game AI: techniques, challenges, and opportunities. Since the breakthrough of AlphaGo, AI for human-computer games has become a very hot topic attracting researchers around the world, as such games usually serve as an effective benchmark for testing artificial intelligence. Various game AI systems, such as Libratus, OpenAI Five, and AlphaStar, have been developed and have beaten professional human players. This paper surveys recent successful game AIs, covering board game AIs, card game AIs, first-person shooter game AIs, and real-time strategy game AIs. Through this survey, the authors 1) compare the main difficulties that different kinds of games pose for intelligent decision making; 2) illustrate the mainstream frameworks and techniques for developing professional-level AIs; 3) raise the challenges and shortcomings of current AIs for intelligent decision making; and 4) attempt to identify future trends in games and intelligent decision-making techniques. The hope is that this brief review provides an introduction for beginners and inspires researchers in the field of game AI.

With the breakthrough of AlphaGo, AI for human-computer games has become a very hot topic attracting researchers all around the world, as such games usually serve as an effective benchmark for testing artificial intelligence. Various game AI systems (AIs), such as Libratus, OpenAI Five, and AlphaStar, have been developed and have beaten professional human players. In this paper, we survey recent successful game AIs, covering board game AIs, card game AIs, first-person shooter game AIs, and real-time strategy game AIs. Through this survey, we 1) compare the main difficulties among different kinds of games for the intelligent decision-making field; 2) illustrate the mainstream frameworks and techniques for developing professional-level AIs; 3) raise the challenges or drawbacks in the current AIs for intelligent decision making; and 4) try to propose future trends in games and intelligent decision-making techniques. Finally, we hope this brief review can provide an introduction for beginners and inspire insights for researchers in the field of AI in games.

https://weibo.com/1402400261/L33Dto7Rn

 

3、[CV] Fine-Grained Image Analysis with Deep Learning: A Survey

X Wei, Y Song, O M Aodha, J Wu, Y Peng, J Tang, J Yang, S Belongie

[Nanjing University of Science and Technology & University of Edinburgh & Peking University]

A survey of fine-grained image analysis with deep learning. Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, underpinning a wide range of real-world applications. FGIA analyzes visual objects from subordinate categories, such as species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, deep-learning-powered FGIA has made remarkable progress in recent years. This paper presents a systematic survey of these advances, attempting to re-define and broaden the field of fine-grained image analysis by consolidating two fundamental fine-grained research areas: fine-grained image recognition and fine-grained image retrieval. It also reviews other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications, and highlights several research directions and open problems that need further exploration by the community.

Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas: fine-grained image recognition and fine-grained image retrieval. In addition, we review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community.

https://weibo.com/1402400261/L33HRBlNa

 

4、[LG] Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception

J Dapello, J Feather, H Le, T Marques, D D. Cox, J H. McDermott, J J. DiCarlo, S Chung

[MIT & MIT-IBM Watson AI Lab]

Neural population geometry reveals the role of stochasticity in robust perception. Adversarial examples are often cited by neuroscientists and machine learning researchers as evidence of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Using recently developed geometric techniques from computational neuroscience, this paper investigates how adversarial perturbations affect the internal representations of standard, adversarially trained, and biologically inspired stochastic networks. Each type of network exhibits a distinct geometric signature, revealing different mechanisms for achieving robust representations. The results generalize to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between the representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a tradeoff between adversarial and clean performance. The results shed light on the strategies for robust perception exploited by adversarially trained and stochastic networks, and help explain how stochasticity may benefit both machine and biological computation.

Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Here, using recently developed geometrical techniques from computational neuroscience, we investigate how adversarial perturbations influence the internal representations of standard, adversarially trained, and biologically-inspired stochastic networks. We find distinct geometric signatures for each type of network, revealing different mechanisms for achieving robust representations. Next, we generalize these results to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a tradeoff between adversarial and clean performance. Our results shed light on the strategies of robust perception utilized by adversarially trained and stochastic networks, and help explain how stochasticity may be beneficial to machine and biological computation.
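As a rough illustration of the "response stochasticity" idea, here is a minimal PyTorch sketch of a layer that injects noise into activations at both training and test time; the Gaussian noise form, its scale, and the surrounding conv block are assumptions for illustration, not the specific stochastic model evaluated in the paper.

```python
import torch
import torch.nn as nn

class StochasticActivation(nn.Module):
    """Adds zero-mean Gaussian noise to activations at train and test time,
    loosely mimicking trial-to-trial variability of biological neurons."""
    def __init__(self, noise_scale: float = 0.1):
        super().__init__()
        self.noise_scale = noise_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Noise is applied even at inference, which is where it matters
        # for evaluating robustness to adversarial perturbations.
        return x + self.noise_scale * torch.randn_like(x)

# Example: wrap a standard conv block with stochastic responses.
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    StochasticActivation(noise_scale=0.1),
)
out = block(torch.randn(1, 3, 32, 32))
```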

https://weibo.com/1402400261/L33KJaM4W

5、[LG] DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

A Tamkin, V Liu, R Lu, D Fein, C Schultz, N Goodman

[Stanford University]

DABS: a domain-agnostic benchmark for self-supervised learning. Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields such as natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, this paper introduces DABS, a domain-agnostic benchmark for self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest X-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored by its downstream performance on a set of labeled tasks in that domain. The paper also presents e-Mix and ShED, two domain-agnostic baseline algorithms; their relatively modest performance indicates that significant progress is needed before self-supervised learning becomes an out-of-the-box solution for arbitrary domains.

Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at https://github.com/alextamkin/dabs.
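A hedged sketch of the evaluation protocol the abstract describes: pretrain a single domain-agnostic encoder on a domain's unlabeled data, then score it on that domain's labeled tasks. The function names, linear-probe scoring, and hyperparameters below are hypothetical illustrations, not the benchmark's actual API; see the linked repository for the real interface.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dabs_style_score(encoder, unlabeled_loader, labeled_loader, ssl_loss_fn,
                     embed_dim=128, num_classes=10, pretrain_steps=1000):
    """Hypothetical sketch: (1) pretrain a domain-agnostic encoder on unlabeled
    data with some self-supervised objective (e.g. e-Mix or ShED), then
    (2) score it downstream, here with a single-pass linear probe for brevity."""
    opt = torch.optim.Adam(encoder.parameters(), lr=3e-4)
    batches = iter(unlabeled_loader)
    for _ in range(pretrain_steps):
        try:
            x = next(batches)
        except StopIteration:
            batches = iter(unlabeled_loader)
            x = next(batches)
        loss = ssl_loss_fn(encoder, x)      # self-supervised loss on raw inputs
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Downstream score: train a linear probe on frozen features, report accuracy.
    encoder.eval()
    probe = nn.Linear(embed_dim, num_classes)
    probe_opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    correct, total = 0, 0
    for x, y in labeled_loader:
        with torch.no_grad():
            z = encoder(x)                  # frozen embeddings
        logits = probe(z)
        probe_opt.zero_grad()
        F.cross_entropy(logits, y).backward()
        probe_opt.step()
        correct += (logits.argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```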

https://weibo.com/1402400261/L33O4j6M1

A few more papers worth noting:

[IR] Scaling Law for Recommendation Models: Towards General-purpose User Representations

Scaling laws for recommendation models: toward general-purpose user representations

K Shin, H Kwak, K Kim, S Y Kim, M N Ramstrom

[NAVER CLOVA]

https://weibo.com/1402400261/L33RSmKSZ

 

[CV] CytoImageNet: A large-scale pretraining dataset for bioimage transfer learning

CytoImageNet: a large-scale pretraining dataset for bioimage transfer learning

S B Z. Hua, A X. Lu, A M. Moses

[University of Toronto & Microsoft Research]

https://weibo.com/1402400261/L33TG6QIr

 

[CV] Benchmarking Detection Transfer Learning with Vision Transformers

Benchmarking detection transfer learning with vision Transformers

Y Li, S Xie, X Chen, P Dollar, K He, R Girshick

[Facebook AI Research (FAIR)]

https://weibo.com/1402400261/L33Vv0Gda

 

[CV] Bag of Tricks and A Strong baseline for Image Copy Detection

Bag of tricks and a strong baseline for image copy detection

W Wang, W Zhang, Y Sun, Y Yang

[Baidu Research]

https://weibo.com/1402400261/L33XEzhaZ
