LG - Machine Learning; CV - Computer Vision; CL - Computation and Language; AS - Audio and Speech; RO - Robotics

Reposted from 爱可可爱生活

 

1、[LG] Do We Really Need Deep Learning Models for Time Series Forecasting?

S Elsayed, D Thyssens, A Rashed, H S Jomaa, L Schmidt-Thieme

[University of Hildesheim]

Do we really need deep learning models for time series forecasting? Time series forecasting is an important machine learning task with a wide range of applications, including forecasting electricity consumption, traffic, and air quality. Traditional forecasting models rely on rolling averages, vector autoregression, and autoregressive integrated moving averages (ARIMA), while deep learning and matrix factorization models have recently been proposed to tackle the same problem with more competitive performance; a major drawback of such models, however, is that they tend to be overly complex compared with traditional techniques. This paper reports the results of prominent deep learning models against a well-known machine learning baseline, a Gradient Boosting Regression Tree (GBRT) model. As with deep neural network (DNN) models, the forecasting task is cast as a window-based regression problem. The input and output structure of the GBRT model is then feature-engineered: for each training window, the target values are concatenated with external features and flattened to form a single input instance for a multi-output GBRT model. A comparative study of eight state-of-the-art deep learning models published at top conferences in recent years is conducted on nine datasets. The results show that the window-based input transformation lifts a simple GBRT model to a level that outperforms all state-of-the-art DNN models evaluated in the paper. More broadly, these findings suggest that simpler machine learning baselines should not be dismissed but configured with greater care, to ensure that progress in time series forecasting is genuine.

Time series forecasting is a crucial task in machine learning, as it has a wide range of applications including but not limited to forecasting electricity consumption, traffic, and air quality. Traditional forecasting models rely on rolling averages, vector auto-regression, and auto-regressive integrated moving averages. On the other hand, deep learning and matrix factorization models have recently been proposed to tackle the same problem with more competitive performance. However, one major drawback of such models is that they tend to be overly complex in comparison to traditional techniques. In this paper, we report the results of prominent deep learning models with respect to a well-known machine learning baseline, a Gradient Boosting Regression Tree (GBRT) model. Similar to the deep neural network (DNN) models, we transform the time series forecasting task into a window-based regression problem. Furthermore, we feature-engineer the input and output structure of the GBRT model such that, for each training window, the target values are concatenated with external features and then flattened to form one input instance for a multi-output GBRT model. We conducted a comparative study on nine datasets for eight state-of-the-art deep learning models presented at top-level conferences in recent years. The results demonstrate that the window-based input transformation boosts the performance of a simple GBRT model to levels that outperform all state-of-the-art DNN models evaluated in this paper.

Keywords: Time Series Forecasting, Deep Learning, Boosting Regression Trees
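The window-based transformation is easy to prototype. Below is a minimal sketch using scikit-learn, assuming a univariate target with a few external covariates; the window length, horizon, and the use of `MultiOutputRegressor` over `GradientBoostingRegressor` are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

def make_windows(targets, features, w, h):
    """Slide a window of length w over the series; predict the next h targets.

    For each window, the w target values are concatenated with the flattened
    external features of the same window to form one input instance.
    """
    X, Y = [], []
    for t in range(len(targets) - w - h + 1):
        window_targets = targets[t:t + w]                # shape (w,)
        window_features = features[t:t + w].reshape(-1)  # shape (w * d,)
        X.append(np.concatenate([window_targets, window_features]))
        Y.append(targets[t + w:t + w + h])               # shape (h,)
    return np.asarray(X), np.asarray(Y)

rng = np.random.default_rng(0)
targets = rng.standard_normal(500)        # univariate target series
features = rng.standard_normal((500, 3))  # 3 external covariates per step

X, Y = make_windows(targets, features, w=24, h=6)
model = MultiOutputRegressor(GradientBoostingRegressor())  # one GBRT per horizon step
model.fit(X[:400], Y[:400])
print(model.predict(X[400:401]).shape)    # (1, 6): six-step-ahead forecast
```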

 

 

2、[CV] BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning

Z Hou, B Yu, D Tao

[The University of Sydney & JD Explore Academy]

BatchFormer: learning to explore sample relationships for robust representation learning. Despite the success of deep neural networks, deep representation learning still faces many challenges stemming from data scarcity issues such as data imbalance, unseen distributions, and domain shift. A variety of methods have been devised to address these issues by exploring sample relationships in a vanilla way (i.e., from the perspective of the input or the loss function), but they fail to exploit the internal structure of deep neural networks for learning with sample relationships. Motivated by this, the paper proposes to equip deep neural networks themselves with the ability to learn sample relationships from each mini-batch. A batch transformer module, BatchFormer, is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training. The method enables collaboration among different samples; for example, head-class samples can also contribute to the learning of tail classes in long-tailed recognition. To close the gap between training and testing, the classifier is shared between the streams with and without BatchFormer during training, so the module can be removed at test time. Extensive experiments on more than ten datasets show clear improvements on various data scarcity applications, including long-tailed recognition, compositional zero-shot learning, domain generalization, and contrastive learning.

Despite the success of deep neural networks, there are still many challenges in deep representation learning due to data scarcity issues such as data imbalance, unseen distributions, and domain shift. To address these issues, a variety of methods have been devised to explore sample relationships in a vanilla way (i.e., from the perspective of either the input or the loss function), failing to exploit the internal structure of deep neural networks for learning with sample relationships. Inspired by this, we propose to enable deep neural networks themselves to learn sample relationships from each mini-batch. Specifically, we introduce a batch transformer module, BatchFormer, which is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training. By doing this, the proposed method enables the collaboration of different samples; e.g., head-class samples can also contribute to the learning of the tail classes for long-tailed recognition. Furthermore, to mitigate the gap between training and testing, we share the classifier between the streams with and without BatchFormer during training, so that the module can be removed during testing. We perform extensive experiments on over ten datasets, and the proposed method achieves significant improvements on different data scarcity applications without any bells and whistles, including the tasks of long-tailed recognition, compositional zero-shot learning, domain generalization, and contrastive learning. Code will be made publicly available at https://github.com/zhihou7/BatchFormer.
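The core mechanism is compact enough to sketch. The following PyTorch snippet applies a standard transformer encoder layer across the batch dimension and shares one classifier between the streams with and without the module; the dimensions and the two-stream training scheme here are illustrative assumptions, and the authors' actual implementation is in the repository linked above.

```python
import torch
import torch.nn as nn

class BatchFormer(nn.Module):
    def __init__(self, dim=512, num_heads=4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=dim, batch_first=False
        )

    def forward(self, x):
        # x: (batch, dim). Treat the batch as a sequence of length `batch`
        # so self-attention relates the samples within the mini-batch.
        return self.encoder(x.unsqueeze(1)).squeeze(1)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU())
batchformer = BatchFormer(dim=512)
classifier = nn.Linear(512, 100)  # shared by both streams

images = torch.randn(16, 3, 32, 32)
feats = backbone(images)
logits_plain = classifier(feats)            # stream without BatchFormer
logits_bf = classifier(batchformer(feats))  # stream with BatchFormer
# During training both streams receive the classification loss; at test time
# only the plain stream is used, so BatchFormer can be dropped entirely.
```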

 

 

3、[LG] Biological error correction codes generate fault-tolerant neural networks

A Zlokapa, A K. Tan, J M. Martyn, M Tegmark, I L. Chuang

[MIT]

Fault-tolerant neural networks from biological error correction codes. Whether fault-tolerant computation is possible in deep learning has been an open question: can arbitrarily reliable computation be achieved using only unreliable neurons? In the mammalian cortex, analog error correction codes known as grid codes have been observed to protect states against neural spiking noise, but their role in information processing is unclear. Using these biological codes, the paper shows that a universal fault-tolerant neural network can be achieved if the faultiness of each neuron lies below a sharp threshold, which coincides in order of magnitude with the noise observed in biological neurons. The discovery of a sharp phase transition from faulty to fault-tolerant neural computation opens a path toward understanding noisy analog systems in artificial intelligence and neuroscience.

It has been an open question in deep learning whether fault-tolerant computation is possible: can arbitrarily reliable computation be achieved using only unreliable neurons? In the mammalian cortex, analog error correction codes known as grid codes have been observed to protect states against neural spiking noise, but their role in information processing is unclear. Here, we use these biological codes to show that a universal fault-tolerant neural network can be achieved if the faultiness of each neuron lies below a sharp threshold, which we find coincides in order of magnitude with noise observed in biological neurons. The discovery of a sharp phase transition from faulty to fault-tolerant neural computation opens a path towards understanding noisy analog systems in artificial intelligence and neuroscience.
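The paper's construction relies on grid codes, but the underlying phenomenon (reliable computation from unreliable units, with a sharp threshold) can be illustrated with a much simpler, classical scheme. The toy simulation below uses von Neumann-style redundancy with majority voting; it illustrates the threshold behavior only and is not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_majority_error(p, R, trials=100_000):
    """Probability that a majority vote over R neurons, each wrong w.p. p, errs."""
    flips = rng.random((trials, R)) < p  # True where a redundant copy fails
    wrong_votes = flips.sum(axis=1)
    return float((wrong_votes > R // 2).mean())

for p in (0.05, 0.2, 0.45):
    errs = [noisy_majority_error(p, R) for R in (1, 9, 49)]
    print(f"p={p:.2f}: error vs redundancy {errs}")
# For p well below 0.5 the error decays rapidly as redundancy R grows;
# near p = 0.5 (the threshold of this toy scheme) redundancy no longer helps.
```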

 

 

4、[CV] ERF: Explicit Radiance Field Reconstruction From Scratch

S Aroudj, S Lovegrove, E Ilg, T Schmidt, M Goesele, R Newcombe

[Reality Labs (RL)]

ERF: explicit radiance field reconstruction from scratch. The paper proposes a novel explicit dense 3D reconstruction approach that processes a set of scene images with sensor poses and calibrations and estimates a photorealistic digital model. A key innovation is that the underlying volumetric representation is fully explicit, in contrast to neural-network-based (implicit) alternatives: scenes are encoded through clear, interpretable mappings from optimization variables to scene geometry and its outgoing surface radiance, represented as hierarchical volumetric fields stored in a sparse voxel octree. Robustly reconstructing such a volumetric scene model, with millions of unknowns, from registered scene images alone is a highly non-convex and complex optimization problem; to this end, stochastic gradient descent (Adam) is employed, steered by an inverse differentiable renderer. The method reconstructs high-quality models comparable to state-of-the-art implicit approaches. Importantly, it does not use a sequential reconstruction pipeline, in which each step suffers from incomplete or unreliable information from earlier stages, but instead starts the optimization from uniform initial solutions whose geometry and radiance are far from the ground truth. The approach is general and practical: it does not require a tightly controlled lab setup for capture, and it allows reconstruction of scenes with a wide variety of objects, including challenging ones such as outdoor plants or furry toys. Thanks to its explicit design, the reconstructed scene models are versatile and can be edited interactively, which is computationally too costly for implicit alternatives.

We propose a novel explicit dense 3D reconstruction approach that processes a set of images of a scene with sensor poses and calibrations and estimates a photo-real digital model. One of the key innovations is that the underlying volumetric representation is completely explicit, in contrast to neural network-based (implicit) alternatives. We encode scenes explicitly using clear and understandable mappings of optimization variables to scene geometry and their outgoing surface radiance, and represent them using hierarchical volumetric fields stored in a sparse voxel octree. Robustly reconstructing such a volumetric scene model, with millions of unknown variables, from registered scene images only is a highly non-convex and complex optimization problem. To this end, we employ stochastic gradient descent (Adam), steered by an inverse differentiable renderer. We demonstrate that our method can reconstruct models of high quality that are comparable to state-of-the-art implicit methods. Importantly, we do not use a sequential reconstruction pipeline, where individual steps suffer from incomplete or unreliable information from previous stages, but start our optimization from uniform initial solutions whose scene geometry and radiance are far off from the ground truth. We show that our method is general and practical: it does not require a highly controlled lab setup for capturing, but allows for reconstructing scenes with a vast variety of objects, including challenging ones such as outdoor plants or furry toys. Finally, our reconstructed scene models are versatile thanks to their explicit design; they can be edited interactively, which is computationally too costly for implicit alternatives.
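To make the "explicit representation optimized by Adam through a differentiable renderer" idea concrete, here is a toy sketch: per-voxel density and color tensors are fitted to a target image through a bare-bones alpha-compositing renderer. The grid size, orthographic front view, and single target image are drastic simplifications of the paper's sparse-octree fields and inverse renderer.

```python
import torch

N = 32                                                  # voxels per axis
density = torch.zeros(N, N, N, requires_grad=True)      # explicit geometry
color = torch.rand(N, N, N, 3, requires_grad=True)      # explicit radiance

def render_front_view(density, color):
    """Composite along the z axis with standard alpha compositing."""
    alpha = 1.0 - torch.exp(-torch.nn.functional.softplus(density))  # (N,N,N)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha], dim=-1),
        dim=-1,
    )[..., :-1]                              # transmittance before each voxel
    weights = (alpha * trans).unsqueeze(-1)  # (N,N,N,1)
    return (weights * torch.sigmoid(color)).sum(dim=2)  # (N,N,3) image

target = torch.rand(N, N, 3)                 # stand-in for a captured image
opt = torch.optim.Adam([density, color], lr=1e-1)
for step in range(200):
    opt.zero_grad()
    loss = ((render_front_view(density, color) - target) ** 2).mean()
    loss.backward()
    opt.step()
```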

 

 

5、[CV] Playable Environments: Video Manipulation in Space and Time

W Menapace, A Siarohin, C Theobalt, V Golyanik, S Tulyakov, S Lathuilière, E Ricci

[University of Trento & MPI for Informatics & Snap Inc]

Playable Environments: video manipulation in space and time. The paper presents Playable Environments, a new representation for interactive video generation and manipulation in space and time. Given a single image at inference time, the framework lets the user move objects in 3D while generating a video by providing a sequence of desired actions, building on a NeRF-based encoder-decoder architecture and an action module; the actions are learned in an unsupervised manner, and the camera can be controlled to obtain a desired viewpoint. The method builds an environment state for each frame, which can be manipulated by the proposed action module and decoded back to image space via volumetric rendering. To support diverse object appearances, neural radiance fields are extended with style-based modulation. Training requires only collections of monocular videos with estimated camera parameters and 2D object locations. To establish a challenging benchmark, two large-scale video datasets with significant camera motion are introduced. As the experiments demonstrate, playable environments enable several creative applications not attainable by prior video synthesis work, including playable 3D video generation, stylization, and manipulation.

We present Playable Environments, a new representation for interactive video generation and manipulation in space and time. With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions. The actions are learnt in an unsupervised manner, and the camera can be controlled to obtain the desired viewpoint. Our method builds an environment state for each frame, which can be manipulated by our proposed action module and decoded back to the image space with volumetric rendering. To support diverse appearances of objects, we extend neural radiance fields with style-based modulation. Our method trains on collections of various monocular videos, requiring only the estimated camera parameters and 2D object locations. To set a challenging benchmark, we introduce two large-scale video datasets with significant camera movements. As evidenced by our experiments, playable environments enable several creative applications not attainable by prior video synthesis works, including playable 3D video generation, stylization, and manipulation.
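As a sketch of the style-based modulation the abstract mentions, the snippet below conditions a small radiance-field MLP on a style code via FiLM-style per-layer scales and shifts; this is an assumed, generic form of the technique, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class StyleModulatedField(nn.Module):
    def __init__(self, style_dim=64, hidden=128):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(3, hidden), nn.Linear(hidden, hidden)])
        # One (scale, shift) pair per hidden layer, predicted from the style code.
        self.film = nn.ModuleList([nn.Linear(style_dim, 2 * hidden) for _ in self.layers])
        self.head = nn.Linear(hidden, 4)  # RGB + density

    def forward(self, xyz, style):
        h = xyz
        for layer, film in zip(self.layers, self.film):
            h = layer(h)
            scale, shift = film(style).chunk(2, dim=-1)
            h = torch.relu(h * (1 + scale) + shift)  # modulate features by style
        return self.head(h)

field = StyleModulatedField()
points = torch.randn(1024, 3)                # sampled 3D points along rays
style = torch.randn(1, 64).expand(1024, 64)  # one style code for the object
rgb_sigma = field(points, style)             # (1024, 4)
```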

 

 

Several other papers worth noting:

 

[LG] Towards Practices for Human-Centered Machine Learning


S Chancellor

[University of Minnesota]

 

 

[LG] Variational Autoencoders Without the Variation


G A. Daly, J E. Fieldsend, G Tabor

[University of Exeter]

 

 

[LG] The Optimal Noise in Noise-Contrastive Learning Is Not What You Think


O Chehab, A Gramfort, A Hyvarinen

[Universite Paris-Saclay & University of Helsinki]

 

 

[LG] Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation


Y Zhang, D W. Zhang, S Lacoste-Julien, G J. Burghouts, C G. M. Snoek

[Samsung - SAIT AI Lab & University of Amsterdam & TNO]

 

 
