LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics

Reposted from 爱可可爱生活


1、[LG] FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

J Pathak, S Subramanian, P Harrington...

[NVIDIA & Lawrence Berkeley National Laboratory & University of Michigan & Rice University & California Institute of Technology & Purdue University]

FourCastNet, short for Fourier ForeCasting Neural Network, is a global data-driven weather forecasting model that provides accurate short to medium-range global predictions at 0.25° resolution. FourCastNet accurately forecasts high-resolution, fast-timescale variables such as surface wind speed, precipitation, and atmospheric water vapor. It has important implications for planning wind energy resources and for predicting extreme weather events such as tropical cyclones, extra-tropical cyclones, and atmospheric rivers. FourCastNet matches the forecasting accuracy of the ECMWF Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction (NWP) model, at short lead times for large-scale variables, while outperforming IFS for small-scale variables, including precipitation. FourCastNet generates a week-long forecast in less than 2 seconds, orders of magnitude faster than IFS. The speed of FourCastNet enables the creation of rapid and inexpensive large-ensemble forecasts with thousands of ensemble members for improving probabilistic forecasting. We discuss how data-driven deep learning models such as FourCastNet are a valuable addition to the meteorology toolkit, to aid and augment NWP models.
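To make the mechanism concrete, here is a minimal sketch of the Fourier token-mixing idea behind Adaptive Fourier Neural Operators: mix tokens globally in the frequency domain with a learned per-mode channel transform, then transform back. This is an illustrative simplification under stated assumptions, not the authors' implementation; the single complex-valued linear map, the soft-shrinkage sparsification, and all shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourierMixer(nn.Module):
    """Simplified AFNO-style mixing: FFT -> learned per-mode channel map -> inverse FFT."""
    def __init__(self, channels: int, sparsity_threshold: float = 0.01):
        super().__init__()
        # One complex-valued linear map shared across all frequency modes,
        # stored as separate real/imaginary weight matrices.
        self.w = nn.Parameter(0.02 * torch.randn(2, channels, channels))
        self.b = nn.Parameter(torch.zeros(2, channels))
        self.threshold = sparsity_threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, height, width, channels), e.g. fields on a 0.25-degree lat-lon grid.
        _, h, w, _ = x.shape
        xf = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")          # complex spectrum
        # Complex multiply (R + iI)(W_r + iW_i), with a ReLU on each part.
        real = F.relu(xf.real @ self.w[0] - xf.imag @ self.w[1] + self.b[0])
        imag = F.relu(xf.real @ self.w[1] + xf.imag @ self.w[0] + self.b[1])
        # Soft-shrinkage sparsifies the frequency modes (the "adaptive" part).
        xf = torch.view_as_complex(
            F.softshrink(torch.stack([real, imag], dim=-1), lambd=self.threshold))
        return torch.fft.irfft2(xf, s=(h, w), dim=(1, 2), norm="ortho")
```

Because the mixing happens per frequency mode, every output location depends on every input location in one layer, which is what lets such models capture global atmospheric structure cheaply.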


2、[LG] Flow-based sampling in the lattice Schwinger model at criticality

M S. Albergo, D Boyda, K Cranmer, D C. Hackett, G Kanwar, S Racanière, D J. Rezende, F Romero-López, P E. Shanahan, J M. Urban

[New York University & Argonne National Laboratory & MIT & DeepMind]

Recent results suggest that flow-based algorithms may provide efficient sampling of field distributions for lattice field theory applications, such as studies of quantum chromodynamics and the Schwinger model. In this work, we provide a numerical demonstration of robust flow-based sampling in the Schwinger model at the critical value of the fermion mass. In contrast, at the same parameters, conventional methods fail to sample all parts of configuration space, leading to severely underestimated uncertainties.
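The general mechanism behind such samplers can be illustrated with an independence Metropolis step driven by a flow proposal. The sketch below is a toy, not the paper's Schwinger-model setup: a fixed Gaussian stands in for a trained normalizing flow, and a 1D double-well "action" stands in for the lattice theory. Because the proposal is global, the chain can hop between modes that local updates rarely cross, which is exactly the failure mode of conventional sampling noted above.

```python
import numpy as np

rng = np.random.default_rng(0)

def action(phi):
    # Toy double-well action standing in for the lattice action S[phi];
    # the target density is p(phi) proportional to exp(-S[phi]).
    return (phi**2 - 1.0)**2 / 0.1

def flow_sample():
    # Stand-in for a trained normalizing flow: draw a proposal and return
    # it together with its log-density log q(phi) under the proposal.
    phi = rng.normal(0.0, 1.5)
    logq = -0.5 * (phi / 1.5)**2 - np.log(1.5 * np.sqrt(2.0 * np.pi))
    return phi, logq

def independence_metropolis(n_steps):
    phi, logq = flow_sample()
    samples = np.empty(n_steps)
    for i in range(n_steps):
        phi_new, logq_new = flow_sample()
        # Accept with min(1, p(x') q(x) / (p(x) q(x'))), computed in log space.
        log_acc = (logq - action(phi_new)) - (logq_new - action(phi))
        if np.log(rng.uniform()) < log_acc:
            phi, logq = phi_new, logq_new
        samples[i] = phi
    return samples

chain = independence_metropolis(50_000)
print("fraction in each well:", np.mean(chain < 0), np.mean(chain > 0))
```

A local-update chain started in one well of this toy target would almost never visit the other, reproducing in miniature the "underestimated uncertainties" problem.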


3、[LG] Foundations of Structural Causal Models with Cycles and Latent Variables

S Bongers, P Forré, J Peters, J M. Mooij

[University of Amsterdam & University of Copenhagen]

Structural causal models (SCMs), also known as (nonparametric) structural equation models (SEMs), are widely used for causal modeling purposes. In particular, acyclic SCMs, also known as recursive SEMs, form a well-studied subclass of SCMs that generalize causal Bayesian networks to allow for latent confounders. In this paper, we investigate SCMs in a more general setting, allowing for the presence of both latent confounders and cycles. We show that in the presence of cycles, many of the convenient properties of acyclic SCMs do not hold in general: they do not always have a solution; they do not always induce unique observational, interventional, and counterfactual distributions; a marginalization does not always exist, and if it exists the marginal model does not always respect the latent projection; they do not always satisfy a Markov property; and their graphs are not always consistent with their causal semantics. We prove that, for SCMs in general, each of these properties does hold under certain solvability conditions. Our work generalizes results for SCMs with cycles that were so far only known for certain special cases. We introduce the class of simple SCMs, which extends the class of acyclic SCMs to the cyclic setting while preserving many of the convenient properties of acyclic SCMs. With this paper we aim to provide the foundations for a general theory of statistical causal modeling with SCMs.
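As a concrete toy example of the solvability issue, consider a two-variable cyclic SCM with linear mechanisms, solved by fixed-point iteration. The linear form and the coefficients below are illustrative assumptions, not from the paper; the point is that a unique solution exists only under a solvability condition (here |a*b| < 1), and that an intervention is modeled by replacing one structural equation.

```python
import numpy as np

rng = np.random.default_rng(0)

def solve_scm(a, b, n=100_000, do_x=None):
    """Cyclic SCM  X = a*Y + U1,  Y = b*X + U2,  solved by fixed-point iteration.

    The iteration converges to the unique solution only when |a*b| < 1
    (a linear-case solvability condition); otherwise no solution is
    reached this way, illustrating "they do not always have a solution".
    """
    u1 = rng.normal(size=n)
    u2 = rng.normal(size=n)
    x = np.zeros(n)
    y = np.zeros(n)
    for _ in range(200):
        # do(X = x0) replaces X's structural equation with a constant.
        x = np.full(n, do_x) if do_x is not None else a * y + u1
        y = b * x + u2
    return x, y

x_obs, y_obs = solve_scm(a=0.5, b=0.5)            # observational distribution
x_int, y_int = solve_scm(a=0.5, b=0.5, do_x=1.0)  # interventional: do(X = 1)
print("E[Y] observational:", y_obs.mean(), " E[Y | do(X=1)]:", y_int.mean())
```

Running the same iteration with a = b = 1.5 diverges, which is the cyclic analogue of an acyclic SCM's guaranteed unique solution failing.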


4、[LG] Transformer Quality in Linear Time

W Hua, Z Dai, H Liu, Q V. Le

[Cornell University & Google Research]

We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences. First, we propose a simple layer named the gated attention unit, which allows the use of a weaker single-head attention with minimal quality loss. We then propose a linear approximation method complementary to this new layer, which is accelerator-friendly and highly competitive in quality. The resulting model, named FLASH, matches the perplexity of improved Transformers over both short (512) and long (8K) context lengths, achieving training speedups of up to 4.9× on Wiki-40B and 12.1× on PG-19 for auto-regressive language modeling, and 4.8× on C4 for masked language modeling.
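Below is a minimal sketch of a gated attention unit in the quadratic-attention regime; the linear-time chunked approximation that gives FLASH its speedups is omitted. The SiLU activations, the shared low-dimensional query/key base, and the relu² attention kernel follow the paper's description, but the exact dimensions, normalization, and omission of relative position bias are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionUnit(nn.Module):
    def __init__(self, dim: int, expansion: int = 2, s: int = 128):
        super().__init__()
        e = dim * expansion
        self.to_uv = nn.Linear(dim, 2 * e)   # gate branch U and value branch V
        self.to_z = nn.Linear(dim, s)        # shared low-dim base for Q and K
        self.gamma = nn.Parameter(torch.ones(2, s))   # cheap per-dim scale...
        self.beta = nn.Parameter(torch.zeros(2, s))   # ...and offset for Q, K
        self.out = nn.Linear(e, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        n = x.shape[1]
        u, v = F.silu(self.to_uv(x)).chunk(2, dim=-1)
        z = F.silu(self.to_z(x))
        q = z * self.gamma[0] + self.beta[0]
        k = z * self.gamma[1] + self.beta[1]
        # Single-head attention with a relu^2 kernel instead of softmax.
        attn = F.relu(q @ k.transpose(-2, -1) / n) ** 2
        # Elementwise gating is what lets this weak attention suffice.
        return self.out(u * (attn @ v))
```

The design intuition: because the output is gated by U, the attention matrix only needs to carry coarse token-mixing information, so one head, a small query/key dimension, and a softmax-free kernel are enough, and the kernel then factorizes for the linear-time variant.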


5、[CV] GroupViT: Semantic Segmentation Emerges from Text Supervision

J Xu, S D Mello, S Liu, W Byeon, T Breuel, J Kautz, X Wang

[UC San Diego & NVIDIA]

Grouping and recognition are important components of visual scene understanding, e.g., for object detection and semantic segmentation. With end-to-end deep learning systems, grouping of image regions usually happens implicitly via top-down supervision from pixel-level recognition labels. Instead, in this paper, we propose to bring the grouping mechanism back into deep networks, which allows semantic segments to emerge automatically with only text supervision. We propose a hierarchical Grouping Vision Transformer (GroupViT), which goes beyond the regular grid-structure representation and learns to group image regions into progressively larger arbitrary-shaped segments. We train GroupViT jointly with a text encoder on a large-scale image-text dataset via contrastive losses. With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning. It achieves zero-shot accuracies of 51.2% mIoU on the PASCAL VOC 2012 dataset and 22.3% mIoU on the PASCAL Context dataset, and performs competitively with state-of-the-art transfer-learning methods that require greater levels of supervision. Project page: https://jerryxu.net/GroupViT
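The core grouping step can be sketched as follows: learned group tokens compete for image tokens, each image token is hard-assigned to one group via a straight-through Gumbel-softmax, and group features are recomputed as assignment-weighted averages. This is a simplified illustration under stated assumptions; the paper's projection MLPs, normalization layers, and two-stage grouping hierarchy are omitted, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupingBlock(nn.Module):
    def __init__(self, dim: int, num_groups: int):
        super().__init__()
        self.group_tokens = nn.Parameter(0.02 * torch.randn(num_groups, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim) image-segment features
        q = self.to_q(self.group_tokens)      # (num_groups, dim)
        k = self.to_k(tokens)                 # (batch, num_tokens, dim)
        logits = k @ q.t()                    # (batch, num_tokens, num_groups)
        # Straight-through Gumbel-softmax: hard one-hot assignment in the
        # forward pass, differentiable soft assignment in the backward pass.
        assign = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=-1)
        counts = assign.sum(dim=1).clamp(min=1.0)          # tokens per group
        groups = (assign.transpose(1, 2) @ tokens) / counts.unsqueeze(-1)
        return groups                         # (batch, num_groups, dim)
```

The hard assignment is what turns grouping into segmentation at inference time: each pixel inherits the label of the group its tokens were assigned to, with group labels obtained by matching group embeddings against text embeddings.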


Several other papers worth noting:


[CL] Interpreting Language Models with Contrastive Explanations

K Yin, G Neubig

[CMU]


[LG] Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection

K Wood, S Roberts, S Zohren

[University of Oxford]


[CV] Interpretable Unsupervised Diversity Denoising and Artefact Removal

M Prakash, M Delbracio, P Milanfar, F Jug

[Center for Systems Biology Dresden & Google Research]


[CV] Single Image Super-Resolution Methods: A Survey

B C Maral

[TOBB University of Economics and Technology]


If any images included in this content involve copyright issues, please contact us promptly so they can be removed.