LG - Machine Learning | CV - Computer Vision | CL - Computation and Language | AS - Audio and Speech | RO - Robotics
Reposted from 爱可可爱生活
1. [LG] Stable-Baselines3: Reliable Reinforcement Learning Implementations
A Raffin, A Hill, M Ernestus, A Gleave, A Kanervisto…
[German Aerospace Center (DLR) & University Paris-Saclay & UC Berkeley & University of Eastern Finland & Kiteswarms GmbH]
Stable-Baselines3 provides open-source implementations of deep reinforcement learning (RL) algorithms in Python. The implementations have been benchmarked against reference codebases, and automated unit tests cover 95% of the code. The algorithms follow a consistent interface and are accompanied by extensive documentation, making it simple to train and compare different RL algorithms. Our documentation, examples, and source-code are available at https://github.com/DLR-RM/stable-baselines3.
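To illustrate the consistent interface, here is a minimal sketch using Stable-Baselines3's documented API; PPO can be swapped for any other included algorithm (e.g. A2C, DQN) with a one-line change:

```python
# Minimal Stable-Baselines3 usage sketch (assumes `pip install stable-baselines3`
# and the classic-control CartPole-v1 environment from Gym).
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Every algorithm shares the same constructor / learn / predict interface.
model = PPO("MlpPolicy", "CartPole-v1", verbose=0)
model.learn(total_timesteps=10_000)

# Evaluate the trained policy over 10 episodes.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```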
https://weibo.com/1402400261/L4qlWo1Vl
2. [LG] Prior knowledge elicitation: The past, present, and future
P Mikkola, O A. Martin, S Chandramouli, M Hartmann, O A Pla, O Thomas, H Pesonen, J Corander, A Vehtari, S Kaski, P Bürkner, A Klami
[Aalto University & University of Helsinki & University of Oslo & University of Stuttgart & University of Manchester...]
Specification of the prior distribution for a Bayesian model is a central part of the Bayesian workflow for data analysis, but it is often difficult even for statistical experts. Prior elicitation transforms domain knowledge of various kinds into well-defined prior distributions, and offers a solution to the prior specification problem, in principle. In practice, however, we are still fairly far from having usable prior elicitation tools that could significantly influence the way we build probabilistic models in academia and industry. We lack elicitation methods that integrate well into the Bayesian workflow and perform elicitation efficiently in terms of costs of time and effort. We even lack a comprehensive theoretical framework for understanding different facets of the prior elicitation problem. Why are we not widely using prior elicitation? We analyze the state of the art by identifying a range of key aspects of prior knowledge elicitation, from properties of the modelling task and the nature of the priors to the form of interaction with the expert. The existing prior elicitation literature is reviewed and categorized in these terms. This allows recognizing under-studied directions in prior elicitation research, finally leading to a proposal of several new avenues to improve prior elicitation methodology.
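As a concrete, generic illustration of the elicitation task (this example is not from the paper), one common approach fits a parametric prior to an expert's stated quantiles:

```python
# Hedged sketch: fit a Beta(a, b) prior to hypothetical expert quantile
# judgments about an unknown proportion theta.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

# Hypothetical expert statements: "the median is about 0.3, and I am
# 90% sure theta is below 0.5."
targets = {0.5: 0.3, 0.9: 0.5}  # cumulative probability -> stated quantile

def quantile_mismatch(log_params):
    a, b = np.exp(log_params)  # log-parameterization keeps a, b > 0
    return sum((beta.ppf(p, a, b) - q) ** 2 for p, q in targets.items())

res = minimize(quantile_mismatch, x0=np.log([2.0, 2.0]), method="Nelder-Mead")
a, b = np.exp(res.x)
print(f"elicited prior: Beta({a:.2f}, {b:.2f})")
```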
https://weibo.com/1402400261/L4qpd3jxm
3. [LG] Deep Reinforcement Learning at the Edge of the Statistical Precipice
R Agarwal, M Schwarzer, P S Castro, A Courville, M G Bellemare
[Université de Montréal & Google Research]
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field’s confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library, rliable, to prevent unreliable results from stagnating the field.
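The interquartile mean and stratified bootstrap intervals advocated here are straightforward to compute; the sketch below is a plain-NumPy approximation (the authors' open-source rliable library provides the polished implementations):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical normalized scores of shape (num_runs, num_tasks),
# e.g. 5 runs on each of 26 Atari 100k games.
scores = rng.normal(0.4, 0.2, size=(5, 26))

def iqm(x):
    # Mean of the middle 50% of all run-task scores (a 25% trimmed mean).
    flat = np.sort(x.ravel())
    k = flat.size // 4
    return flat[k : flat.size - k].mean()

def stratified_bootstrap_ci(x, stat, reps=2000, alpha=0.05):
    # Resample runs with replacement independently within each task,
    # then recompute the aggregate statistic on each resample.
    n_runs, n_tasks = x.shape
    boots = np.empty(reps)
    for r in range(reps):
        idx = rng.integers(n_runs, size=(n_runs, n_tasks))
        boots[r] = stat(x[idx, np.arange(n_tasks)])
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

lo, hi = stratified_bootstrap_ci(scores, iqm)
print(f"IQM = {iqm(scores):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```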
https://weibo.com/1402400261/L4qsS0Ikd
4. [CV] Project Starline: A high-fidelity telepresence system
J Lawrence, D B Goldman, S Achar, G M Blascovich…
[Google Research]
We present a real-time bidirectional communication system that lets two people, separated by distance, experience a face-to-face conversation as if they were copresent. It is the first telepresence system that is demonstrably better than 2D videoconferencing, as measured using participant ratings (e.g., presence, attentiveness, reaction-gauging, engagement), meeting recall, and observed nonverbal behaviors (e.g., head nods, eyebrow movements). This milestone is reached by maximizing audiovisual fidelity and the sense of copresence in all design elements, including physical layout, lighting, face tracking, multi-view capture, microphone array, multi-stream compression, loudspeaker output, and lenticular display. Our system achieves key 3D audiovisual cues (stereopsis, motion parallax, and spatialized audio) and enables the full range of communication cues (eye contact, hand gestures, and body language), yet does not require special glasses or body-worn microphones/headphones. The system consists of a head-tracked autostereoscopic display, high-resolution 3D capture and rendering subsystems, and network transmission using compressed color and depth video streams. Other contributions include a novel image-based geometry fusion algorithm, free-space dereverberation, and talker localization.
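Starline's renderer is not public, but the standard building block behind head-tracked stereopsis and motion parallax is an off-axis (asymmetric-frustum) projection recomputed every frame from the tracked eye position, so the display behaves like a window. A generic sketch, not Starline's actual code:

```python
import numpy as np

def off_axis_projection(eye, screen_w, screen_h, near=0.1, far=10.0):
    # Screen centered at the origin in the z = 0 plane; `eye` is the
    # tracked eye position in meters, +z pointing toward the viewer.
    # Pair with a view matrix that translates the world by -eye.
    ex, ey, ez = eye
    l = (-screen_w / 2 - ex) * near / ez
    r = ( screen_w / 2 - ex) * near / ez
    b = (-screen_h / 2 - ey) * near / ez
    t = ( screen_h / 2 - ey) * near / ez
    # Standard glFrustum-style asymmetric perspective matrix.
    return np.array([
        [2 * near / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * near / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ])

# One frustum per eye gives stereopsis; re-running this with the tracked
# head position every frame gives motion parallax.
left = off_axis_projection(eye=(-0.032, 0.0, 0.6), screen_w=1.2, screen_h=0.8)
```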
https://weibo.com/1402400261/L4qBzi4xG
5. [CV] 3D Photo Stylization: Learning to Generate Stylized Novel Views from a Single Image
F Mu, J Wang, Y Wu, Y Li
[University of Wisconsin-Madison & Snap Research]
Visual content creation has spurred a soaring interest given its applications in mobile photography and AR / VR. Style transfer and single-image 3D photography as two representative tasks have so far evolved independently. In this paper, we make a connection between the two, and address the challenging task of 3D photo stylization — generating stylized novel views from a single image given an arbitrary style. Our key intuition is that style transfer and view synthesis have to be jointly modeled for this task. To this end, we propose a deep model that learns geometry-aware content features for stylization from a point cloud representation of the scene, resulting in high-quality stylized images that are consistent across views. Further, we introduce a novel training protocol to enable the learning using only 2D images. We demonstrate the superiority of our method via extensive qualitative and quantitative studies, and showcase key applications of our method in light of the growing demand for 3D content creation from 2D image assets.
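The paper's model stylizes geometry-aware features of a point cloud; as a generic illustration of feature-level stylization (AdaIN-style statistic matching, a common building block, not necessarily the paper's exact operator):

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    # Shift and scale the per-channel statistics of the content features
    # to match those of the style features.
    # content_feat: (B, C, N), e.g. one feature vector per 3D point.
    # style_feat:   (B, C, M), features extracted from the style image.
    c_mean = content_feat.mean(dim=-1, keepdim=True)
    c_std = content_feat.std(dim=-1, keepdim=True)
    s_mean = style_feat.mean(dim=-1, keepdim=True)
    s_std = style_feat.std(dim=-1, keepdim=True)
    return s_std * (content_feat - c_mean) / (c_std + eps) + s_mean

stylized = adain(torch.randn(1, 256, 4096), torch.randn(1, 256, 1024))
```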
https://weibo.com/1402400261/L4qKrtfV3
A few more papers worth noting:
[CV] PartImageNet: A Large, High-Quality Dataset of Parts
J He, S Yang, S Yang, A Kortylewski, X Yuan, J Chen, S Liu, C Yang, A Yuille
[Johns Hopkins University & University of Technology Sydney & ByteDance Inc]
https://weibo.com/1402400261/L4qwGFL3a
[CV] StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
L Höllein, J Johnson, M Nießner
[Technical University of Munich & University of Michigan]
https://weibo.com/1402400261/L4qNYqQEV
[CV] Efficient Neural Radiance Fields with Learned Depth-Guided Sampling
H Lin, S Peng, Z Xu, H Bao, X Zhou
[Zhejiang University]
https://weibo.com/1402400261/L4qQV3RL5
[CV] Routing with Self-Attention for Multimodal Capsule Networks
K Duarte, B Chen, N Shvetsova, A Rouditchenko, S Thomas, A Liu, D Harwath, J Glass, H Kuehne, M Shah
[University of Central Florida & Columbia University & Goethe University Frankfurt & MIT CSAIL]
https://weibo.com/1402400261/L4qUK6fq7