LG - Machine Learning  CV - Computer Vision  CL - Computation and Language  AS - Audio and Speech  RO - Robotics
Reposted from 爱可可爱生活
1、[AI] Axiomatizing consciousness, with applications
H Barendregt, A Raffone
[Radboud University & Sapienza University]
Axiomatizing consciousness, with applications. Consciousness is introduced axiomatically, drawing on Buddhist insight meditation and psychology, logic in computer science, and cognitive neuroscience, as consisting of a stream of configurations that is compound, discrete, and (non-deterministically) computable. Within this framework, notions such as self, concentration, mindfulness, and various forms of suffering can be defined. As an application of this setup, the paper shows how a combined development of concentration and mindfulness can attenuate and eventually eradicate some forms of suffering.
Consciousness will be introduced axiomatically, inspired by Buddhist insight meditation and psychology, logic in computer science, and cognitive neuroscience, as consisting of a stream of configurations that is compound, discrete, and (non-deterministically) computable. Within this context the notions of self, concentration, mindfulness, and various forms of suffering can be defined. As an application of this set up, it will be shown how a combined development of concentration and mindfulness can attenuate and eventually eradicate some of the forms of suffering.
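One possible formal reading of the abstract's central claim, in illustrative notation of my own (the paper's actual axioms may differ): consciousness as a discrete stream of compound configurations generated by a computable but non-deterministic step relation.

```latex
% Illustrative notation, not the authors' axioms: a discrete stream of
% compound configurations with a computably enumerable, non-functional
% (hence non-deterministic) step relation.
\[
  c_0 \rightarrow c_1 \rightarrow c_2 \rightarrow \cdots, \qquad
  c_t = \langle x_t^{1}, \ldots, x_t^{k_t} \rangle \in \mathcal{C},
\]
\[
  {\rightarrow}\ \subseteq\ \mathcal{C} \times \mathcal{C} \text{ computably enumerable}, \qquad
  c \rightarrow c' \text{ and } c \rightarrow c'' \text{ with } c' \neq c'' \text{ allowed (non-determinism)}.
\]
```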
2、[CL] SGPT: GPT Sentence Embeddings for Semantic Search
N Muennighoff
[Peking University]
SGPT: GPT Sentence Embeddings for Semantic Search. GPT transformers are the largest available language models, yet semantic search is dominated by BERT transformers. This paper presents SGPT-BE and SGPT-CE for applying GPT models as bi-encoders or cross-encoders to symmetric or asymmetric search. SGPT-BE produces semantically meaningful sentence embeddings via contrastive fine-tuning of only the bias tensors and a novel pooling method. A 5.8-billion-parameter SGPT-BE outperforms the best available sentence embeddings by 6%, setting a new state of the art on BEIR, and outperforms the concurrently proposed OpenAI Embeddings of the 175B Davinci endpoint, which fine-tune 250,000 times more parameters. SGPT-CE uses log probabilities from GPT models without any fine-tuning. A 6.1-billion-parameter SGPT-CE sets an unsupervised state of the art on BEIR, beating the supervised state of the art on 7 datasets but losing significantly on others; the paper shows how this can be alleviated by adapting the prompt. SGPT-BE and SGPT-CE performance scales with model size, though the added latency, storage, and compute costs should be considered.
GPT transformers are the largest language models available, yet semantic search is dominated by BERT transformers. We present SGPT-BE and SGPT-CE for applying GPT models as Bi-Encoders or Cross-Encoders to symmetric or asymmetric search. SGPT-BE produces semantically meaningful sentence embeddings by contrastive fine-tuning of only bias tensors and a novel pooling method. A 5.8 billion parameter SGPT-BE outperforms the best available sentence embeddings by 6% setting a new state-of-the-art on BEIR. It outperforms the concurrently proposed OpenAI Embeddings of the 175B Davinci endpoint, which fine-tunes 250,000 times more parameters. SGPT-CE uses log probabilities from GPT models without any fine-tuning. A 6.1 billion parameter SGPT-CE sets an unsupervised state-of-the-art on BEIR. It beats the supervised state-of-the-art on 7 datasets, but significantly loses on other datasets. We show how this can be alleviated by adapting the prompt. SGPT-BE and SGPT-CE performance scales with model size. Yet, increased latency, storage and compute costs should be considered. Code, models and result files are freely available at https://github.com/Muennighoff/sgpt.
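To make the bi-encoder idea concrete, here is a minimal, hypothetical sketch (not the released SGPT code; see the linked repository for the real implementation) of embedding sentences with a decoder-only model plus a position-weighted mean pooling, as the abstract describes. The "gpt2" checkpoint is only a stand-in for an actual SGPT model.

```python
# Sketch of a GPT-style bi-encoder with position-weighted mean pooling.
# Assumptions: "gpt2" stands in for an SGPT checkpoint; weighting and pooling
# details follow the abstract's description, not the official code.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state         # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    # Position-weighted mean: later tokens get larger weights, reflecting the
    # causal attention of a decoder-only model (weights 1, 2, 3, ... over real tokens).
    weights = torch.cumsum(mask, dim=1) * mask
    return (hidden * weights).sum(1) / weights.sum(1)

queries = embed(["How do GPT sentence embeddings work?"])
docs = embed(["SGPT applies GPT models as bi-encoders for semantic search."])
print(torch.nn.functional.cosine_similarity(queries, docs))
```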
3、[LG] Jury Learning: Integrating Dissenting Voices into Machine Learning Models
M L. Gordon, M S. Lam, J S Park, K Patel, J T. Hancock, T Hashimoto, M S. Bernstein
[Stanford University & Apple Inc]
Jury Learning: Integrating Dissenting Voices into Machine Learning Models. Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about the ground-truth labels, so a choice must be made in the face of the disagreement pervasive in user-facing tasks. Supervised ML today resolves these label disagreements implicitly via majority vote, overriding the labels of minority groups. This paper introduces jury learning, a supervised ML approach that resolves these disagreements explicitly through the metaphor of a jury: defining which people or groups, and in what proportion, determine the classifier's prediction. For example, a jury learning model for online toxicity might centrally feature women and Black jurors, who are common targets of online harassment. To enable jury learning, the authors design a deep learning architecture that models every annotator in a dataset, samples from the annotator models to populate the jury, and then runs inference to classify. The architecture lets juries dynamically adapt their composition, explore counterfactuals, and visualize dissent.
Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups' labels. We introduce jury learning, a supervised ML approach that resolves these disagreements explicitly through the metaphor of a jury: defining which people or groups, in what proportion, determine the classifier's prediction. For example, a jury learning model for online toxicity might centrally feature women and Black jurors, who are commonly targets of online harassment. To enable jury learning, we contribute a deep learning architecture that models every annotator in a dataset, samples from annotators' models to populate the jury, then runs inference to classify. Our architecture enables juries that dynamically adapt their composition, explore counterfactuals, and visualize dissent.
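A minimal sketch of the jury-learning mechanism under stated assumptions (this is not the authors' architecture): a classifier conditioned on an annotator embedding, and a jury step that samples annotators from chosen groups in chosen proportions and averages their predicted labels. The group names, pool sizes, and 768-dimensional text features below are made up for illustration.

```python
# Hypothetical jury-learning sketch: per-annotator conditioning + jury sampling.
import torch
import torch.nn as nn

class AnnotatorConditionedClassifier(nn.Module):
    def __init__(self, text_dim, n_annotators, emb_dim=32, n_classes=2):
        super().__init__()
        self.annotator_emb = nn.Embedding(n_annotators, emb_dim)
        self.head = nn.Sequential(
            nn.Linear(text_dim + emb_dim, 128), nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, text_feats, annotator_ids):
        a = self.annotator_emb(annotator_ids)
        return self.head(torch.cat([text_feats, a], dim=-1))

def jury_predict(model, text_feats, group_to_annotators, composition, jury_size=12):
    """Sample a jury according to `composition` (group -> fraction) and average its votes."""
    jurors = []
    for group, frac in composition.items():
        pool = group_to_annotators[group]
        k = max(1, round(frac * jury_size))
        jurors.append(pool[torch.randint(len(pool), (k,))])
    jurors = torch.cat(jurors)                        # sampled juror ids, shape (J,)
    feats = text_feats.expand(len(jurors), -1)        # same comment shown to every juror
    probs = model(feats, jurors).softmax(-1)          # per-juror predicted label distribution
    return probs.mean(0)                              # jury-averaged prediction

# Hypothetical usage: two groups with different annotator pools, equal representation.
model = AnnotatorConditionedClassifier(text_dim=768, n_annotators=100)
groups = {"group_a": torch.arange(0, 60), "group_b": torch.arange(60, 100)}
comment = torch.randn(1, 768)                         # e.g. a sentence embedding
print(jury_predict(model, comment, groups, {"group_a": 0.5, "group_b": 0.5}))
```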
4、[LG] Learned Turbulence Modelling with Differentiable Fluid Solvers
B List, L Chen, N Thuerey
[Technical University of Munich]
Learned Turbulence Modelling with Differentiable Fluid Solvers. This paper trains turbulence models based on convolutional neural networks; the learned models improve under-resolved, low-resolution solutions of the incompressible Navier-Stokes equations at simulation time. The method involves developing a differentiable numerical solver that supports propagating optimization gradients through multiple solver steps, and the importance of this property is shown by the superior stability and accuracy of models trained with a larger number of unrolled steps. The approach is applied to three two-dimensional turbulence scenarios: homogeneous decaying turbulence, a temporally evolving mixing layer, and a spatially evolving mixing layer. Compared with no-model simulations, the method achieves significant improvements in long-term a-posteriori statistics without requiring those statistics to be included directly in the learning targets. At inference time, it also gains substantial performance improvements over similarly accurate, purely numerical methods.
In this paper, we train turbulence models based on convolutional neural networks. These learned turbulence models improve under-resolved low resolution solutions to the incompressible Navier-Stokes equations at simulation time. Our method involves the development of a differentiable numerical solver that supports the propagation of optimisation gradients through multiple solver steps. We showcase the significance of this property by demonstrating the superior stability and accuracy of those models that featured a higher number of unrolled steps during training. This approach is applied to three two-dimensional turbulence flow scenarios, a homogeneous decaying turbulence case, a temporally evolving mixing layer and a spatially evolving mixing layer. Our method achieves significant improvements of long-term a-posteriori statistics when compared to no-model simulations, without requiring these statistics to be directly included in the learning targets. At inference time, our proposed method also gains substantial performance improvements over similarly accurate, purely numerical methods.
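A toy sketch of the central training idea, i.e. backpropagating through several unrolled solver steps. The diffusion update below is only a stand-in for the paper's differentiable Navier-Stokes solver, and the "reference" trajectory is fabricated so the script runs; both are assumptions, not the authors' setup.

```python
# Training a CNN correction through an unrolled differentiable solver (toy version).
import torch
import torch.nn as nn

def coarse_solver_step(u):
    # Placeholder differentiable "solver": explicit diffusion on a periodic grid.
    lap = (torch.roll(u, 1, -1) + torch.roll(u, -1, -1)
           + torch.roll(u, 1, -2) + torch.roll(u, -1, -2) - 4 * u)
    return u + 0.1 * lap

correction = nn.Sequential(                               # learned turbulence correction
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(correction.parameters(), lr=1e-3)

unroll_steps = 4      # the paper reports better stability/accuracy with longer unrolls
u0 = torch.randn(8, 1, 32, 32)                            # batch of coarse initial fields

# Stand-in "reference" trajectory; in the paper this would come from resolved simulations.
reference, u = [], u0.clone()
for _ in range(unroll_steps):
    u = coarse_solver_step(u)
    reference.append(u)

for epoch in range(10):
    u, loss = u0.clone(), 0.0
    for t in range(unroll_steps):
        u = coarse_solver_step(u) + correction(u)         # gradients flow through every step
        loss = loss + ((u - reference[t]) ** 2).mean()
    opt.zero_grad()
    loss.backward()                                       # backprop through the whole rollout
    opt.step()
```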
5、[SD] Deep Performer: Score-to-Audio Music Performance Synthesis
H Dong, C Zhou, T Berg-Kirkpatrick, J McAuley
[Dolby Laboratories & University of California San Diego]
Deep Performer: Score-to-Audio Music Performance Synthesis. Music performance synthesis aims to synthesize a musical score into a natural-sounding performance. Borrowing recent advances in text-to-speech synthesis, this paper presents the Deep Performer, a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes, so the paper proposes two new techniques for handling polyphonic inputs and providing fine-grained conditioning in a transformer encoder-decoder model. To train the system, the authors present a new violin dataset of paired recordings and scores together with estimated alignments between them. The proposed model can synthesize music with clear polyphony and harmonic structure. In a listening test, it achieves quality competitive with the baseline model, a conditional generative audio model, in terms of pitch accuracy, timbre, and noise level, and it significantly outperforms the baseline on an existing piano dataset in overall quality.
Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer—a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes. Hence, we propose two new techniques for handling polyphonic inputs and providing a finegrained conditioning in a transformer encoder-decoder model. To train our proposed system, we present a new violin dataset consisting of paired recordings and scores along with estimated alignments between them. We show that our proposed model can synthesize music with clear polyphony and harmonic structures. In a listening test, we achieve competitive quality against the baseline model, a conditional generative audio model, in terms of pitch accuracy, timbre and noise level. Moreover, our proposed model significantly outperforms the baseline on an existing piano dataset in overall quality.
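One way to picture the fine-grained conditioning borrowed from TTS is a FastSpeech-style length regulator that repeats each encoded note over its aligned number of audio frames. The sketch below is a hypothetical illustration of that idea, not the authors' implementation; names and shapes are assumptions.

```python
# Hypothetical frame-level conditioning via a length regulator.
import torch

def length_regulate(note_encodings, frame_counts):
    """Expand (N, H) note encodings to (sum(frame_counts), H) frame-level conditioning."""
    return torch.repeat_interleave(note_encodings, frame_counts, dim=0)

notes = torch.randn(3, 16)                      # encoder outputs for 3 notes
frames = torch.tensor([20, 5, 12])              # frames per note from the estimated alignment
conditioning = length_regulate(notes, frames)   # (37, 16) decoder conditioning
print(conditioning.shape)
```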
Several other papers worth noting:
[LG] Synthetic Control As Online Linear Regression
J Chen
[Harvard University]
[LG] Provable Regret Bounds for Deep Online Learning and Control
X Chen, E Minasyan, J D. Lee, E Hazan
[Princeton University]
[LG] What Does it Mean for a Language Model to Preserve Privacy?
H Brown, K Lee, F Mireshghallah, R Shokri, F Tramèr
[National University of Singapore & Cornell University & University of California San Diego & Google]
[LG] Towards Battery-Free Machine Learning and Inference in Underwater Environments
Y Zhao, S S Afzal, W Akbar, O Rodriguez, F Mo, D Boyle, F Adib, H Haddadi
[Imperial College London & MIT]