LG - Machine Learning | CV - Computer Vision | CL - Computation and Language | AS - Audio and Speech | RO - Robotics

Reposted from 爱可可爱生活

 

1、[LG] Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question

Y Song, R C Wong, X Zhao, D Jiang

[The Hong Kong University of Science and Technology & WeBank Co.]

Speech-to-SQL: speech-driven SQL query generation from natural language questions. With the popularity of smartphones and tablets in daily life, speech-based input has gained significant momentum, since voice is one of the simplest and most effective modes of human-computer interaction. This paper works toward more effective speech-based interfaces for querying structured data in relational databases. It first identifies a new task, Speech-to-SQL, which aims to understand the information conveyed by human speech and translate it directly into structured query language (SQL) statements. A naive solution works in a cascaded manner: an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, this requires a high-quality ASR system and suffers from error compounding between the two components, limiting performance. To address these challenges, the paper proposes SpeechSQLNet, a novel end-to-end neural architecture that translates human speech directly into SQL queries without an external ASR step, taking full advantage of the rich linguistic information present in speech. This is the first attempt to synthesize SQL directly from arbitrary natural language questions, rather than from a natural-language version of SQL or a variant with a restricted SQL grammar. To validate the proposed problem and model, the authors construct a dataset named SpeechQL by extending widely used text-to-SQL datasets. Extensive experiments on this dataset show that SpeechSQLNet can synthesize high-quality SQL queries directly from human speech, outperforming existing methods in exact-match accuracy.

Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the easiest and most efficient way for human-computer interaction. This paper works towards designing more effective speech-based interfaces to query the structured data in relational databases. We first identify a new task named Speech-to-SQL, which aims to understand the information conveyed by human speech and directly translate it into structured query language (SQL) statements. A naive solution to this problem can work in a cascaded manner, that is, an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, it requires a high-quality ASR system and also suffers from the error compounding problem between the two components, resulting in limited performance. To handle these challenges, we further propose a novel end-to-end neural architecture named SpeechSQLNet to directly translate human speech into SQL queries without an external ASR step. SpeechSQLNet has the advantage of making full use of the rich linguistic information presented in speech. To the best of our knowledge, this is the first attempt to directly synthesize SQL based on arbitrary natural language questions, rather than a natural language-based version of SQL or its variants with a limited SQL grammar. To validate the effectiveness of the proposed problem and model, we further construct a dataset named SpeechQL, by piggybacking the widely-used text-to-SQL datasets. Extensive experimental evaluations on this dataset show that SpeechSQLNet can directly synthesize high-quality SQL queries from human speech, outperforming various competitive counterparts as well as the cascaded methods in terms of exact match accuracies. We expect Speech-to-SQL would inspire more research on more effective and efficient human-machine interfaces to lower the barrier of using relational databases.
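A minimal sketch (my own illustration, not from the paper) of why the cascaded ASR → text-to-SQL baseline suffers from error compounding: if the two stages fail independently, the end-to-end accuracy is at best roughly the product of the stage accuracies, which is what motivates the end-to-end SpeechSQLNet design. The numbers below are hypothetical.

```python
# Illustrative sketch: error compounding in a cascaded pipeline.
# Assuming stage errors are independent and never cancel out, the
# cascaded accuracy is (roughly) the product of the stage accuracies.

def cascaded_accuracy(asr_acc: float, text2sql_acc: float) -> float:
    """Rough upper bound on cascaded pipeline accuracy under
    independent stage errors (hypothetical model, for intuition)."""
    return asr_acc * text2sql_acc

# Hypothetical numbers for illustration only:
print(cascaded_accuracy(0.90, 0.70))  # ≈ 0.63, worse than either stage
```

Even a strong ASR front end (90% here) drags the pipeline below the text-to-SQL stage's standalone accuracy, which is the compounding effect the abstract describes.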

 

 

2、[LG] Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies

V Lai, C Chen, Q. V Liao, A Smith-Renner, C Tan

[University of Colorado Boulder & University of Chicago & Microsoft Research & Dataminr]

Toward a science of human-AI decision making: a survey of empirical studies. As AI systems demonstrate increasingly strong predictive performance, their adoption has grown across many domains. In high-stakes domains such as criminal justice and healthcare, however, full automation is often infeasible due to safety, ethical, and legal concerns, while fully manual approaches can be inaccurate and time-consuming. The research community is therefore increasingly interested in augmenting human decision making with AI assistance. Beyond developing AI technologies for this purpose, the emerging field of human-AI decision making must embrace empirical methods to build a foundational understanding of how humans interact and collaborate with AI to make decisions. This paper surveys recent empirical human-subject studies on the topic, summarizing the study-design choices of more than 100 papers along three important dimensions: (1) decision tasks, (2) AI models and AI-assistance elements, and (3) evaluation metrics. For each dimension, it summarizes current trends, discusses gaps in current practice, and lists recommendations for future research. The survey highlights the need for common frameworks that map the design and research spaces of human-AI decision making, so that researchers can make rigorous study-design choices and the research community can build on each other's work to produce generalizable scientific knowledge.

As AI systems demonstrate increasingly strong predictive performance, their adoption has grown in numerous domains. However, in high-stakes domains such as criminal justice and healthcare, full automation is often not desirable due to safety, ethical, and legal concerns, yet fully manual approaches can be inaccurate and time-consuming. As a result, there is growing interest in the research community to augment human decision making with AI assistance. Besides developing AI technologies for this purpose, the emerging field of human-AI decision making must embrace empirical approaches to form a foundational understanding of how humans interact and work with AI to make decisions. To invite and help structure research efforts towards a science of understanding and improving human-AI decision making, we survey recent literature of empirical human-subject studies on this topic. We summarize the study design choices made in over 100 papers in three important aspects: (1) decision tasks, (2) AI models and AI assistance elements, and (3) evaluation metrics. For each aspect, we summarize current trends, discuss gaps in current practices of the field, and make a list of recommendations for future research. Our survey highlights the need to develop common frameworks to account for the design and research spaces of human-AI decision making, so that researchers can make rigorous choices in study design, and the research community can build on each other’s work and produce generalizable scientific knowledge. We also hope this survey will serve as a bridge for HCI and AI communities to work together to mutually shape the empirical science and computational technologies for human-AI decision making.

 

3、[LG] Super-resolution in Molecular Dynamics Trajectory Reconstruction with Bi-Directional Neural Networks

L Winkler, K Müller, H E. Sauceda

[Technische Universität Berlin]

Super-resolution in molecular dynamics trajectory reconstruction with bi-directional neural networks. Molecular dynamics simulations are a cornerstone of science, enabling studies ranging from a system's thermodynamics to the analysis of intricate molecular interactions. Creating extended molecular trajectories can be computationally expensive, so repeating such calculations to obtain more accurate thermodynamics, or higher resolution in dynamics generated by fine-grained quantum interactions, can cost considerable time and compute. This paper explores different machine learning (ML) approaches to increase the resolution of molecular dynamics trajectories on demand in a post-processing step. As a proof of concept, it analyzes the performance of bi-directional neural networks such as neural ODEs, Hamiltonian networks, recurrent neural networks, and LSTMs, with their uni-directional variants as references, on molecular dynamics simulations. Bi-LSTMs turn out to be the best-performing models: by exploiting the local time-symmetry of thermostated trajectories, they can even learn long-range correlations and remain highly robust to noisy dynamics across molecular complexity. The models reach accuracies of up to 10⁻⁴ ångström in trajectory interpolation while faithfully reconstructing several full cycles of unseen, intricate high-frequency molecular vibrations, rendering the learned and reference trajectories nearly indistinguishable. The reported results can serve (1) as a baseline for larger systems and (2) for the construction of better MD integrators.

Molecular dynamics simulations are a cornerstone in science, allowing investigations ranging from a system's thermodynamics to the analysis of intricate molecular interactions. In general, creating extended molecular trajectories can be a computationally expensive process, for example, when running ab initio simulations. Hence, repeating such calculations to either obtain more accurate thermodynamics or to get a higher resolution in the dynamics generated by a fine-grained quantum interaction can be time-consuming and computationally demanding. In this work, we explore different machine learning (ML) methodologies to increase the resolution of molecular dynamics trajectories on-demand within a post-processing step. As a proof of concept, we analyse the performance of bi-directional neural networks such as neural ODEs, Hamiltonian networks, recurrent neural networks and LSTMs, as well as the uni-directional variants as a reference, for molecular dynamics simulations (here: the MD17 dataset). We have found that Bi-LSTMs are the best performing models; by utilizing the local time-symmetry of thermostated trajectories they can even learn long-range correlations and display high robustness to noisy dynamics across molecular complexity. Our models can reach accuracies of up to 10⁻⁴ angstroms in trajectory interpolation, while faithfully reconstructing several full cycles of unseen intricate high-frequency molecular vibrations, rendering the comparison between the learned and reference trajectories indistinguishable. The results reported in this work can serve (1) as a baseline for larger systems, as well as (2) for the construction of better MD integrators.
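To make the task concrete, here is a minimal sketch of trajectory super-resolution using plain midpoint (linear) interpolation as the naive baseline; the paper's Bi-LSTM and the other learned models replace the midpoint rule below with a neural network that predicts the inserted frames. The toy trajectory and the helper name are my own illustration, not the paper's code.

```python
# Naive baseline sketch: double the temporal resolution of a trajectory
# by inserting the midpoint between every pair of consecutive frames.
# Each frame is a flat list of coordinates (e.g. atom positions).

def upsample_linear(traj):
    """Insert the linear midpoint between consecutive frames."""
    out = []
    for a, b in zip(traj, traj[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])  # interpolated frame
    out.append(traj[-1])
    return out

coarse = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]   # 3 coarse frames
fine = upsample_linear(coarse)                  # 5 frames after upsampling
print(fine)
```

Linear interpolation cannot reproduce high-frequency vibrations between coarse frames, which is exactly the failure mode the learned bi-directional models are shown to fix.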

 

4、[LG] Lyapunov Exponents for Diversity in Differentiable Games

J Lorraine, P Vicol, J Parker-Holder, T Kachman, L Metz, J Foerster

[University of Toronto & University of Oxford & Radboud University & Google Research]

Lyapunov exponents for diversity in differentiable games. Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian ("ridges"). RR was designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles, which are easy-to-find bifurcation points. This paper generalizes the idea to non-conservative, multi-agent gradient systems by proposing Generalized Ridge Rider (GRR), a method for finding arbitrary bifurcation points. The method is theoretically motivated by leveraging machinery from the field of dynamical systems. The paper constructs novel toy problems in which new phenomena can be visualized while offering insight into high-dimensional problems of interest, and empirically evaluates the method by finding diverse solutions in the iterated prisoner's dilemma and in related machine learning problems, including generative adversarial networks.

Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian (“ridges”). RR is designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles — easy-to-find bifurcation points. We generalize this idea to nonconservative, multi-agent gradient systems by proposing a method – denoted Generalized Ridge Rider (GRR) – for finding arbitrary bifurcation points. We give theoretical motivation for our method by leveraging machinery from the field of dynamical systems. We construct novel toy problems where we can visualize new phenomena while giving insight into high-dimensional problems of interest. Finally, we empirically evaluate our method by finding diverse solutions in the iterated prisoners’ dilemma and relevant machine learning problems including generative adversarial networks.
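As background for the title concept, here is an illustrative sketch (not the paper's GRR algorithm) of estimating the largest Lyapunov exponent of a one-dimensional map by averaging log|f′(x)| along a trajectory; positive exponents signal the sensitivity to perturbations that dynamical-systems tools like these are used to detect. The logistic map and the parameter choices are my own textbook example.

```python
import math

def lyapunov_logistic(r: float, x0: float = 0.2, n: int = 100_000) -> float:
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x)
    by averaging log|f'(x)| = log|r*(1-2x)| over a long trajectory."""
    x, total = x0, 0.0
    for _ in range(n):
        total += math.log(abs(r * (1.0 - 2.0 * x)))  # log of local stretching
        x = r * x * (1.0 - x)                        # iterate the map
    return total / n

# For r = 4 the exact exponent is ln 2 ≈ 0.6931 (fully chaotic regime).
print(lyapunov_logistic(4.0))
```

A positive estimate indicates chaotic divergence of nearby trajectories; in GRR, related spectra computed for game dynamics help locate the bifurcation points at which the method branches.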

 

5、[LG] Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction

Z Zhu, Z Zhang, L Xhonneux, J Tang

[Mila]

Neural Bellman-Ford networks: a general graph neural network framework for link prediction. Link prediction is a fundamental task on graphs. Inspired by traditional path-based methods, this paper proposes a general and flexible path-based representation learning framework for link prediction: the representation of a pair of nodes is defined as the generalized sum of all path representations between them, with each path representation being the generalized product of the edge representations along the path. Motivated by the Bellman-Ford algorithm for the shortest path problem, this path formulation can be solved efficiently by a generalized Bellman-Ford algorithm. To further improve its capacity, the paper proposes the Neural Bellman-Ford Network (NBFNet), a general graph neural network framework that solves the path formulation with learned operators in the generalized Bellman-Ford algorithm. NBFNet parameterizes the algorithm with three neural components, the INDICATOR, MESSAGE, and AGGREGATE functions, corresponding to the boundary condition, multiplication operator, and summation operator, respectively. NBFNet subsumes many traditional path-based methods and can be applied to both homogeneous graphs and multi-relational graphs (e.g., knowledge graphs) in both transductive and inductive settings. Experiments on homogeneous graphs and knowledge graphs show that NBFNet outperforms existing methods by a large margin in both transductive and inductive settings, achieving new state-of-the-art results.

Link prediction is a very fundamental task on graphs. Inspired by traditional path-based methods, in this paper we propose a general and flexible representation learning framework based on paths for link prediction. Specifically, we define the representation of a pair of nodes as the generalized sum of all path representations between the nodes, with each path representation as the generalized product of the edge representations in the path. Motivated by the Bellman-Ford algorithm for solving the shortest path problem, we show that the proposed path formulation can be efficiently solved by the generalized Bellman-Ford algorithm. To further improve the capacity of the path formulation, we propose the Neural Bellman-Ford Network (NBFNet), a general graph neural network framework that solves the path formulation with learned operators in the generalized Bellman-Ford algorithm. The NBFNet parameterizes the generalized Bellman-Ford algorithm with 3 neural components, namely INDICATOR, MESSAGE and AGGREGATE functions, which correspond to the boundary condition, multiplication operator, and summation operator respectively. The NBFNet covers many traditional path-based methods, and can be applied to both homogeneous graphs and multi-relational graphs (e.g., knowledge graphs) in both transductive and inductive settings. Experiments on both homogeneous graphs and knowledge graphs show that the proposed NBFNet outperforms existing methods by a large margin in both transductive and inductive settings, achieving new state-of-the-art results.
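For intuition, here is the classical Bellman-Ford recurrence that NBFNet generalizes, sketched in plain Python: the INDICATOR boundary condition puts 0 on the source node, MESSAGE is edge-weight addition, and AGGREGATE is the minimum. NBFNet replaces these three operators with learned neural functions; everything else (graph encoding, the test graph) is my own minimal example.

```python
import math

def bellman_ford(n, edges, source):
    """Shortest-path distances from `source` in a graph with `n` nodes.
    `edges` is a list of (u, v, weight) triples."""
    dist = [math.inf] * n
    dist[source] = 0.0                    # INDICATOR: boundary condition
    for _ in range(n - 1):                # relax until a fixed point
        for u, v, w in edges:
            # AGGREGATE (min) over MESSAGE (edge-weight addition)
            dist[v] = min(dist[v], dist[u] + w)
    return dist

edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)]
print(bellman_ford(3, edges, 0))  # [0.0, 1.0, 3.0]
```

Swapping (min, +) for other semirings recovers other path-based quantities (e.g. (max, ×) for widest reliability paths), which is the "generalized sum over generalized products" view the abstract describes.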

 

A few more papers worth attention:

 

[CV] Vector Neurons: A General Framework for SO(3)-Equivariant Networks


C Deng, O Litany, Y Duan, A Poulenard, A Tagliasacchi, L Guibas

[Stanford University & NVIDIA & Google Research]

 

 

[LG] Learning Operators with Coupled Attention


G Kissas, J Seidman, L F Guilhoto, V M. Preciado, G J. Pappas, P Perdikaris

[University of Pennsylvania]

 

[CV] Ensembling Off-the-shelf Models for GAN Training


N Kumari, R Zhang, E Shechtman, J Zhu

[CMU & Adobe]

 

[CV] Contrastive Fine-grained Class Clustering via Generative Adversarial Networks


Y Kim, J Ha

[NAVER AI Lab]
