LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics

Reposted from 爱可可爱生活

Summary: recurrent convolutional neural networks learn succinct learning algorithms; optimistic optimization of Gaussian process samples; evaluation metrics for graph generative models; disentangled text-driven image manipulation empowered by pre-trained vision-language models; robust policy learning over multiple uncertainty sets; collaborative inference and fine-tuning of large models; normalization effects on deep neural networks; structure-preserving graph representation learning; natural language reasoning with first-order logic.

 

1、[LG] Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms

S Goel, S Kakade, A T Kalai, C Zhang

[Microsoft Research & Harvard University]

In brief: neural networks struggle to learn problems such as parities efficiently, even though simple learning algorithms exist for them. The paper exhibits an architecture that combines recurrent weight-sharing between layers with convolutional weight-sharing, driving the parameter count down to a constant even when the network has trillions of nodes, and shows it learns in polynomial time as well as any efficient learning algorithm with a constant-size description (e.g., row reduction for parities). The constants in the analysis are far too large to be practical, but the result suggests the synergy of recurrent and convolutional networks (RCNNs) may be more powerful than either alone.

Neural Networks (NNs) struggle to efficiently learn certain problems, such as parity problems, even when there are simple learning algorithms for those problems. Can NNs discover learning algorithms on their own? We exhibit a NN architecture that, in polynomial time, learns as well as any efficient learning algorithm describable by a constant-sized learning algorithm. For example, on parity problems, the NN learns as well as row reduction, an efficient algorithm that can be succinctly described. Our architecture combines both recurrent weight-sharing between layers and convolutional weight-sharing to reduce the number of parameters down to a constant, even though the network itself may have trillions of nodes. While in practice the constants in our analysis are too large to be directly meaningful, our work suggests that the synergy of Recurrent and Convolutional NNs (RCNNs) may be more powerful than either alone.
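For readers who want a concrete picture of the parity example, below is a minimal sketch (ours, not the paper's) of the succinct algorithm the abstract refers to: recovering a parity function from labeled samples by row reduction, i.e., Gaussian elimination over GF(2). Function names and sizes are illustrative.

```python
import numpy as np

def learn_parity(X, y):
    """Recover the secret index set s of a parity function y = <s, x> mod 2
    from labeled examples, via Gaussian elimination over GF(2)."""
    A = np.concatenate([X, y[:, None]], axis=1) % 2  # augmented matrix [X | y]
    n = X.shape[1]
    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, len(A)) if A[r, col]), None)
        if pivot is None:            # no pivot: free variable, left at 0
            continue
        A[[row, pivot]] = A[[pivot, row]]
        for r in range(len(A)):      # eliminate this column everywhere else
            if r != row and A[r, col]:
                A[r] = (A[r] + A[row]) % 2
        row += 1
    s, row = np.zeros(n, dtype=int), 0
    for col in range(n):             # read the solution off the RREF
        if row < len(A) and A[row, col]:
            s[col] = A[row, -1]
            row += 1
    return s

rng = np.random.default_rng(0)
secret = rng.integers(0, 2, size=8)
X = rng.integers(0, 2, size=(64, 8))
y = X @ secret % 2
s_hat = learn_parity(X, y)
print("recovered:", s_hat, "| consistent with labels:", bool(np.array_equal(X @ s_hat % 2, y)))
```

Gradient-based training of a generic feed-forward network is known to struggle on parities; the paper's point is that an RCNN with constant-size weight-sharing can, in principle, represent and find this kind of succinct algorithm itself.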

https://arxiv.org/abs/2209.00735

 

2、[LG] Optimistic Optimization of Gaussian Process Samples

J Grosse, C Zhang, P Hennig

[University of Tübingen & Microsoft Research Cambridge]

In brief: Bayesian optimization is a popular global-optimization formalism, but its computational cost restricts it to expensive-to-evaluate functions; optimistic optimization is far cheaper but needs prior knowledge of the search-space geometry in the form of a dissimilarity function. By mapping the GP kernel to a dissimilarity, the paper obtains an optimistic optimization algorithm for the Bayesian-optimization setting with a run time of up to O(N log N). The high-level takeaway: with stationary kernels on cheap-to-evaluate objectives, optimistic optimization can be strongly preferable to Bayesian optimization, while for strongly coupled and parametric models a good Bayesian-optimization implementation still performs better, even at low evaluation cost. This points to a new research space between geometric and probabilistic search: methods drastically faster than traditional Bayesian optimization that retain some of its crucial functionality.

Bayesian optimization is a popular formalism for global optimization, but its computational costs limit it to expensive-to-evaluate functions. A competing, computationally more efficient, global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in the form of a dissimilarity function. We investigate to which degree the conceptual advantages of Bayesian optimization can be combined with the computational efficiency of optimistic optimization. By mapping the kernel to a dissimilarity, we obtain an optimistic optimization algorithm for the Bayesian optimization setting with a run-time of up to O(N log N). As a high-level take-away we find that, when using stationary kernels on objectives of relatively low evaluation cost, optimistic optimization can be strongly preferable over Bayesian optimization, while for strongly coupled and parametric models, good implementations of Bayesian optimization can perform much better, even at low evaluation cost. We argue that there is a new research domain between geometric and probabilistic search, i.e. methods that run drastically faster than traditional Bayesian optimization, while retaining some of the crucial functionality of Bayesian optimization.
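As an illustration of the kernel-to-dissimilarity idea (a sketch under our own assumptions, not the authors' implementation): a stationary kernel k induces the canonical pseudo-metric d(x, y) = sqrt(2(k(x, x) − k(x, y))), which a DOO-style optimistic search can use to form optimistic upper bounds over cells. Names and constants below are illustrative.

```python
import heapq
import math

def rbf(x, y, ls=0.1):
    return math.exp(-((x - y) ** 2) / (2 * ls ** 2))

def dissimilarity(x, y):
    # Canonical pseudo-metric induced by a stationary kernel.
    return math.sqrt(max(0.0, 2.0 * (rbf(x, x) - rbf(x, y))))

def doo_maximize(f, budget=50):
    """DOO-style optimistic search on [0, 1]: always expand the cell with
    the highest optimistic bound f(center) + d(center, cell edge)."""
    c = 0.5
    fc = f(c)
    heap = [(-(fc + dissimilarity(c, 0.0)), 0.0, 1.0, c, fc)]
    best_x, best_val = c, fc
    for _ in range(budget):
        _, lo, hi, center, val = heapq.heappop(heap)
        if val > best_val:
            best_x, best_val = center, val
        for a, b in ((lo, center), (center, hi)):   # split the cell in two
            cc = 0.5 * (a + b)
            fv = f(cc)
            bound = fv + dissimilarity(cc, a)       # optimism over the cell
            heapq.heappush(heap, (-bound, a, b, cc, fv))
    return best_x, best_val

x, v = doo_maximize(lambda t: -((t - 0.3) ** 2))
print(f"argmax ~ {x:.3f}, max ~ {v:.5f}")
```

Each step costs one function evaluation plus logarithmic heap work, versus the cubic-in-N Gaussian-process updates of standard Bayesian optimization, which is where the claimed speed advantage comes from.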

https://arxiv.org/abs/2209.00895

 

3、[LG] Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions

L O'Bray, M Horn, B Rieck, K Borgwardt

[ETH Zürich]

In brief: graph generative models are a highly active branch of machine learning, and the steady stream of ever more complex models calls for a principled way to evaluate and compare them. The paper enumerates desirable criteria for such comparison metrics, surveys the status quo, which relies predominantly on maximum mean discrepancy (MMD), and systematically evaluates MMD's behaviour on synthetically generated perturbed graphs and on recently proposed graph generative models. It highlights the challenges and pitfalls researchers may inadvertently run into, provides a procedure to mitigate them, and distills the findings into a list of practical recommendations for evaluating graph generative models.

Graph generative models are a highly active branch of machine learning. Given the steady development of new models of ever-increasing complexity, it is necessary to provide a principled way to evaluate and compare them. In this paper, we enumerate the desirable criteria for such a comparison metric and provide an overview of the status quo of graph generative model comparison in use today, which predominantly relies on the maximum mean discrepancy (MMD). We perform a systematic evaluation of MMD in the context of graph generative model comparison, highlighting some of the challenges and pitfalls researchers inadvertently may encounter. After conducting a thorough analysis of the behaviour of MMD on synthetically-generated perturbed graphs as well as on recently-proposed graph generative models, we are able to provide a suitable procedure to mitigate these challenges and pitfalls. We aggregate our findings into a list of practical recommendations for researchers to use when evaluating graph generative models.
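A minimal sketch of the evaluation procedure under scrutiny, assuming degree histograms as the graph descriptor and an RBF kernel over them: the descriptor, the kernel, and the bandwidth sigma are exactly the kind of hand-picked choices whose influence on model rankings the paper warns about.

```python
import networkx as nx
import numpy as np

def degree_histogram(G, max_degree=20):
    # Fixed-length, normalized degree histogram as the graph descriptor.
    h = np.zeros(max_degree + 1)
    for _, d in G.degree():
        h[min(d, max_degree)] += 1
    return h / h.sum()

def mmd2_rbf(X, Y, sigma=0.1):
    """Biased estimate of MMD^2 between descriptor sets X and Y under an
    RBF kernel; the kernel and sigma strongly affect how models are ranked."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# Reference graphs vs. "generated" graphs from a slightly perturbed model.
ref = np.stack([degree_histogram(nx.erdos_renyi_graph(50, 0.10, seed=s)) for s in range(30)])
gen = np.stack([degree_histogram(nx.erdos_renyi_graph(50, 0.15, seed=s)) for s in range(30)])
print(f"MMD^2 = {mmd2_rbf(ref, gen):.5f}")
```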

https://arxiv.org/abs/2106.01098

 

4、[CV] Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

Z Xu, T Lin, H Tang, F Li...

[University of Trento & Baidu Inc & ETH Zurich]

In brief: prior approaches to disentangled image manipulation depend heavily on manual annotation and are limited to a predefined set of edits the model was trained for. The Predict, Prevent, and Evaluate (PPE) framework instead exploits the large-scale pre-trained vision-language model CLIP: it predicts the attributes likely to become entangled with a given text command, prevents that entanglement during training via an entanglement loss, and evaluates the result with a new disentanglement metric, all with little manual annotation and across a wide variety of manipulations. On the challenging face-editing task, PPE achieves clearly better quantitative and qualitative results than the up-to-date StyleCLIP baseline.

To achieve disentangled image manipulation, previous works depend heavily on manual annotation. Meanwhile, the available manipulations are limited to a pre-defined set the models were trained for. We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations. Our method approaches the targets by deeply exploiting the power of the large-scale pre-trained vision-language model CLIP. Concretely, we firstly Predict the possibly entangled attributes for a given text command. Then, based on the predicted attributes, we introduce an entanglement loss to Prevent entanglements during training. Finally, we propose a new evaluation metric to Evaluate the disentangled image manipulation. We verify the effectiveness of our method on the challenging face editing task. Extensive experiments show that the proposed PPE framework achieves much better quantitative and qualitative results than the up-to-date StyleCLIP baseline. Code is available at https://github.com/zipengxuc/PPE.
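To make the Prevent step concrete, here is a hypothetical CLIP-style loss sketch. It is not the authors' exact objective (see the paper and repository for that); it merely illustrates pushing an edited image embedding toward the text command while penalizing any drift of its similarity to the attributes predicted to be entangled. All tensors are dummies.

```python
import torch
import torch.nn.functional as F

def entanglement_loss(img_edit, img_orig, target_text, entangled_texts):
    """Illustrative disentanglement objective over CLIP-style embeddings.
    img_*: (D,) image embeddings; target_text: (D,); entangled_texts: (K, D)."""
    sim = lambda a, b: F.cosine_similarity(a, b, dim=-1)
    # Push the edited image toward the target text command.
    target_term = 1.0 - sim(img_edit, target_text)
    # Prevent: similarity to each entangled attribute should stay unchanged.
    drift = sim(img_edit.expand_as(entangled_texts), entangled_texts) \
          - sim(img_orig.expand_as(entangled_texts), entangled_texts)
    return target_term + (drift ** 2).mean()

D, K = 512, 3  # embedding dim, number of predicted entangled attributes
img_edit = torch.randn(D, requires_grad=True)  # stand-in for the edited image
img_orig, target_text = torch.randn(D), torch.randn(D)
entangled_texts = torch.randn(K, D)
loss = entanglement_loss(img_edit, img_orig, target_text, entangled_texts)
loss.backward()
print(float(loss))
```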

https://arxiv.org/abs/2111.13333

 

5、[LG] Robust Policy Learning over Multiple Uncertainty Sets

A Xie, S Sodhani, C Finn, J Pineau, A Zhang

[Stanford University & Facebook AI Research]

In brief: reinforcement learning agents in safety-critical settings must be robust to environment variation. System identification can infer the variation from online experience but fails when fast identification is impossible; robust RL handles worst cases but is typically designed for a single uncertainty set fixed at training time. The paper formulates the multi-set robustness problem, learning a policy robust to different perturbation sets, and designs an algorithm that enjoys the benefits of both approaches: it reduces uncertainty where a few interactions allow it, yet still acts robustly with respect to the uncertainty that remains. Across a diverse set of control tasks, the method improves worst-case performance on new environments over prior methods based on system identification or robust RL alone.

Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.
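A toy sketch (our illustration, not the paper's algorithm) of the two ingredients the abstract combines, on a one-dimensional linear system with unknown dynamics parameter p: a few probing transitions shrink the uncertainty set to the parameters consistent with observations (system identification), and the policy is then chosen to maximize its worst-case return over what remains (robust selection). All constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = 1.2  # unknown parameter of the dynamics x' = p*x + u + noise

# Uncertainty set over the dynamics parameter before any interaction.
candidate_params = np.linspace(0.5, 1.5, 21)

# System identification: a few probing transitions (u = 0) shrink the set
# to the parameters still consistent with what was observed.
xs = rng.uniform(-1.0, 1.0, size=8)
xn = true_p * xs + 0.05 * rng.standard_normal(8)   # observed next states
p_hat = (xs @ xn) / (xs @ xs)                      # least-squares fit
consistent = candidate_params[np.abs(candidate_params - p_hat) < 0.15]

def ret(gain, p, steps=20):
    """Deterministic return of the linear policy u = -gain * x."""
    x, total = 1.0, 0.0
    for _ in range(steps):
        x = p * x - gain * x
        total -= x ** 2        # cost: drive the state to zero
    return total

# Robust step: pick the gain maximizing the worst-case return over the
# uncertainty that identification could not remove.
gains = np.linspace(0.0, 2.0, 41)
worst = [min(ret(g, p) for p in consistent) for g in gains]
best_gain = gains[int(np.argmax(worst))]
print(f"{len(consistent)} candidate params remain; robust gain = {best_gain:.2f}")
```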

https://arxiv.org/abs/2202.07013

 

A few more papers worth noting:

 

[LG] Petals: Collaborative Inference and Fine-tuning of Large Models


A Borzunov, D Baranchuk, T Dettmers, M Ryabinin, Y Belkada, A Chumachenko, P Samygin, C Raffel

[Yandex & University of Washington & Hugging Face]

https://arxiv.org/abs/2209.01188

 

[LG] Normalization effects on deep neural networks

J Yu, K Spiliopoulos

[Boston University]

https://arxiv.org/abs/2209.01018

 

[LG] Structure-Preserving Graph Representation Learning


R Fang, L Wen, Z Kang, J Liu

[University of Electronic Science and Technology of China & Huawei Technologies Company Limited]

https://arxiv.org/abs/2209.00793

 

[CL] FOLIO: Natural Language Reasoning with First-Order Logic


S Han, H Schoelkopf, Y Zhao, Z Qi, M Riddell, L Benson, L Sun...

[Yale University & University of Illinois & Iowa City West High School & ...]

https://arxiv.org/abs/2209.00840

 

 

If any images included in this content raise copyright concerns, please contact us promptly so they can be removed.