LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics




1、[LG] Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms

S Goel, S Kakade, A T Kalai, C Zhang

[Microsoft Research & Harvard University]


Neural Networks (NNs) struggle to efficiently learn certain problems, such as parity problems, even when there are simple learning algorithms for those problems. Can NNs discover learning algorithms on their own? We exhibit a NN architecture that, in polynomial time, learns as well as any efficient learning algorithm describable by a constant-sized learning algorithm. For example, on parity problems, the NN learns as well as row reduction, an efficient algorithm that can be succinctly described. Our architecture combines both recurrent weight-sharing between layers and convolutional weight-sharing to reduce the number of parameters down to a constant, even though the network itself may have trillions of nodes. While in practice the constants in our analysis are too large to be directly meaningful, our work suggests that the synergy of Recurrent and Convolutional NNs (RCNNs) may be more powerful than either alone.
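The row-reduction baseline mentioned in the abstract can be illustrated with a short sketch. This is not the paper's RCNN architecture; it only shows, under the assumption of noise-free labels, how Gaussian elimination over GF(2) recovers a parity function consistent with the training examples. The names `parity_label` and `learn_parity` are hypothetical helpers introduced here.

```python
import random


def parity_label(x, secret):
    """Label of bit vector x under the parity defined by `secret` (sum mod 2)."""
    return sum(a & b for a, b in zip(x, secret)) % 2


def learn_parity(examples, n):
    """Find a mask s with <x, s> mod 2 == y for every (x, y), or None.

    Gauss-Jordan elimination over GF(2). Each equation is stored as an int:
    bits 0..n-1 hold x, bit n holds the label y. Free variables are set to 0.
    """
    rows = [sum(b << i for i, b in enumerate(x)) | (y << n) for x, y in examples]
    pivots = []  # (column, fully reduced pivot row)
    for col in range(n):
        pivot = next((r for r in rows if (r >> col) & 1), None)
        if pivot is None:
            continue  # no remaining equation constrains this column
        rows.remove(pivot)
        # Clear this column from every other equation (XOR is addition mod 2).
        rows = [r ^ pivot if (r >> col) & 1 else r for r in rows]
        pivots = [(c, p ^ pivot if (p >> col) & 1 else p) for c, p in pivots]
        pivots.append((col, pivot))
    # Leftover rows have no x bits; a set label bit means 0 == 1 (inconsistent).
    if any((r >> n) & 1 for r in rows):
        return None
    s = [0] * n
    for col, p in pivots:
        s[col] = (p >> n) & 1
    return s


if __name__ == "__main__":
    rng = random.Random(0)
    n = 16
    secret = [rng.randint(0, 1) for _ in range(n)]
    xs = [[rng.randint(0, 1) for _ in range(n)] for _ in range(4 * n)]
    data = [(x, parity_label(x, secret)) for x in xs]
    s = learn_parity(data, n)
    # The recovered mask agrees with every training label.
    print(all(parity_label(x, s) == y for x, y in data))
```

The elimination uses a number of row operations polynomial in `n` and the number of examples, which is the sense in which row reduction is a simple, efficient learner for parity.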



2、[LG] Optimistic Optimization of Gaussian Process Samples

J Grosse, C Zhang, P Hennig

[University of Tübingen & Microsoft Research Cambridge]

Bayesian optimization is a popular formalism for global optimization, but its computational costs limit it to expensive-to-evaluate functions. A competing, computationally more efficient global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in the form of a dissimilarity function. We investigate to what degree the conceptual advantages of Bayesian optimization can be combined with the computational efficiency of optimistic optimization. By mapping the kernel to a dissimilarity, we obtain an optimistic optimization algorithm for the Bayesian optimization setting with a run-time of up to O(N log N). As a high-level take-away, we find that, when using stationary kernels on objectives of relatively low evaluation cost, optimistic optimization can be strongly preferable over Bayesian optimization, while for strongly coupled and parametric models, good implementations of Bayesian optimization can perform much better, even at low evaluation cost. We argue that there is a new research domain between geometric and probabilistic search, i.e., methods that run drastically faster than traditional Bayesian optimization while retaining some of the crucial functionality of Bayesian optimization.
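To make the idea of optimistic optimization concrete, here is a minimal one-dimensional sketch in the style of deterministic optimistic optimization. It is not the authors' algorithm: instead of a kernel-derived dissimilarity, it assumes a simple dissimilarity `lipschitz * |x - y|` with a known constant, and the function name `doo_maximize` and its parameters are hypothetical. Keeping the cells in a heap makes each step cost O(log N).

```python
import heapq


def doo_maximize(f, lo, hi, budget, lipschitz):
    """Maximize f on [lo, hi] using at most `budget` evaluations.

    Each cell [a, b] is evaluated at its center c and given the optimistic
    upper bound f(c) + lipschitz * (b - a) / 2. The cell with the largest
    bound is split into three; the middle child reuses the parent's center.
    """
    def cell(a, b, c, val):
        bound = val + lipschitz * 0.5 * (b - a)
        return (-bound, a, b, c, val)  # negated bound: heapq is a min-heap

    c0 = 0.5 * (lo + hi)
    v0 = f(c0)
    heap = [cell(lo, hi, c0, v0)]
    best_x, best_val = c0, v0
    evals = 1
    while evals + 2 <= budget:
        _, a, b, c, val = heapq.heappop(heap)
        third = (b - a) / 3.0
        # Middle child: same center as the parent, so no new evaluation.
        heapq.heappush(heap, cell(a + third, b - third, c, val))
        for left in (a, b - third):
            center = left + 0.5 * third
            value = f(center)
            evals += 1
            if value > best_val:
                best_x, best_val = center, value
            heapq.heappush(heap, cell(left, left + third, center, value))
    return best_x, best_val


if __name__ == "__main__":
    x, v = doo_maximize(lambda t: -(t - 0.3) ** 2, 0.0, 1.0,
                        budget=101, lipschitz=2.0)
    print(x, v)
```

The search needs no posterior updates or matrix factorizations, only a valid upper bound per cell, which is where the computational savings over Bayesian optimization come from in this simplified setting.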



3、[LG] Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions

L O'Bray, M Horn, B Rieck, K Borgwardt

[ETH Zürich]


Graph generative models are a highly active branch of machine learning. Given the steady development of new models of ever-increasing complexity, it is necessary to provide a principled way to evaluate and compare them. In this paper, we enumerate the desirable criteria for such a comparison metric and provide an overview of the status quo of graph generative model comparison in use today, which predominantly relies on the maximum mean discrepancy (MMD). We perform a systematic evaluation of MMD in the context of graph generative model comparison, highlighting some of the challenges and pitfalls researchers may inadvertently encounter. After conducting a thorough analysis of the behaviour of MMD on synthetically generated perturbed graphs as well as on recently proposed graph generative models, we are able to provide a suitable procedure to mitigate these challenges and pitfalls. We aggregate our findings into a list of practical recommendations for researchers to use when evaluating graph generative models.
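For readers unfamiliar with MMD, the following minimal sketch computes a biased estimate of squared MMD between two sets of graphs. The choices here are illustrative assumptions, not the authors' protocol: each graph is summarized by a normalized degree histogram, and histograms are compared with a Gaussian kernel of bandwidth `sigma`. The function names are hypothetical.

```python
import math


def degree_histogram(edges, num_nodes, max_degree):
    """Normalized degree histogram of an undirected graph given as an edge list."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hist = [0.0] * (max_degree + 1)
    for d in degree:
        hist[min(d, max_degree)] += 1.0 / num_nodes  # clip large degrees
    return hist


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]   # 4-node path graph
    star = [(0, 1), (0, 2), (0, 3)]   # 4-node star graph
    reference = [degree_histogram(path, 4, 3)]
    generated = [degree_histogram(star, 4, 3)]
    print(mmd_squared(reference, generated))
```

The result depends on the descriptor, the kernel, and `sigma`, so two models can be ranked differently under different settings; this sensitivity is the kind of pitfall the abstract refers to.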



4、[CV] Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

Z Xu, T Lin, H Tang, F Li...

[University of Trento & Baidu Inc & ETH Zurich]


To achieve disentangled image manipulation, previous works depend heavily on manual annotation. Meanwhile, the available manipulations are limited to a pre-defined set the models were trained for. We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations. Our method approaches the targets by deeply exploiting the power of the large-scale pre-trained vision-language model CLIP [32]. Concretely, we first Predict the possibly entangled attributes for a given text command. Then, based on the predicted attributes, we introduce an entanglement loss to Prevent entanglements during training. Finally, we propose a new evaluation metric to Evaluate the disentangled image manipulation. We verify the effectiveness of our method on the challenging face editing task. Extensive experiments show that the proposed PPE framework achieves much better quantitative and qualitative results than the up-to-date StyleCLIP [31] baseline. Code is available at https://github.com/zipengxuc/PPE. (The work was done during Zipeng Xu’s internship at VIS, Baidu.)



5、[LG] Robust Policy Learning over Multiple Uncertainty Sets

A Xie, S Sodhani, C Finn, J Pineau, A Zhang

[Stanford University & Facebook AI Research]


Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL, which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.





[LG] Petals: Collaborative Inference and Fine-tuning of Large Models


A Borzunov, D Baranchuk, T Dettmers, M Ryabinin, Y Belkada, A Chumachenko, P Samygin, C Raffel

[Yandex & University of Washington & Hugging Face]



[LG] Normalization effects on deep neural networks

J Yu, K Spiliopoulos

[Boston University]



[LG] Structure-Preserving Graph Representation Learning


R Fang, L Wen, Z Kang, J Liu

[University of Electronic Science and Technology of China & Huawei Technologies Company Limited]



[CL] FOLIO: Natural Language Reasoning with First-Order Logic


S Han, H Schoelkopf, Y Zhao, Z Qi, M Riddell, L Benson, L Sun...

[Yale University & University of Illinois & Iowa City West High School & ...]



