From today's 爱可可 AI frontier picks

[LG] GOGGLE: Generative Modelling for Tabular Data by Learning Relational Structure

T Liu, Z Qian, J Berrevoets, M v d Schaar
University of Cambridge

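Key points:

  1. GOGGLE is a new generative model for tabular data that learns and exploits relational structure to better capture sparse, heterogeneous relationships in the data, while also providing a way to incorporate prior knowledge and to regularize variable dependencies;
  2. The proposed message passing scheme jointly learns the relational structure and the functional relationships; the authors present it as the first work to do so for tabular data;
  3. GOGGLE is effective at generating realistic synthetic data and at exploiting domain knowledge for downstream tasks, outperforming state-of-the-art baselines.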
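
To make the idea of message passing over a learned relational structure more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `message_passing_step`, the sigmoid parameterization of the adjacency matrix, and the `tanh` update are all assumptions chosen for illustration. Each table column is treated as a node, and a soft adjacency matrix (which would be learned jointly with the weights in a real model) decides how strongly columns exchange information.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def message_passing_step(h, adj_logits, w_self, w_neigh):
    """One round of message passing over a soft feature graph (illustrative).

    h          : (n_features, d) node embeddings, one node per table column
    adj_logits : (n_features, n_features) unconstrained parameters; the soft
                 adjacency is sigmoid(adj_logits) with the diagonal zeroed
    w_self     : (d, d) weights applied to each node's own embedding
    w_neigh    : (d, d) weights applied to the aggregated neighbour messages
    """
    adj = sigmoid(adj_logits)
    np.fill_diagonal(adj, 0.0)  # no self-loops; the self term uses w_self
    messages = adj @ h          # weighted sum of neighbour embeddings
    return np.tanh(h @ w_self + messages @ w_neigh)
```

In this sketch, driving an entry of `adj_logits` strongly negative effectively removes that edge, which is one simple way a model could end up with a sparse relational structure.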

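One-sentence summary:
Proposes GOGGLE, a new generative model for tabular data that learns and exploits relational structure to better model variable dependence and to introduce regularization on relationships and prior knowledge. GOGGLE uses an end-to-end message passing scheme to jointly learn the relational structure and the functional relationships as the basis for generating synthetic samples.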

Abstract:

Deep generative models learn highly complex and non-linear representations to generate realistic synthetic data. While they have achieved notable success in computer vision and natural language processing, similar advances have been less demonstrable in the tabular domain. This is partially because generative modelling of tabular data entails a particular set of challenges, including heterogeneous relationships, limited number of samples, and difficulties in incorporating prior knowledge. Additionally, unlike their counterparts in image and sequence domain, deep generative models for tabular data almost exclusively employ fully-connected layers, which encode weak inductive biases about relationships between inputs. Real-world data generating processes can often be represented using relational structures, which encode sparse, heterogeneous relationships between variables. In this work, we learn and exploit relational structure underlying tabular data to better model variable dependence, and as a natural means to introduce regularization on relationships and include prior knowledge. Specifically, we introduce GOGGLE, an end-to-end message passing scheme that jointly learns the relational structure and corresponding functional relationships as the basis of generating synthetic samples. Using real-world datasets, we provide empirical evidence that the proposed method is effective in generating realistic synthetic data and exploiting domain knowledge for downstream tasks.
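
The abstract mentions regularizing relationships and including prior knowledge through the relational structure. One plausible way to express both on a soft adjacency matrix is sketched below. This is an assumption for illustration only, not the method described in the paper: the helper names `masked_adjacency` and `sparsity_penalty`, the binary `allowed` mask, and the L1 form of the penalty are all hypothetical.

```python
import numpy as np


def masked_adjacency(adj_logits, allowed):
    """Soft adjacency with prior knowledge applied as a binary mask.

    adj_logits : (n, n) unconstrained edge parameters
    allowed    : (n, n) array of 0/1; a 0 forbids that edge outright
                 (hypothetical encoding of domain knowledge)
    """
    adj = 1.0 / (1.0 + np.exp(-adj_logits))
    return adj * allowed


def sparsity_penalty(adj, strength):
    """L1-style penalty that pushes edge weights towards zero."""
    return strength * np.abs(adj).sum()


# Two columns, self-loops forbidden, both logits at zero (edge weight 0.5).
allowed = np.array([[0.0, 1.0], [1.0, 0.0]])
adj = masked_adjacency(np.zeros((2, 2)), allowed)
penalty = sparsity_penalty(adj, strength=0.1)
```

Adding a term like `penalty` to a training loss would favour sparse graphs, while the mask keeps edges that domain knowledge rules out at exactly zero.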

Paper link: https://openreview.net/forum?id=fPVRcJqspu