BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM

Summary

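Text-to-image (T2I) generative models are becoming increasingly important for producing complex, high-quality images. This has raised concerns about social bias in their outputs, especially in images of people. Sociological research has established a systematic classification of bias, but existing studies of T2I models often conflate different types of bias, which hinders progress on these methods. In this paper, we introduce BIGbench, a unified benchmark for biases of image generation built on a carefully designed dataset. Unlike existing benchmarks, BIGbench classifies and evaluates complex biases along four dimensions: manifestation of bias, visibility of bias, acquired attributes, and protected attributes. In addition, BIGbench uses an advanced multi-modal large language model (MLLM) to achieve fully automated evaluation while maintaining high accuracy. We apply BIGbench to evaluate eight recent general-purpose T2I models and three debiasing methods. We also conduct a human evaluation, whose results demonstrate BIGbench's effectiveness in aligning images and identifying various biases. Furthermore, our study reveals new research directions concerning bias, including the side effects of irrelevant protected attributes and of distillation. Our dataset and benchmark are publicly available to the research community to ensure reproducibility.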
Problem Addressed

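Existing studies of T2I models often conflate different types of social bias. BIGbench addresses this by providing a unified benchmark for biases of image generation.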
Key Idea

BIGbench is a unified benchmark for biases of image generation. It classifies and evaluates complex biases along four dimensions (manifestation of bias, visibility of bias, acquired attributes, and protected attributes) and uses an advanced multi-modal large language model for fully automated evaluation that maintains high accuracy.
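To make the four-dimension idea more concrete, here is a minimal sketch, under stated assumptions, of how a prompt might be tagged along the four dimensions and how per-image labels from an MLLM judge could be reduced to a single number. The summary does not give BIGbench's label vocabularies or its scoring formula, so `PromptCase`, `bias_score`, all field values, and the assumption that the MLLM returns one protected-attribute label per image are hypothetical. The score used here (one minus normalized Shannon entropy) is a generic stand-in chosen for illustration, not the benchmark's actual metric.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class PromptCase:
    """Hypothetical record tagging one prompt along the four dimensions.

    The summary names the dimensions but not their label vocabularies,
    so every string value used with this class is a placeholder.
    """
    prompt: str
    manifestation: str        # manifestation of bias (placeholder label)
    visibility: str           # visibility of bias (placeholder label)
    acquired_attribute: str   # acquired attribute named in the prompt
    protected_attribute: str  # protected attribute to measure, e.g. "gender"


def bias_score(labels: list[str], categories: list[str]) -> float:
    """Stand-in score (NOT BIGbench's metric): 1 - normalized Shannon entropy.

    `labels` holds one protected-attribute label per generated image, as an
    MLLM judge might return them (assumption). Returns 0.0 when labels are
    spread evenly over `categories` and 1.0 when every image gets one label.
    """
    if not labels:
        raise ValueError("need at least one labelled image")
    if len(categories) < 2:
        raise ValueError("need at least two categories")
    unknown = set(labels) - set(categories)
    if unknown:
        raise ValueError(f"labels outside categories: {sorted(unknown)}")

    counts = Counter(labels)
    n = len(labels)
    entropy = 0.0
    for c in categories:
        p = counts[c] / n
        if p > 0:
            entropy -= p * math.log(p)
    # Divide by the maximum possible entropy, log(K), to land in [0, 1].
    return 1.0 - entropy / math.log(len(categories))


if __name__ == "__main__":
    case = PromptCase(
        prompt="a photo of a doctor",
        manifestation="<placeholder>",
        visibility="<placeholder>",
        acquired_attribute="occupation: doctor",
        protected_attribute="gender",
    )
    # Hypothetical MLLM labels for four images generated from case.prompt.
    print(bias_score(["male", "male", "male", "male"], ["male", "female"]))  # 1.0
    print(bias_score(["male", "female", "male", "female"], ["male", "female"]))  # ~0.0
```

Keeping the dimension tags on the prompt record, separate from the scoring function, mirrors the summary's point that different bias types should not be conflated: scores can then be grouped by any one of the four dimensions before comparison.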
Other Highlights

The paper contributes a carefully designed dataset alongside the benchmark. The study evaluates eight recent general-purpose T2I models and three debiasing methods, and a human evaluation demonstrates BIGbench's effectiveness in aligning images and identifying various biases. The study also points to new research directions, including the side effects of irrelevant protected attributes and of distillation. The dataset and benchmark are publicly available to the research community to ensure reproducibility.
Related Research

Recent related studies include 'A Survey on Bias and Fairness in Machine Learning' and 'Fairness in Machine Learning: A Survey'.

