From today's 爱可可 AI Frontier Recommendations
[CV] Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models
M J. Muckley, A El-Nouby, K Ullrich, H Jégou, J Verbeek
[Meta AI]
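Key points:
- Proposes a new adversarial discriminator based on VQ-VAE autoencoders that optimizes likelihood functions of local image neighborhoods, called an "implicit local likelihood model" (ILLM);
- Combines ILLM with the Mean-Scale Hyperprior neural compression architecture to create a new compressor, Mean-Scale-ILLM (MS-ILLM);
- Shows empirically on the CLIC2020, DIV2K, and Kodak datasets that MS-ILLM can surpass the statistical fidelity scores of HiFiC (as measured by FID) without sacrificing PSNR.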
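One-sentence summary:
The authors propose MS-ILLM, a new neural image compression model that improves statistical fidelity by using a local adversarial discriminator based on VQ-VAE autoencoders, achieving better FID and KID scores than the state-of-the-art HiFiC model.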
Abstract:
Lossy image compression aims to represent images in as few bits as possible while maintaining fidelity to the original. Theoretical results indicate that optimizing distortion metrics such as PSNR or MS-SSIM necessarily leads to a discrepancy in the statistics of original images from those of reconstructions, in particular at low bitrates, often manifested by the blurring of the compressed images. Previous work has leveraged adversarial discriminators to improve statistical fidelity. Yet these binary discriminators adopted from generative modeling tasks may not be ideal for image compression. In this paper, we introduce a non-binary discriminator that is conditioned on quantized local image representations obtained via VQ-VAE autoencoders. Our evaluations on the CLIC2020, DIV2K and Kodak datasets show that our discriminator is more effective for jointly optimizing distortion (e.g., PSNR) and statistical fidelity (e.g., FID) than the state-of-the-art HiFiC model. On the CLIC2020 test set, we obtain the same FID as HiFiC with 30-40% fewer bits.
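To make the idea of a non-binary discriminator more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each spatial location of the original image is assigned the index of its nearest VQ-VAE codebook vector, that the discriminator outputs C+1 logits per location, and that class 0 is reserved for "reconstruction". The function names, the class-0 convention, and the use of cross-entropy for both losses are illustrative assumptions.

```python
import math

FAKE = 0  # assumed convention: class 0 = "reconstruction", classes 1..C = codebook entries


def quantize(feature, codebook):
    """Return the index of the codebook vector nearest to `feature` (squared L2)."""
    dists = [sum((f - c) ** 2 for f, c in zip(feature, code)) for code in codebook]
    return dists.index(min(dists))


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def cross_entropy(logits_per_location, targets):
    """Mean negative log-likelihood of the target class over spatial locations."""
    total = 0.0
    for logits, target in zip(logits_per_location, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)


def discriminator_loss(logits_real, logits_fake, code_labels):
    """Originals should be classified as their local code, reconstructions as FAKE."""
    real_targets = [k + 1 for k in code_labels]  # shift by one to leave room for FAKE
    fake_targets = [FAKE] * len(logits_fake)
    return cross_entropy(logits_real, real_targets) + cross_entropy(logits_fake, fake_targets)


def generator_loss(logits_fake, code_labels):
    """The compressor is rewarded when its reconstruction is classified as the
    original's local codes rather than as FAKE."""
    return cross_entropy(logits_fake, [k + 1 for k in code_labels])


# Toy usage: two spatial locations, a codebook with C = 2 entries.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2]]  # hypothetical local features of the original
labels = [quantize(f, codebook) for f in features]
```

Compared with a binary real/fake discriminator, the targets here carry information about local image content, which is the sense in which the discriminator is "conditioned on quantized local image representations".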
Paper link: https://arxiv.org/abs/2301.11189