Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores

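May 2, 2024
  • Introduction
    This report introduces the ECO (Ensembled Clip score and cOnsensus score) pipeline proposed by team DSBA LAB, a new framework for evaluating and ranking captions for a given image. ECO selects the caption that most accurately describes the image. It does so by combining an Ensembled CLIP score, which considers the semantic alignment between the image and the captions, with a Consensus score, which accounts for the essentialness of the captions. Using this framework, we achieved notable success in the Caption Re-ranking Evaluation challenge of the CVPR 2024 workshop New Frontiers for Zero-Shot Image Captioning Evaluation (NICE). Specifically, we placed third on the CIDEr metric, second on both the SPICE and METEOR metrics, and first on ROUGE-L and all BLEU score metrics. The code and configuration for the ECO framework are available at https://github.com/DSBA-Lab/ECO.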
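  • Problem addressed
    The ECO framework evaluates and ranks candidate captions for a given image based on their semantic alignment with the image and on the essentialness of each caption. The problem addressed is caption re-ranking: selecting, from a set of candidate captions, the one that most accurately describes the image.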
  • Key idea
    ECO combines Ensembled CLIP score and Consensus score to select the most accurate caption for an image. The Ensembled CLIP score considers the semantic alignment between the image and captions, while the Consensus score accounts for the essentialness of the captions.
  • Other highlights
    The ECO framework achieved notable success in the CVPR 2024 Workshop Challenge on Caption Re-ranking Evaluation. It placed third on the CIDEr metric, second on both the SPICE and METEOR metrics, and first on ROUGE-L and all BLEU score metrics. The code and configuration for the ECO framework are available at https://github.com/DSBA-Lab/ECO.
  • Related work
    Some related studies in this field include 'Show, Attend and Tell: Neural Image Caption Generation with Visual Attention', 'Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering', and 'Neural Image Captioning with Visual Attention'.
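
The key idea summarized above (an ensembled image-caption alignment score combined with a consensus score over the candidate captions) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say how the scores are normalized, how consensus is measured, or how the two are weighted. The sketch assumes the per-model CLIP similarities are already computed and passed in as plain numbers, uses a simple token-overlap measure as a stand-in for the consensus metric, and introduces hypothetical names (`ensembled_clip_scores`, `consensus_scores`, `rank_captions`, `weight`) purely for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore(values):
    """Standardize scores so different models share a comparable scale."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def ensembled_clip_scores(per_model_scores):
    """Average z-normalized scores across models (assumed ensembling rule).

    per_model_scores: one list per model, each holding one
    image-caption similarity per candidate caption.
    """
    normalized = [zscore(scores) for scores in per_model_scores]
    return [mean(column) for column in zip(*normalized)]


def token_overlap(a, b):
    """Dice-style token overlap; a stand-in for the real consensus metric."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    shared = sum((ca & cb).values())
    total = sum(ca.values()) + sum(cb.values())
    return 2 * shared / total if total else 0.0


def consensus_scores(captions):
    """Score each caption by its mean overlap with the other candidates."""
    scores = []
    for i, caption in enumerate(captions):
        others = [token_overlap(caption, o)
                  for j, o in enumerate(captions) if j != i]
        scores.append(mean(others) if others else 0.0)
    return scores


def rank_captions(captions, per_model_scores, weight=0.5):
    """Combine both scores with a hypothetical weight and sort descending."""
    clip = ensembled_clip_scores(per_model_scores)
    cons = zscore(consensus_scores(captions))
    combined = [weight * c + (1 - weight) * s for c, s in zip(clip, cons)]
    order = sorted(range(len(captions)),
                   key=lambda i: combined[i], reverse=True)
    return [(captions[i], combined[i]) for i in order]


if __name__ == "__main__":
    # Made-up captions and similarity values, for illustration only.
    captions = [
        "a dog runs on the beach",
        "a dog running along the beach",
        "a plate of pasta",
    ]
    per_model_scores = [
        [0.31, 0.30, 0.12],  # hypothetical CLIP model A
        [0.28, 0.29, 0.10],  # hypothetical CLIP model B
    ]
    for caption, score in rank_captions(captions, per_model_scores):
        print(f"{score:+.3f}  {caption}")
```

In this toy input the off-topic caption is ranked last because it scores low on both the alignment side and the consensus side. In a real pipeline the similarity values would come from actual CLIP models, and the overlap function would be replaced by whichever caption-similarity metric the framework uses.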