From today's 爱可可 AI Frontier recommendations

[CV] Learning to Summarize Videos by Contrasting Clips

I Sosnovik, A Moskalev, C Kaandorp, A Smeulders
[University of Amsterdam]

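Key points:

  1. Proposes a new unsupervised video summarization method;
  2. Analyzes the key requirements for a video summary: representativeness, sparsity, and diversity;
  3. Proposes a differentiable top-k selector, driven by predicted frame-level scores, to improve contrastive video summarization.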
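
To make the third point concrete, here is a minimal sketch of a relaxed top-k selection over frame-level scores. The digest does not say which differentiable top-k operator the paper uses, so the successive-softmax relaxation below is an assumption chosen for illustration. The names `soft_top_k` and `pooled_feature`, the temperature value, and the toy scores are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_top_k(scores, k, temperature=0.1):
    """Return relaxed k-hot weights over the frames (they sum to k).

    Assumption: k successive softmaxes, each one suppressing the mass
    already assigned. This is one common relaxation, not necessarily
    the selector used in the paper.
    """
    alpha = list(scores)
    weights = [0.0] * len(scores)
    for _ in range(k):
        probs = softmax([a / temperature for a in alpha])
        weights = [w + p for w, p in zip(weights, probs)]
        # Push down the frames that were just (softly) selected.
        alpha = [a + math.log(max(1.0 - p, 1e-12)) for a, p in zip(alpha, probs)]
    return weights


def pooled_feature(features, weights, k):
    """Weighted average of frame features under the relaxed top-k weights."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) / k for d in range(dim)]


# Hypothetical frame-level scores and 2-D frame features for four frames.
frame_scores = [0.1, 2.0, 0.3, 1.5]
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

weights = soft_top_k(frame_scores, k=2)
summary = pooled_feature(frame_features, weights, k=2)
```

Every step is a smooth function of the scores, so the same computation written in an autodiff framework would pass gradients back to the frame scorer. Lowering `temperature` moves the weights closer to a hard top-k selection.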

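One-sentence summary:
Starting from an analysis of what a video summary requires, the paper proposes contrastive learning as the solution and builds a new unsupervised summarization method that predicts frame-level scores and applies a differentiable top-k feature selector.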

Abstract:
Video summarization aims at choosing parts of a video that narrate a story as close as possible to the original one. Most of the existing video summarization approaches focus on hand-crafted labels. As the number of videos grows exponentially, there emerges an increasing need for methods that can learn meaningful summarizations without labeled annotations. In this paper, we aim to maximally exploit unsupervised video summarization while concentrating the supervision to a few, personalized labels as an add-on. To do so, we formulate the key requirements for the informative video summarization. Then, we propose contrastive learning as the answer to both questions. To further boost Contrastive video Summarization (CSUM), we propose to contrast top-k features instead of a mean video feature as employed by the existing method, which we implement with a differentiable top-k feature selector. Our experiments on several benchmarks demonstrate that our approach allows for meaningful and diverse summaries when no labeled data is provided.
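
As a rough illustration of the contrastive objective mentioned in the abstract, the sketch below computes an InfoNCE-style loss for one anchor feature. The abstract does not spell out the loss or how positives and negatives are formed. The pairing here (a summary feature as anchor, a feature from the same video as positive, features from other videos as negatives) is therefore an assumption, as are the function names and the temperature.

```python
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-probability of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D features: the anchor could be a pooled top-k summary feature.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]                     # assumed: from the same video
negatives = [[0.0, 1.0], [-1.0, 0.0]]     # assumed: from other videos

loss_matched = info_nce(anchor, positive, negatives)
loss_mismatched = info_nce(anchor, negatives[0], [positive, negatives[1]])
```

The loss is low when the anchor is closer to its positive than to the negatives, and it grows when a dissimilar feature is treated as the positive.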

Paper link: https://arxiv.org/abs/2301.05213