- Introduction: Recently, contrastive language-audio pretraining (CLAP) has emerged as a way to make audio analysis more generalisable. Specifically, CLAP-style models can answer a wide range of language queries, extending the capabilities of audio models beyond a closed label set. However, CLAP requires a large number of (audio, query) pairs for pretraining. While such datasets are available for general audio tasks, such as captioning or sound event detection, no dataset of matched audio and text queries exists for computational paralinguistics (CP) tasks. As a result, the community has relied on generic CLAP models trained for general audio, with limited success. In this work, we explore training considerations for ParaCLAP, a CLAP-style model suited for CP, including a novel process for creating audio-language queries. We demonstrate its effectiveness on a set of computational paralinguistics tasks, where it surpasses the performance of open-source state-of-the-art models.
- Problem addressed: ParaCLAP: Contrastive Language-Audio Pretraining for Computational Paralinguistics
- Key idea: The paper proposes a novel process for creating audio-language queries, used to train a CLAP-style model, called ParaCLAP, for computational paralinguistics tasks.
- Other highlights: ParaCLAP outperforms open-source state-of-the-art models on a set of computational paralinguistics tasks. The paper discusses the challenge of building a dataset of matched audio and text queries for CP tasks. The proposed query-creation process uses a text-to-speech model to generate synthetic speech for the text queries. The model is trained on a large-scale dataset of speech and text, and pretraining uses contrastive learning to learn representations that capture both audio and text information.
- Related work includes CLAP-style models for general audio tasks, such as captioning and sound event detection, as well as prior work applying pretraining to CP tasks such as emotion recognition and speaker identification.
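The contrastive pretraining objective mentioned above can be sketched as a symmetric InfoNCE loss over a batch of matched (audio, text) embeddings, as popularised by CLIP-style models. The NumPy function below is an illustrative sketch only, not the authors' implementation; the temperature value and function name are assumptions.

```python
import numpy as np

def clap_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of matched (audio, text) pairs.

    audio_emb, text_emb: (batch, dim) arrays where row i of each is a
    matched pair. Off-diagonal rows serve as in-batch negatives.
    """
    # L2-normalise so the dot product becomes cosine similarity.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature  # (batch, batch) similarity matrix

    def log_softmax(x, axis):
        x = x - x.max(axis=axis, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

    diag = np.arange(len(logits))
    # Cross-entropy with the diagonal (matched pair) as the target,
    # in both directions: audio -> text and text -> audio.
    loss_a2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_t2a = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_a2t + loss_t2a) / 2
```

For sanity, the loss should be lower when the audio and text rows are correctly aligned than when the pairing is shuffled, since the diagonal entries then carry the highest similarities.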