[AS] Melody transcription via generative pre-training
C Donahue, J Thickstun, P Liang
[Stanford University]
Brief: Proposes a new method and a new dataset that use generative pre-training to improve melody transcription of music audio, with results 20% better than conventional spectrogram features and 77% better than the strongest available baseline.
Paper: https://arxiv.org/abs/2212.01884
Abstract: Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles - existing strategies work well for some melody instruments or styles but not all. To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio, thereby improving performance on melody transcription by 20% relative to conventional spectrogram features. Another obstacle in melody transcription is a lack of training data - we derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music. The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline. By pairing our new melody transcription approach with solutions for beat detection, key estimation, and chord recognition, we build Sheet Sage, a system capable of transcribing human-readable lead sheets directly from music audio.
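To make the idea in the abstract concrete, below is a minimal PyTorch sketch of the general recipe: frame-level features from a frozen, pre-trained generative model of audio (Jukebox in the paper) are fed to a small transcription head that predicts melody notes per frame. This is not the authors' code; the `feature_fn` helper, layer sizes, and output vocabulary (128 MIDI pitches plus a "no note" class) are illustrative assumptions.

```python
# Sketch: melody transcription as frame-level classification on top of
# frozen generative-model features (e.g., intermediate Jukebox activations).
import torch
import torch.nn as nn

class MelodyHead(nn.Module):
    def __init__(self, feat_dim=4800, n_classes=129, d_model=512):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)   # compress pre-trained features
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.out = nn.Linear(d_model, n_classes)   # per-frame pitch logits (incl. "no note")

    def forward(self, feats):                      # feats: (batch, frames, feat_dim)
        h = self.encoder(self.proj(feats))
        return self.out(h)                         # (batch, frames, n_classes)

def transcribe(audio, feature_fn, head):
    """Frame-wise melody estimate; feature_fn is a hypothetical stand-in
    for extracting activations from a frozen pre-trained model."""
    with torch.no_grad():
        feats = feature_fn(audio)                  # (1, frames, feat_dim)
    logits = head(feats)
    return logits.argmax(dim=-1)                   # pitch class index per frame
```

The head is trained on the paper's 50-hour crowdsourced dataset while the generative model stays frozen; downstream, the predicted note sequence can be combined with beat, key, and chord estimates to render a lead sheet, which is what Sheet Sage does.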