Collected by Ilaria Manco, a PhD student at the Centre for Digital Music, Queen Mary University of London. Includes survey papers, conference and journal papers, datasets, and more.

Survey Papers

Journal and Conference Papers

Summary of papers on multimodal machine learning for music, including the review papers highlighted above.

Audio-Text

| Year | Paper Title | Code |
|------|-------------|------|
| 2022 | Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model | |
| 2022 | Conversational Music Retrieval with Synthetic Data | |
| 2022 | Contrastive audio-language learning for music | GitHub |
| 2022 | Learning music audio representations via weak language supervision | GitHub |
| 2022 | MuLan: A joint embedding of music audio and natural language | |
| 2022 | RECAP: Retrieval Augmented Music Captioner | |
| 2022 | CLAP: Learning audio concepts from natural language supervision | GitHub |
| 2022 | Toward Universal Text-to-Music Retrieval | GitHub |
| 2021 | MusCaps: Generating Captions for Music Audio | GitHub |
| 2020 | MusicBERT - learning multi-modal representations for music and text | |
| 2020 | Music autotagging as captioning | |
| 2019 | Deep cross-modal correlation learning for audio and lyrics in music retrieval | |
| 2018 | Music mood detection based on audio and lyrics with deep neural net | |
| 2016 | Exploring customer reviews for music genre classification and evolutionary studies | |
| 2016 | Towards Music Captioning: Generating Music Playlist Descriptions | |
| 2008 | Multimodal Music Mood Classification using Audio and Lyrics | |
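
Several of the audio-text papers above (e.g. CLAP, MuLan, and the contrastive audio-language learning work) share a common training idea: a symmetric contrastive (InfoNCE) loss that pulls paired audio and text embeddings together and pushes mismatched pairs apart. A minimal NumPy sketch of that objective, with illustrative names and no claim to match any specific paper's implementation:

```python
# Hypothetical sketch of a CLAP/MuLan-style contrastive audio-text
# objective. Function and argument names are illustrative.
import numpy as np

def info_nce(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    # L2-normalise so the dot product becomes cosine similarity
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(a))            # matching pairs lie on the diagonal

    def xent(l):
        # Cross-entropy of the diagonal (matching) entries, row-wise
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average over both retrieval directions: audio->text and text->audio
    return 0.5 * (xent(logits) + xent(logits.T))
```

When the two modalities' embeddings for a pair coincide, the loss approaches zero; for unrelated embeddings it stays positive, which is what drives the joint embedding space used for text-to-music retrieval.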

Other

| Year | Paper Title | Code |
|------|-------------|------|
| 2021 | Multimodal metric learning for tag-based music retrieval | GitHub |
| 2021 | Enriched music representations with multiple cross-modal contrastive learning | GitHub |
| 2020 | Large-Scale Weakly-Supervised Content Embeddings for Music Recommendation and Tagging | |
| 2020 | Music gesture for visual sound separation | |
| 2020 | Foley music: Learning to generate music from videos | |
| 2020 | Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | GitHub |
| 2019 | Audio-visual embedding for cross-modal music video retrieval through supervised deep CCA | |
| 2019 | Query-by-Blending: a Music Exploration System Blending Latent Vector Representations of Lyric Word, Song Audio, and Artist | |
| 2019 | Learning Affective Correspondence between Music and Image | |
| 2019 | Multimodal music information processing and retrieval: Survey and future challenges | |
| 2019 | Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies | |
| 2019 | Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications | |
| 2019 | Query by Video: Cross-Modal Music Retrieval | |
| 2018 | The Sound of Pixels | GitHub |
| 2018 | Image generation associated with music data | |
| 2018 | Multimodal Deep Learning for Music Genre Classification | GitHub |
| 2018 | JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features | GitHub |
| 2018 | CBVMR: content-based video-music retrieval using soft intra-modal structure constraint | GitHub |
| 2017 | A deep multimodal approach for cold-start music recommendation | GitHub |
| 2017 | Learning neural audio embeddings for grounding semantics in auditory perception | |
| 2017 | Music emotion recognition via end-to-end multimodal neural networks | |
| 2013 | Cross-modal Sound Mapping Using Deep Learning | |
| 2013 | Music emotion recognition: From content- to context-based models | |
| 2011 | MusiClef: A benchmark activity in multimodal music information retrieval | |
| 2011 | The need for music information retrieval with user-centered and multimodal strategies | |
| 2009 | Combining audio content and social context for semantic music discovery | |

Datasets

| Dataset | Description | Modalities | Size |
|---------|-------------|------------|------|
| MARD | Multimodal album reviews dataset | Text, metadata, audio descriptors | 65,566 albums and 263,525 reviews |
| URMP | Multi-instrument musical pieces of recorded performances | MIDI, audio, video | 44 pieces (12.5 GB) |
| IMAC | Affective correspondences between images and music | Images, audio | 85,000 images and 3,812 songs |
| EmoMV | Affective music-video correspondence | Audio, video | 5,986 pairs |

Workshops, Tutorials & Talks
