Published December 6, 2022 | Version v1
Preprint Open

MAP-MUSIC2VEC: A SIMPLE AND EFFECTIVE BASELINE FOR SELF-SUPERVISED MUSIC AUDIO REPRESENTATION LEARNING

  • 1. University of Sheffield
  • 2. Beijing Academy of Artificial Intelligence, Carnegie Mellon University
  • 3. Beijing Academy of Artificial Intelligence, University of Michigan, Ann Arbor
  • 4. Centre for Digital Music, Queen Mary University of London
  • 5. Department of Computer Science, University of Sheffield
  • 6. University of Michigan Ann Arbor, USA
  • 7. Department of Computer Science, University of Sheffield
  • 8. School of Music, Carnegie Mellon University
  • 9. HSBC Business School, Peking University, China
  • 10. University of Tübingen & MPI-IS, Germany
  • 11. Centre for Digital Music, Queen Mary University of London, UK
  • 12. Department of Computer Science, University of Sheffield, UK
  • 13. Dartmouth College, NH, USA
  • 14. Beijing Academy of Artificial Intelligence, China

Description

The deep learning community has witnessed exponentially growing interest in self-supervised learning (SSL). However, how to build a framework for learning useful representations of raw music waveforms in a self-supervised manner remains unexplored. In this work, we design Music2Vec, a framework that explores different SSL algorithmic components and tricks for music audio recordings. Our model achieves results comparable to the state-of-the-art (SOTA) music SSL model Jukebox, despite being significantly smaller, with fewer than 2% of the latter's parameters. The model will be released on Hugging Face (https://huggingface.co/m-a-p/music2vec-v1).

The paper has been published at ISMIR LBD 2022. We used only 1k of 130k hours of data to train the ISMIR LBD demo and will scale up further to improve performance.

Files

mhkcfwvzfptvydrkzjgdvsyhgzsdgnrb.zip

Files (400.6 kB)

  • md5:c5c8c6748ea8133aab38e5ddbf89decd (201.0 kB)
  • md5:40ac856587dc7ec535610b844974575d (199.6 kB)
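
The MD5 checksums in the file listing can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_record` are hypothetical, and the local path in the usage note is a placeholder, since the listing does not give the individual file names.

```python
import hashlib

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "c5c8c6748ea8133aab38e5ddbf89decd",
    "40ac856587dc7ec535610b844974575d",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """Return True if the file's MD5 digest is one of the listed checksums."""
    return md5_of_file(path) in EXPECTED_MD5
```

For example, `matches_record("downloaded_file")` (a placeholder path) returns `True` only when the file's digest equals one of the two listed values. MD5 is adequate for detecting accidental corruption, but it is not a safeguard against deliberate tampering.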