MAP-MUSIC2VEC: A SIMPLE AND EFFECTIVE BASELINE FOR SELF-SUPERVISED MUSIC AUDIO REPRESENTATION LEARNING
Authors/Creators
- 1. University of Sheffield
- 2. Beijing Academy of Artificial Intelligence, Carnegie Mellon University
- 3. Beijing Academy of Artificial Intelligence, University of Michigan, Ann Arbor
- 4. Centre for Digital Music, Queen Mary University of London
- 5. Department of Computer Science, University of Sheffield
- 6. University of Michigan Ann Arbor, USA
- 7. Department of Computer Science, University of Sheffield
- 8. School of Music, Carnegie Mellon University
- 9. HSBC Business School, Peking University, China
- 10. University of Tübingen & MPI-IS, Germany
- 11. Centre for Digital Music, Queen Mary University of London, UK
- 12. Department of Computer Science, University of Sheffield, UK
- 13. Dartmouth College, NH, USA
- 14. Beijing Academy of Artificial Intelligence, China
Description
The deep learning community has witnessed exponentially growing interest in self-supervised learning (SSL). However, how to build a framework for learning useful representations of raw music waveforms in a self-supervised manner remains unexplored. In this work, we design Music2Vec, a framework exploring different SSL algorithmic components and tricks for music audio recordings. Our model achieves results comparable to the state-of-the-art (SOTA) music SSL model Jukebox, despite being significantly smaller, with less than 2% of the latter's parameters. The model will be released on Hugging Face (https://huggingface.co/m-a-p/music2vec-v1).
The paper has been published at ISMIR LBD 2022. We used only 1k of the 130k hours of data to train the ISMIR LBD demo and will scale up further to improve performance.
Files
mhkcfwvzfptvydrkzjgdvsyhgzsdgnrb.zip
(400.6 kB)
| Checksum | Size |
|---|---|
| md5:c5c8c6748ea8133aab38e5ddbf89decd | 201.0 kB |
| md5:40ac856587dc7ec535610b844974575d | 199.6 kB |