Comparison of Audio Encoders for Audio-Text Contrastive Learning Representations

Cárdenas Gracia, Sergio

doi:10.5281/zenodo.17304842

Published July 31, 2025 | Version v1

Thesis | Open Access

Contributors

Supervisor (2):

1. Universitat Pompeu Fabra

This project investigates contrastive learning techniques for aligning audio and text representations in the music domain, focusing on scenarios with limited data and computational resources. We provide a comprehensive review of existing methods relevant to music-text contrastive learning. Two audio encoders, HTSAT and MAEST, each initialized with pretrained weights, are integrated with a frozen RoBERTa text encoder within the LAION-AI CLAP framework and fine-tuned on the MTG-Jamendo dataset. Model performance is evaluated on three tasks: zero-shot genre classification on the GTZAN dataset, multi-label tag classification on the MagnaTagATune dataset, and text-to-music retrieval on the Song Describer dataset. The results show that HTSAT generalizes better in low-data settings, while MAEST tends to overfit, highlighting the impact of encoder complexity in resource-constrained environments. Attempts to mitigate MAEST’s overfitting with weight decay and learning rate decay were unsuccessful. The study also underscores the critical role of data volume and batch size in the effectiveness of contrastive learning.
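To make the link between batch size and contrastive learning concrete, the sketch below implements a symmetric InfoNCE loss of the kind used in CLIP/CLAP-style audio-text training. It is a minimal NumPy illustration, not the LAION-AI CLAP code: the function names are hypothetical, the temperature of 0.07 is an assumed fixed value (the framework may learn it instead), and the embeddings are random stand-ins rather than HTSAT/MAEST or RoBERTa outputs. Each clip's paired caption is the positive and the other B - 1 captions in the batch are negatives, which is one reason larger batches tend to help.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(audio_emb: np.ndarray, text_emb: np.ndarray,
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for a batch of B paired audio/text embeddings.

    Row i of audio_emb and row i of text_emb form the positive pair; the other
    B - 1 rows in the batch serve as negatives, so a larger batch supplies
    more negatives per positive. The temperature is an assumed fixed value.
    """
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    logits = a @ t.T / temperature  # (B, B) scaled cosine similarities
    diag = np.arange(logits.shape[0])
    # audio -> text: each row is a softmax over all captions in the batch
    loss_a2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    # text -> audio: each column is a softmax over all clips in the batch
    loss_t2a = -log_softmax(logits, axis=0)[diag, diag].mean()
    return float(0.5 * (loss_a2t + loss_t2a))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(8, 512))
    # Matched captions: small perturbations of their paired clip embedding
    text = audio + 0.1 * rng.normal(size=(8, 512))
    print("matched pairs:", contrastive_loss(audio, text))
    # Mismatched captions: shift every caption onto the wrong clip
    print("shuffled pairs:", contrastive_loss(audio, np.roll(text, 1, axis=0)))
```

Averaging the two directions keeps the loss symmetric, so swapping the audio and text inputs yields the same value.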
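Zero-shot genre classification with a contrastive audio-text model usually works by turning each class label into a text prompt, embedding the prompts with the text encoder, and assigning each clip the class whose prompt embedding is most similar. The sketch below illustrates that procedure under stated assumptions: the prompt template and function names are hypothetical, and random vectors stand in for the trained encoders' outputs. The ten labels are the standard GTZAN genres; the exact prompts used in the thesis are not given on this page.

```python
import numpy as np

# The ten GTZAN genre labels
GTZAN_GENRES = ["blues", "classical", "country", "disco", "hiphop",
                "jazz", "metal", "pop", "reggae", "rock"]


def make_prompts(labels, template="This is a {} song."):
    """Wrap each class label in a text prompt; the template is an illustrative choice."""
    return [template.format(label) for label in labels]


def zero_shot_predict(audio_emb: np.ndarray, class_text_emb: np.ndarray) -> np.ndarray:
    """Return, for each clip, the index of the prompt with the highest cosine similarity."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = class_text_emb / np.linalg.norm(class_text_emb, axis=1, keepdims=True)
    return (a @ t.T).argmax(axis=1)


if __name__ == "__main__":
    prompts = make_prompts(GTZAN_GENRES)
    print(prompts[0])  # This is a blues song.
    # Stand-in embeddings; in practice these would come from the trained
    # text encoder (for the prompts) and audio encoder (for the clips).
    rng = np.random.default_rng(0)
    class_text_emb = rng.normal(size=(len(prompts), 64))
    true_labels = np.array([3, 0, 7])
    audio_emb = class_text_emb[true_labels] + 0.05 * rng.normal(size=(3, 64))
    predicted = zero_shot_predict(audio_emb, class_text_emb)
    print([GTZAN_GENRES[i] for i in predicted])
```

Because no classifier head is trained, the same function applies to any label set for which prompts can be written.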
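Text-to-music retrieval is commonly scored with Recall@K: rank every candidate track by similarity to a caption and check whether the correct track appears among the top K. The sketch below is one way to compute it, assuming each caption is mapped to the index of the track it describes (so several captions may share a track). Recall@K is a standard choice for this task, but the metrics the thesis actually reports are not listed on this page; the function name and the random stand-in embeddings are illustrative.

```python
import numpy as np


def recall_at_k(text_emb: np.ndarray, audio_emb: np.ndarray,
                gt_audio_idx, k: int = 10) -> float:
    """Text-to-music Recall@K.

    text_emb: (Q, D) caption embeddings used as queries.
    audio_emb: (N, D) embeddings of the candidate tracks.
    gt_audio_idx: length-Q sequence giving the correct track index per caption,
        so several captions may point at the same track.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    sims = t @ a.T                                  # (Q, N) cosine similarities
    top_k = np.argsort(-sims, axis=1)[:, :k]        # best-ranked track indices per query
    gt = np.asarray(gt_audio_idx)[:, None]
    return float((top_k == gt).any(axis=1).mean())  # fraction of queries with a hit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tracks = rng.normal(size=(5, 64))
    gt = np.array([0, 0, 2, 4])  # two captions describe track 0
    captions = tracks[gt] + 0.05 * rng.normal(size=(4, 64))
    print(recall_at_k(captions, tracks, gt, k=1))
```

Setting K to the number of candidate tracks always gives 1.0, so small values such as 1, 5, or 10 are the informative ones.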

The source code for this work is publicly available at https://github.com/SerX610/smc-master-thesis.

Files (3.1 MB)

Sergio-Cardenas_SMC_2025_Master_Thesis.pdf (3.1 MB, md5:e1ee2cdcac9596cc8edf1ef6976b5c27)

Additional details

Accepted: 2025-10-09
