BYRM: Brazilian YouTube Regional Music Dataset
Description
The Brazilian YouTube Regional Music (BYRM) dataset was created to support research on the automatic classification of Brazilian regional music genres using machine learning models and other computational approaches. It includes tracks from ten culturally diverse Brazilian genres: axé, rock brasileiro, toada, carimbó, samba, pagode, xote gaúcho, vaneira, sertanejo, and forró.
Due to copyright restrictions, the original audio tracks are not included in this release. Instead, this dataset provides:
- BYRM_specs_v1.zip: Mel-spectrogram images (PNG) extracted from 3 s, 5 s, and 10 s segments within different temporal excerpts of each song. Data is organized by genre, excerpt (e.g., 0–30 s, 90–120 s), and partition (train/val/test).
- BYRM_features_v1.zip: Acoustic feature vectors extracted with the Librosa library, including MFCCs, chroma, spectral centroid, spectral rolloff, zero-crossing rate, spectral bandwidth, and tempo. Each CSV file corresponds to one segment configuration.
- metadata_csv: A set of 10 CSV files containing metadata for the original YouTube tracks used to build the dataset. Each file provides information such as the video title, YouTube ID, and channel name.
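The segment layout described for BYRM_specs_v1.zip can be illustrated with a short Python sketch. It assumes that segments are contiguous and non-overlapping within each excerpt and that any trailing remainder shorter than one segment is dropped; the release does not state either detail, so treat the counts as illustrative. The folder layout `<root>/<genre>/<excerpt>/<partition>/*.png` and the helper names `segment_bounds` and `list_spectrograms` are hypothetical and not part of the dataset.

```python
from __future__ import annotations

from pathlib import Path


def segment_bounds(excerpt_start: float, excerpt_end: float,
                   seg_len: float) -> list[tuple[float, float]]:
    """Return (start, end) times, in seconds, of the segments cut from one excerpt.

    Assumption (not stated by BYRM): segments are contiguous and
    non-overlapping, and a trailing remainder shorter than seg_len is dropped.
    """
    if seg_len <= 0 or excerpt_end <= excerpt_start:
        raise ValueError("need seg_len > 0 and excerpt_end > excerpt_start")
    # Number of whole segments that fit inside the excerpt.
    n_segments = int((excerpt_end - excerpt_start) // seg_len)
    return [
        (excerpt_start + i * seg_len, excerpt_start + (i + 1) * seg_len)
        for i in range(n_segments)
    ]


def list_spectrograms(root: Path, genre: str, excerpt: str,
                      partition: str) -> list[Path]:
    """List PNG spectrograms for one genre/excerpt/partition combination.

    Hypothetical layout: <root>/<genre>/<excerpt>/<partition>/*.png.
    Check the extracted BYRM_specs_v1.zip for the actual folder names.
    """
    return sorted((root / genre / excerpt / partition).glob("*.png"))


if __name__ == "__main__":
    # Segments per 30 s excerpt for each segment length used in BYRM.
    for seg_len in (3, 5, 10):
        print(seg_len, len(segment_bounds(0, 30, seg_len)))
```

Under these assumptions, a 30 s excerpt such as 0–30 s yields 10, 6, and 3 segments for the 3 s, 5 s, and 10 s configurations, respectively.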
This dataset was developed as part of the Master's dissertation titled:
"Aprendizagem Profunda com Redes de Transformadores de Visão Computacional para Reconhecimento de Gêneros Musicais" (Deep Learning with Computer Vision Transformer Networks for Music Genre Recognition)
by Victória de Souza Guimarães, supervised by Prof. Dr. Rosiane de Freitas, Universidade Federal do Amazonas (UFAM), 2025.
Research based on this dataset has resulted in the following academic contributions:
- Accepted: “Segment-based evaluation of music genre classification models with the BYRM Dataset” – KDMiLe 2025
- Accepted: “Understanding genre similarity in Brazilian music through Vision Transformer embeddings” – SBCM 2025
Acknowledgments
This research was supported by FAPEAM (Fundação de Amparo à Pesquisa do Estado do Amazonas), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), and CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior). We gratefully acknowledge these institutions for making this research possible.
Citation
If you use this dataset in your work, please cite it as:
Guimarães, Victória de Souza; Freitas, Rosiane de. BYRM: Brazilian YouTube Regional Music Dataset. Zenodo, 2025. Available at: https://doi.org/10.5281/zenodo.16617888.
BibTeX
@dataset{guimaraes2025byrm,
author = {Guimarães, Victória de Souza and Freitas, Rosiane de},
title = {BYRM: Brazilian YouTube Regional Music Dataset},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.16617888},
url = {https://doi.org/10.5281/zenodo.16617888}
}