Published July 30, 2025 | Version v1
Dataset Open

BYRM: Brazilian YouTube Regional Music Dataset

  • 1. Universidade Federal do Amazonas
  • 2. Instituto de Computação (ICOMP)
  • 3. Programa de Pós-Graduação em Informática (PPGI)

Description

The Brazilian YouTube Regional Music (BYRM) dataset was created to support research on the automatic classification of Brazilian regional music genres using machine learning models and other computational approaches. It includes tracks from ten culturally diverse Brazilian genres: axé, rock brasileiro, toada, carimbó, samba, pagode, xote gaúcho, vaneira, sertanejo, and forró.

Due to copyright restrictions, the original audio tracks are not included in this release. Instead, this dataset provides:

  • BYRM_specs_v1.zip: Mel-spectrogram images (PNG) extracted from 3s, 5s, and 10s segments within different temporal excerpts of each song. Data is organized by genre, excerpt (e.g., 0–30s, 90–120s), and partition (train/val/test).

  • BYRM_features_v1.zip: Acoustic feature vectors extracted using the Librosa library, including MFCC, chroma, spectral centroid, rolloff, zero-crossing rate, bandwidth, and tempo. Each CSV file corresponds to a segment configuration.

  • metadata_csv: A set of 10 CSV files containing metadata for the original YouTube tracks used to construct the dataset. Each file provides information such as video title, YouTube ID, and channel name.
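The segment configurations described above (3s, 5s, and 10s segments cut from excerpts such as 0–30s or 90–120s) can be illustrated with a small sketch. It assumes consecutive, non-overlapping segments that fit entirely inside the excerpt. The description does not state whether the released segments overlap or how many were taken per excerpt, so the function name `segment_bounds` and its behavior are illustrative assumptions, not the authors' extraction procedure.

```python
from typing import List, Tuple


def segment_bounds(excerpt_start: int, excerpt_end: int,
                   segment_len: int) -> List[Tuple[int, int]]:
    """Split [excerpt_start, excerpt_end) into consecutive segments.

    Assumption (not confirmed by the dataset description): segments do
    not overlap, and a trailing remainder shorter than `segment_len`
    is discarded.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if excerpt_end < excerpt_start:
        raise ValueError("excerpt_end must not precede excerpt_start")

    # Number of whole segments that fit inside the excerpt.
    count = (excerpt_end - excerpt_start) // segment_len
    return [
        (excerpt_start + i * segment_len,
         excerpt_start + (i + 1) * segment_len)
        for i in range(count)
    ]


# Segment lengths named in the description, applied to a 90-120s excerpt.
for seconds in (3, 5, 10):
    bounds = segment_bounds(90, 120, seconds)
    print(f"{seconds}s segments: {len(bounds)} per excerpt, first {bounds[0]}")
```

Under these assumptions a 30-second excerpt yields 10, 6, and 3 segments for the 3s, 5s, and 10s configurations respectively, so the shorter configurations produce more samples per track.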

 

This dataset was developed as part of the Master's dissertation titled:

"Aprendizagem Profunda com Redes de Transformadores de Visão Computacional para Reconhecimento de Gêneros Musicais" (in English: "Deep Learning with Computer Vision Transformer Networks for Music Genre Recognition")
by Victória de Souza Guimarães, supervised by Prof. Dr. Rosiane de Freitas, Universidade Federal do Amazonas (UFAM), 2025.

Research based on this dataset has resulted in the following academic contributions:

  • Accepted: “Segment-based evaluation of music genre classification models with the BYRM Dataset” – KDMiLe 2025

  • Accepted: “Understanding genre similarity in Brazilian music through Vision Transformer embeddings” – SBCM 2025

 

Acknowledgments

This research was supported by FAPEAM (Fundação de Amparo à Pesquisa do Estado do Amazonas), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), and CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior). We gratefully acknowledge the support of these institutions for making this research possible.

 

Citation

If you use this dataset in your work, please cite it as:

Guimarães, Victória de Souza; Freitas, Rosiane de. BYRM: Brazilian YouTube Regional Music Dataset. Zenodo, 2025. Available at: https://doi.org/10.5281/zenodo.16617888.

BibTeX

@dataset{guimaraes2025byrm,
  author    = {Guimarães, Victória de Souza and Freitas, Rosiane de},
  title     = {BYRM: Brazilian YouTube Regional Music Dataset},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.16617888},
  url       = {https://doi.org/10.5281/zenodo.16617888}
}

Files

BYRM_specs.zip

Files (20.0 GB)

  • 155.0 MB, md5:7a9055880a652a71eb701c09971d2752

  • 36.1 kB, md5:3ca22392e2e77da23116028e99831a28

  • 19.8 GB, md5:10de80040c1e2e5f7e7d298c6d1e6286
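The MD5 checksums listed above can be used to confirm that a download completed without corruption, which matters for an archive of this size. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the file path in the usage comment is a placeholder: this listing does not show which checksum belongs to which file, so that pairing should be taken from the Zenodo record itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks.

    Reading in 1 MiB chunks keeps memory use flat even for
    multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum such as 'md5:7a90...'.

    The optional 'md5:' prefix is dropped before comparing, and the
    comparison ignores letter case and surrounding whitespace.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (placeholder path; substitute the file you actually downloaded
# and the checksum listed next to it on the record page):
#
#   ok = matches_checksum("downloaded_file.zip",
#                         "md5:7a9055880a652a71eb701c09971d2752")
#   print("checksum OK" if ok else "checksum mismatch, download again")
```

A mismatch usually indicates an interrupted or corrupted transfer, in which case downloading the file again is the simplest remedy.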