Published April 28, 2026 | Version v1
Computational notebook | Open access

PDMX Explorer: A Reproducible Framework for the Digital Humanities Analysis of Symbolic Music Datasets

  • 1. Sorbonne Université

Description

The PDMX Explorer is a reproducible research framework designed for the critical exploration of large-scale symbolic music datasets, with a specific focus on the PDMX (Public Domain MusicXML) corpus.

Developed from a digital humanities perspective, this notebook turns a machine-learning dataset into an interpretable musicological corpus. It provides tools for exploratory data analysis (EDA), metadata inspection, composer name normalization, historical period inference, and the study of relationships between platform-based reception metrics and musical features.
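To make the reception-metric analysis concrete, here is a minimal standard-library sketch of a rank-based (Spearman) correlation between one popularity metric and one musical feature. A rank correlation is one reasonable choice because view and rating counts on sharing platforms are often heavily skewed; the notebook's actual statistic may differ. The column names `n_views` and `n_notes`, and the helpers `average_ranks`, `spearman` and `paired_values`, are hypothetical stand-ins, not the notebook's API.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation computed on average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5


def paired_values(rows, metric, feature):
    """Keep only rows where both (hypothetical) columns hold numeric values."""
    pairs = []
    for row in rows:
        try:
            pairs.append((float(row[metric]), float(row[feature])))
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric metadata is skipped, not imputed
    return pairs


# Toy rows shaped like csv.DictReader output; the third row lacks a view count.
rows = [
    {"n_views": "120", "n_notes": "300"},
    {"n_views": "45", "n_notes": "210"},
    {"n_views": "", "n_notes": "150"},
    {"n_views": "900", "n_notes": "820"},
]
pairs = paired_values(rows, "n_views", "n_notes")
rho = spearman([p[0] for p in pairs], [p[1] for p in pairs])
print(f"Spearman rho = {rho:.2f}")  # 1.00 for this toy sample
```

Dropping incomplete rows rather than imputing them keeps the correlation tied to observed metadata, which matters when completeness itself varies across the corpus.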

The framework is designed not only for technical reproducibility but also for methodological transparency. It explicitly documents the processes through which raw data is filtered, structured, and interpreted, in line with a corpus criticism approach.
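One lightweight way to make filtering decisions auditable is to log row counts before and after each named step, so every exclusion can be reported alongside the resulting corpus. The sketch below assumes metadata rows are plain dicts (as produced by `csv.DictReader`); `FilterStep`, `FilterLog`, and the column names `composer` and `is_public_domain` are hypothetical, not taken from the notebook.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FilterStep:
    name: str                     # human-readable label recorded in the log
    keep: Callable[[dict], bool]  # predicate: True means the row survives


@dataclass
class FilterLog:
    entries: list = field(default_factory=list)  # (step name, rows before, rows after)

    def apply(self, rows, steps):
        """Apply steps in order, recording how many rows each one removes."""
        for step in steps:
            before = len(rows)
            rows = [row for row in rows if step.keep(row)]
            self.entries.append((step.name, before, len(rows)))
        return rows

    def report(self):
        """Render the log as one line per filtering step."""
        return "\n".join(
            f"{name}: {before} -> {after} (removed {before - after})"
            for name, before, after in self.entries
        )


rows = [
    {"composer": "J. S. Bach", "is_public_domain": "True"},
    {"composer": "", "is_public_domain": "True"},
    {"composer": "Anon.", "is_public_domain": "False"},
]
steps = [
    FilterStep("has composer", lambda r: bool(r.get("composer", "").strip())),
    FilterStep("public domain", lambda r: r.get("is_public_domain") == "True"),
]
log = FilterLog()
kept = log.apply(rows, steps)
print(log.report())
```

Because the steps are applied in order, the log also records that the count removed by a step depends on which steps ran before it; reordering filters changes the report, not just the final corpus.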

Key functionalities include:

  • Metadata completeness analysis and visualization
  • Composer name normalization and distribution analysis
  • Historical period inference via external authority mappings
  • Correlation analysis between popularity metrics and musical features
  • Deduplication diagnostics and reporting
  • Construction of a curated “PDMX musicology core” corpus
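Three of the functionalities above (composer normalization, period inference from an authority mapping, and duplicate diagnostics) can be sketched as a small chain of pure functions. This is a minimal sketch, assuming rows are dicts with hypothetical `composer` and `title` keys. The `AUTHORITY` table is an illustrative stand-in for an external authority source (it holds only three real birth years), and the period cutoffs are placeholder editorial choices, not the notebook's mapping.

```python
import re
import unicodedata
from bisect import bisect_right
from collections import Counter, defaultdict


def fold(text):
    """Strip accents, lowercase, drop punctuation and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def normalize_name(raw):
    """Turn 'Last, First' into 'First Last', then fold to a comparison key."""
    raw = raw or ""
    if raw.count(",") == 1:
        last, first = raw.split(",")
        raw = f"{first} {last}"
    return fold(raw)


def composer_distribution(rows):
    """Count scores per normalized composer key (empty names count as '')."""
    return Counter(normalize_name(row.get("composer")) for row in rows)


# Hypothetical stand-in for an external authority file: normalized name -> birth year.
AUTHORITY = {
    "johann sebastian bach": 1685,
    "frederic chopin": 1810,
    "claude debussy": 1862,
}
# Hypothetical birth-year cutoffs; a real scheme is a documented editorial choice.
CUTOFFS = [1580, 1730, 1790, 1860]
LABELS = ["Baroque", "Classical", "Romantic", "Modern"]


def infer_period(name_key, authority=AUTHORITY):
    """Map a normalized composer key to a period label, or None if unknown."""
    year = authority.get(name_key)
    if year is None:
        return None
    idx = bisect_right(CUTOFFS, year) - 1
    return LABELS[idx] if idx >= 0 else None


def duplicate_report(rows):
    """Group row indices sharing a (composer, title) key; keep groups of 2+."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key = (normalize_name(row.get("composer")), fold(row.get("title")))
        groups[key].append(i)
    return {key: idxs for key, idxs in groups.items() if len(idxs) > 1}


rows = [
    {"composer": "Bach, Johann Sebastian", "title": "Prelude in C"},
    {"composer": "Johann Sebastian Bach", "title": "Prelude in C!"},
    {"composer": "Frédéric Chopin", "title": "Nocturne Op. 9 No. 2"},
]
print(composer_distribution(rows))
print(duplicate_report(rows))          # {('johann sebastian bach', 'prelude in c'): [0, 1]}
print(infer_period("frederic chopin"))  # Romantic
```

String folding alone does not reconcile abbreviations: "Bach, J.S." folds to `j s bach`, which this toy table does not know, so `infer_period` returns `None`. In practice the alias coverage of the authority mapping, not the folding function, determines how many scores receive a period label.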

This repository includes:

  • A fully documented Jupyter notebook (PDMX Explorer)
  • Processed metadata files and curated subsets
  • Figures used in the associated publication
  • Configuration and reproducibility files
  • Methodological documentation describing corpus construction choices

The PDMX Explorer is intended for researchers in digital humanities, computational musicology, and AI, offering a transparent and extensible framework for analyzing symbolic music data as both computational and cultural artifacts.

Files

PDMX_explorer.zip (52.3 MB)
md5:e044d563214981d4cda10c90de07a3ec
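Because the record publishes an MD5 checksum for the archive, a downloaded copy can be checked before use. A minimal sketch, assuming the archive has been saved as `PDMX_explorer.zip` in the working directory; `md5_of` and `verify` are hypothetical helper names. MD5 detects accidental corruption in transfer, but it is not a defense against deliberate tampering.

```python
import hashlib
from pathlib import Path

# Checksum published for PDMX_explorer.zip on this record.
EXPECTED_MD5 = "e044d563214981d4cda10c90de07a3ec"


def md5_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so the archive is never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("PDMX_explorer.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if verify(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```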

Additional details

Dates

Created: 2026-03-28

Software

Programming language: Python