Published April 10, 2022 | Version 1.0
Dataset Restricted

ThumbSet - A dataset of partial annotations for Automatic Piano Fingering

  • 1. Pompeu Fabra University
  • 2. Sogang University
  • 3. Kyoto University

Description

Here we introduce ThumbSet, an open dataset created from the collection of MusicXML piano scores published as public domain on the MuseScore website, which include finger label annotations. We relied on music publishers who added partial or full annotations to support piano learning when creating this dataset. It is important to note that these annotations reflect a single editor's expertise and do not represent a global ground truth. The source of the annotations on the MuseScore website is not always clear. Some scores may have incorrectly engraved labels, while others may have finger annotations provided by non-expert users. Consequently, the dataset may contain a significant amount of noise, resulting in lower data quality compared to the PIG dataset.

ThumbSet is composed of 2523 music scores, as shown in Table 1. The genres, transcription quality, and fingering quality are highly heterogeneous, and the difficulty level of the pieces tends to lean towards the early years of music education compared to the PIG dataset. However, it is not possible to quantify all these claims with the existing metadata. We can assert that there are more finger labels annotated in the right hand (61%) than in the left hand (39%), although there are more pieces with only left-hand annotations (742) than with only right-hand annotations (153). The proportion of annotated fingers and the window lengths are similar in both hands.

To make ThumbSet available for research purposes, we sliced the data into windows. Each window captures the context around a symbol of interest: the 32 to 64 surrounding notes and symbols, including other finger label annotations. The excerpts are encoded in the PIG encoding, a text format proposed in a previous study. This encoding does not allow reverting to the original score, which protects the copyright of the pieces. We distribute ThumbSet as variable-length music windows in this format. For every note, it records the pitch, onset time, offset time, and the finger label annotation, if one exists. Access to the data is limited and available upon request through the Zenodo platform. We also provide links to all the MuseScore source pieces used in creating ThumbSet.
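As a rough illustration of the windowing described above, the following Python sketch cuts one window around every annotated note. It is not the authors' extraction code: the `Note` class, the function names, and the `half_width` parameter are hypothetical, and the sketch assumes that "32 to 64 notes" means a window of up to 32 notes on each side of the note of interest, clipped at the start and end of the piece.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    """Hypothetical note record holding the fields the description lists."""
    pitch: int                    # MIDI pitch number (assumed representation)
    onset: float                  # onset time
    offset: float                 # offset time
    finger: Optional[int] = None  # None when the note has no finger label


def window_around(notes: List[Note], center: int, half_width: int = 32) -> List[Note]:
    """Return the notes in [center - half_width, center + half_width), clipped to the piece."""
    start = max(0, center - half_width)
    end = min(len(notes), center + half_width)
    return notes[start:end]


def annotated_windows(notes: List[Note], half_width: int = 32) -> List[List[Note]]:
    """Build one window per note that carries a finger label."""
    return [
        window_around(notes, i, half_width)
        for i, note in enumerate(notes)
        if note.finger is not None
    ]


if __name__ == "__main__":
    # Toy piece: 100 notes, with finger labels on the first, middle, and last note.
    piece = [Note(pitch=60 + i % 12, onset=float(i), offset=i + 0.5) for i in range(100)]
    for idx in (0, 50, 99):
        piece[idx].finger = 1
    for window in annotated_windows(piece):
        print(len(window), window[0].onset, window[-1].onset)
```

Under these assumptions, a window in the middle of a long piece holds 64 notes, while a window at the very start of the piece shrinks to 32, which is one way to obtain variable-length windows in the stated range.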

 

For citation and more information, please refer to the following article:

@inproceedings{ramoneda2022automatic,
  title={Automatic Piano Fingering from Partially Annotated Scores using Autoregressive Neural Networks},
  author={Pedro Ramoneda and Dasaem Jeong and Eita Nakamura and Xavier Serra and Marius Miron},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia (MM '22)},
  year={2022},
  month={October 10--14},
  location={Lisboa, Portugal}
}

Disclaimer: Although the name may suggest that only thumb fingerings are annotated in the new dataset, it merely reflects the fact that the most typical partial annotations are those indicating finger crossings, and that the authors like the Tom Thumb story.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

Disclaimer


The authors and their affiliated institutions bear no responsibility for the uses of the ThumbSet Dataset, or for
interpretations or inferences based on these uses.  Their affiliated institutions accept no liability for
indirect, consequential, or incidental damages or losses arising from the use of the ThumbSet Dataset,
or from the unavailability of, or break in access to, the Dataset for whatever reason.


The authors' institutions do not accept any responsibility or liability for data or material
contained on third-party sites that reference the information on the ThumbSet Dataset, or for the use
any person makes of such third-party information.  The authors' institutions do not monitor
this third-party information and make no representations in relation to the quality or accuracy
of the information on third-party websites or databases.

 

Use


The data is available for use and downloadable only for non-profit and academic research purposes.

 

Note: Please include, in the justification field, your academic affiliation, a brief description of your research topics, and why you would like to use this dataset. If you do not include this information, we cannot approve your request.
