Published July 31, 2020 | Version 1.0.0
Dataset Open

The Con Espressione Game Dataset

  • 1. Austrian Research Institute for Artificial Intelligence
  • 2. Johannes Kepler University Linz
  • 3. University of Tartu


Con Espressione Game Dataset

A piece of music can be expressively performed, or interpreted, in a variety of ways. With the help of an online questionnaire, the Con Espressione Game, we collected some 1,500 descriptions of expressive character relating to 45 performances of 9 excerpts from classical piano pieces, played by different famous pianists. More specifically, listeners were asked to describe, using freely chosen words (preferably: adjectives), how they perceive the expressive character of the different performances. The aim of this research is to find the dimensions of musical expression (in Western classical piano music) that can be attributed to a performance, as perceived and described in natural language by listeners.

The Con Espressione Game was launched on the 3rd of April 2018.

Dataset structure

Listeners’ Descriptions of Expressive Performance

  • piece_performer_data.csv: A comma-separated values (CSV) file containing information about the pieces in the dataset. Strings are delimited with double quotes ("). The columns in this file are:
    1. music_id: An integer ID for each performance in the dataset.
    2. performer_name: (Last) name of the performer.
    3. piece_name: (Short) name of the piece.
    4. performance_name: Name of the performance. All files in different modalities (alignments, MIDI, loudness features, etc.) corresponding to a single performance have the same name (but possibly different extensions).
    5. composer: Name of the composer of the piece.
    6. piece: Full name of the piece.
    7. album: Name of the album.
    8. performer_name_full: Full name of the performer.
    9. year_of_CD_issue: Year of the issue of the CD.
    10. track_number: Number of the track in the CD.
    11. length_of_excerpt_seconds: Length of the excerpt in seconds.
    12. start_of_excerpt_seconds: Start of the excerpt in its corresponding track (in seconds).
    13. end_of_excerpt_seconds: End of the excerpt in its corresponding track (in seconds).
  • con_espressione_game_answers.csv: This is the main file of the dataset, which contains the listeners’ descriptions of expressive character. This CSV file contains the following columns:
    1. answer_id: An integer representing the ID of the answer. Each answer gets a unique ID.
    2. participant_id: An integer representing the ID of a participant. Answers with the same ID come from the same participant.
    3. music_id: An integer representing the ID of the performance. This is the same as the music_id in piece_performer_data.csv described above.
    4. answer: (Cleaned/formatted) participant description. All answers were converted to lower case, typos were corrected, spaces were replaced by underscores (_), and individual terms are separated by commas. See cleanup_rules.txt for a more detailed description of how the answers were formatted.
    5. original_answer: Raw answers provided by the participants.
    6. timestamp: Timestamp of the answer.
    7. favorite: A boolean (0 or 1) indicating if this performance of the piece is the participant’s favorite.
    8. translated_to_english: Raw translation (from German, Russian, Spanish and Italian).
    9. performer: (Last) name of the performer. See piece_performer_data.csv described above.
    10. piece_name: (Short) name of the piece. See piece_performer_data.csv described above.
    11. performance_name: Name of the performance. See piece_performer_data.csv described above.
  • participant_profiles.csv: A CSV file containing musical background information of the participants. Empty cells mean that the participant did not provide an answer. This file contains the following columns:
    1. participant_id: An integer representing the ID of a participant.
    2. music_education_years: (Self-reported) number of years of musical education of the participant.
    3. listening_to_classical_music: Answers to the question “How often do you listen to classical music?”. The possible answers are:
      • 1: Never
      • 2: Very rarely
      • 3: Rarely
      • 4: Occasionally
      • 5: Frequently
      • 6: Very frequently
    4. registration_date: Date and time of registration of the participant.
    5. playing_piano: Answer to the question “Do you play the piano?”. The possible answers are:
      • 1: No
      • 2: A little bit
      • 3: Quite well
      • 4: Very well
  • cleanup_rules.txt: Rules for cleaning/formatting the terms in the participants’ answers.

  • translations_GERMAN.txt: How the translations from German to English were made.
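
The column descriptions above suggest a simple way to work with the answers and the participant profiles. The following is a minimal sketch using only the Python standard library. It assumes that the answer field is quoted (so that its internal commas survive CSV parsing) and that coded cells hold plain integers. The sample rows are made up for illustration and are not taken from the dataset, and the helper names (split_terms, terms_per_performance, decode) are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; NOT actual dataset content.
SAMPLE_ANSWERS = '''answer_id,participant_id,music_id,answer,favorite
1,10,3,"calm,very_soft",1
2,11,3,"calm,dreamy",0
3,10,4,"hectic",0
'''

# Answer codes as listed for participant_profiles.csv.
LISTENING = {1: "Never", 2: "Very rarely", 3: "Rarely",
             4: "Occasionally", 5: "Frequently", 6: "Very frequently"}
PIANO = {1: "No", 2: "A little bit", 3: "Quite well", 4: "Very well"}


def split_terms(answer):
    """Split a cleaned answer into terms, turning underscores back into spaces."""
    return [t.strip().replace("_", " ") for t in answer.split(",") if t.strip()]


def terms_per_performance(csv_file):
    """Count how often each term was used for each music_id."""
    counts = {}
    for row in csv.DictReader(csv_file):
        music_id = int(row["music_id"])
        counts.setdefault(music_id, Counter()).update(split_terms(row["answer"]))
    return counts


def decode(cell, labels):
    """Map a coded cell to its label; empty cells (no answer given) become None."""
    cell = cell.strip()
    return labels[int(float(cell))] if cell else None


counts = terms_per_performance(io.StringIO(SAMPLE_ANSWERS))
print(counts[3].most_common(1))   # most frequent term for performance 3
print(decode("5", LISTENING), decode("", PIANO))
```

With the real files, the io.StringIO object would be replaced by an open file handle, for example open("con_espressione_game_answers.csv", newline="", encoding="utf-8"); the encoding is an assumption.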


Related metadata is stored in the MetaData folder.

  • Alignments. This folder contains the manually corrected score-to-performance alignments for each of the pieces in the dataset. Each of these alignments is a text file.
  • ApproximateMIDI. This folder contains reconstructed MIDI performances created from the alignments and the loudness curves. The onset time and offset times of the notes were determined from the alignment times and the MIDI velocity was computed from the loudness curves.
  • Match. This folder contains score-to-performance alignments in Matchfile format.
  • Scores_MuseScore. Manually encoded sheet music in MuseScore format (.mscz).
  • Scores_MusicXML. Sheet music in MusicXML format.
  • Scores_pdf. Images of the sheet music in PDF format.
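
The description of ApproximateMIDI says that note timing comes from the alignments and MIDI velocity from the loudness curves, but it does not state the mapping that was used. The sketch below shows one possible reading of that idea: interpolate the loudness curve at each note onset and rescale it linearly to the MIDI velocity range. The linear rescaling, the (onset, offset, pitch) note tuples, and all function names are assumptions made for illustration; this is not the authors' procedure.

```python
from bisect import bisect_left


def loudness_at(times, values, t):
    """Linearly interpolate a loudness curve at time t (clamped at both ends).

    `times` must be strictly increasing and as long as `values`.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)          # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def db_to_velocity(db, db_min, db_max, v_min=1, v_max=127):
    """Assumed mapping: rescale dB linearly into [v_min, v_max]."""
    if db_max == db_min:
        return (v_min + v_max) // 2
    frac = (db - db_min) / (db_max - db_min)
    frac = min(1.0, max(0.0, frac))    # clip values outside the range
    return int(round(v_min + frac * (v_max - v_min)))


def approximate_notes(notes, times, loudness_db):
    """Attach a velocity to each (onset, offset, pitch) tuple."""
    lo, hi = min(loudness_db), max(loudness_db)
    return [
        {"onset": on, "offset": off, "pitch": pitch,
         "velocity": db_to_velocity(loudness_at(times, loudness_db, on), lo, hi)}
        for on, off, pitch in notes
    ]


# Made-up curve and notes, for illustration only.
curve_t = [0.0, 1.0, 2.0]
curve_db = [-40.0, -20.0, -30.0]
for note in approximate_notes([(0.0, 0.5, 60), (1.0, 1.5, 64)], curve_t, curve_db):
    print(note)
```

Sampling the loudness only at the onset is a simplification; averaging over the duration of the note would be an equally plausible choice.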

Audio Features

Audio features computed from the audio files. These features are located in the AudioFeatures folder.

  • Loudness: Text files containing loudness curves in dB of the audio files. These curves were computed using code provided by Olivier Lartillot. Each of these files contains the following columns:
    • performance_time_(seconds): Performance time in seconds.
    • loudness_(db): Loudness curve in dB.
    • smooth_loudness_(db): Smoothed loudness curve.
  • Spectrograms: NumPy files (.npy) containing magnitude spectrograms (as NumPy arrays). The shape of each array is (149 frequency bands, number of frames of the performance). The spectrograms were computed from the audio files with the following parameters:
    • Sample rate (sr): 22050 samples per second
    • Window length: 2048
    • Frames per second (fps): 31.3
    • Hop size: sample_rate // fps = 704 samples
    • Filterbank: log-scaled filterbank with 24 bands per octave and a minimum frequency of 20 Hz
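
The parameters above are enough to convert spectrogram frame indices into performance time. The short sketch below reproduces the hop-size arithmetic and derives the frame times from it. It assumes that frame i starts i hop sizes into the signal (any centering or padding used when the spectrograms were computed is not stated in the description), and the function names are hypothetical.

```python
SAMPLE_RATE = 22050   # samples per second
FPS = 31.3            # nominal frames per second

# Integer hop size, as in "sample_rate // fps".
HOP_SIZE = int(SAMPLE_RATE // FPS)

# Because the hop size is rounded down, the effective frame rate
# is slightly higher than the nominal 31.3 fps.
EFFECTIVE_FPS = SAMPLE_RATE / HOP_SIZE


def frame_to_seconds(frame_index):
    """Time (in seconds) at which a frame starts, assuming no padding offset."""
    return frame_index * HOP_SIZE / SAMPLE_RATE


def duration_seconds(n_frames):
    """Approximate duration covered by a spectrogram with n_frames columns."""
    return n_frames * HOP_SIZE / SAMPLE_RATE


print(HOP_SIZE)
print(round(EFFECTIVE_FPS, 3))
print(frame_to_seconds(100))
```

For an array read with numpy.load, the number of frames is the second entry of its shape, so duration_seconds(spectrogram.shape[1]) gives a rough excerpt length that can be compared with length_of_excerpt_seconds.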

MIDI Performances

Since the dataset consists of commercial recordings, we cannot include the audio files in the dataset. We can, however, share the two synthesized MIDI performances used in the Con Espressione Game (for Bach’s Prelude in C and the second movement of Mozart’s Sonata in C, K. 545) in mp3 format. These performances can be found in the MIDIPerformances folder.


Files (76.1 MB)


Additional details


European Commission: Con Espressione – Getting at the Heart of Things: Towards Expressivity-aware Computer Systems in Music (grant 670035)