Cross-cultural music corpus: The Expanded Natural History of Song Discography

Bertolo, Mila; Snarskis, Martynas; Kyritsis, Thanos; Yurdum, Lidya; Bainbridge, Constance; Atwood, S.; Hilton, Courtney; Keomurjian, Anya; Lee, Judy S.; Mackiel, Alexander; Mak, Vanessa; Shin, Mijoo; Bitran, Alma; Shilton, Dor; Delasanta, Lana; Do, Hang (Heather); Lang, Jenna; Irani, Tenaaz; Kangatharan, Jayanthiny; Lafleur, Kevin; Malko, Nashua; Atkinson, Quentin; Singh, Manvir; Mehr, Samuel

doi:10.5281/zenodo.15725182

Published August 8, 2023 | Version v5

Video/Audio Open

1. McGill University
2. University of Auckland
3. The University of Auckland
4. University of Amsterdam
5. University of California, Los Angeles
6. Princeton University
7. University of Chicago
8. University of British Columbia
9. Harvard University
10. Rutgers, The State University of New Jersey
11. Tel Aviv University
12. University of Connecticut
13. City University of New York
14. University of Ottawa
15. UC Davis

This repository hosts the Expanded Natural History of Song Discography. It contains 1007 audio recordings of vocal music gathered from many human societies, each annotated with a world region, language, and behavioural context.

Each song file contains a 10-second excerpt of the source audio, selected at random from the portions of the recording in which a singer is audible. Because each excerpt is short and the files are intended for research use only, they have been made available under Fair Use.

NHS2-songs.zip contains the audio files, volume-matched and with 1s fade in/out added, in MP3 format. These can be analysed as-is or used in experiments.
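As a minimal sketch of how the archive might be inspected before analysis, the Python snippet below lists the MP3 files in a zip archive and compares them with a list of song identifiers, such as those in the song column of NHS2-metadata.csv. The function names are hypothetical, and the internal folder layout of NHS2-songs.zip is not documented here, so the sketch assumes nothing about it and matches on file basenames only.

```python
import zipfile
from pathlib import PurePosixPath


def list_song_files(archive):
    """Return sorted MP3 basenames found in a zip archive (path or file object)."""
    names = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            base = PurePosixPath(info.filename).name
            # Assumption: skip macOS resource-fork entries ("._name"), if any exist.
            if base.startswith("._"):
                continue
            if base.lower().endswith(".mp3"):
                names.append(base)
    return sorted(names)


def compare_with_metadata(song_files, metadata_ids):
    """Return (files missing from the metadata, metadata ids missing from the files)."""
    files, ids = set(song_files), set(metadata_ids)
    return sorted(files - ids), sorted(ids - files)
```

A call such as `list_song_files("NHS2-songs.zip")` would be expected to return one entry per song if the archive holds exactly the files described above; that expectation is itself an assumption to check rather than a guarantee.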

NHS2-metadata.csv contains the annotations, with one row per song. Its four columns are: song, a unique identifier for each song in the format `NHS2-XXXX.mp3`; region, the approximate geographical location where the song was recorded, using Human Relations Area Files categories (see https://ehrafworldcultures.yale.edu); glottocode, the language in which the song is sung (see https://glottolog.org); and type, the behavioural context in which the song was produced, drawn from a set of 10 categories (dance, healing, love, lullaby, play, procession, mourning, work, story, and praise).
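The column layout described above could be read and sanity-checked along the following lines. This is a sketch only: it assumes the CSV header uses exactly the column names song, region, glottocode, and type, and that the type labels are spelled as in the list above (compared case-insensitively). The function names are illustrative and are not part of the corpus.

```python
import csv
from collections import Counter

# Column names and categories as given in the description (assumed to match the file).
EXPECTED_COLUMNS = ["song", "region", "glottocode", "type"]
SONG_TYPES = {"dance", "healing", "love", "lullaby", "play",
              "procession", "mourning", "work", "story", "praise"}


def load_metadata(lines):
    """Read metadata rows from an open text file or any iterable of CSV lines."""
    reader = csv.DictReader(lines)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return list(reader)


def count_by_type(rows):
    """Count songs per behavioural context and list any unexpected labels."""
    counts = Counter(row["type"].strip().lower() for row in rows)
    unexpected = sorted(set(counts) - SONG_TYPES)
    return counts, unexpected
```

For example, `with open("NHS2-metadata.csv", newline="", encoding="utf-8") as f: rows = load_metadata(f)` would load the annotations, after which `count_by_type(rows)` gives the number of songs in each category. The UTF-8 encoding is an assumption.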

For assistance with the corpus, contact Martynas Snarskis (martysnarskis@gmail.com), Mila Bertolo (mila.bertolo@mail.mcgill.ca), or Samuel Mehr (mehr@hey.com).

Further information about the construction of this corpus will be made available in a forthcoming paper; we will update this Zenodo archive when the paper is publicly available.

Notes

Version 5 updates the audio selection for one of the songs (NHS2-E2SX), which previously did not include vocals.

Files (235.8 MB)

Name	MD5	Size
NHS2-metadata.csv	31f4e51f88b25b05a0bc940bb1a376a8	41.9 kB
NHS2-songs.zip	82c2750225088bf7903ed520a63896f1	235.8 MB
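
The MD5 digests listed for each file can be used to confirm that a download is complete and unmodified. A minimal sketch in Python follows; the helper names are hypothetical, and the digests are copied from the listing above.

```python
import hashlib

# MD5 digests as published in the file listing for this version of the record.
PUBLISHED_MD5 = {
    "NHS2-metadata.csv": "31f4e51f88b25b05a0bc940bb1a376a8",
    "NHS2-songs.zip": "82c2750225088bf7903ed520a63896f1",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, name):
    """Return True if the file at `path` has the digest published for `name`."""
    return md5_of_file(path) == PUBLISHED_MD5[name]
```

Reading in chunks keeps memory use low for the 235.8 MB archive. A call such as `matches_published("NHS2-songs.zip", "NHS2-songs.zip")` should return True for an intact download of this version.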

	All versions	This version
Views	691	61
Downloads	475	55
Data volume	20.0 GB	1.2 GB
