ELTE Song Lyrics Corpus
Description
The ELTE Song Lyrics Corpus is a database developed by the Research Group of Stylistics and the Department of Digital Humanities at Eötvös Loránd University (ELTE). The song lyrics were automatically scraped from https://www.zeneszoveg.hu/. In addition to metadata, the corpus contains annotations of structural units, grammatical features, and poetic properties related to sound devices.
The folder level1 contains all the scraped songs, while the folder level1_hu contains only the Hungarian ones. Non-Hungarian lyrics were filtered out with fastText's language identification. Note that level1 also contains files without any textual content.
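The filtering step can be sketched roughly as follows. This is not the project's actual script. It assumes the `fasttext` Python package and a pretrained language identification model such as `lid.176.bin`. The 0.5 probability threshold, the function names, and the `(id, text)` input format are hypothetical choices made for illustration. Extracting plain text from the TEI files is left out. The classifier is passed in as a plain callable, so the filtering logic can be tried without downloading the model.

```python
"""Hypothetical sketch: keep only Hungarian lyrics, skipping empty files.

Assumes the fastText LID model (e.g. lid.176.bin); threshold and names are
illustrative, not taken from the corpus tooling."""
from typing import Callable, Iterable, List, Tuple

# A predictor maps a text to (language_code, probability), e.g. ("hu", 0.97).
Predictor = Callable[[str], Tuple[str, float]]


def load_fasttext_predictor(model_path: str) -> Predictor:
    """Wrap a fastText language identification model as a Predictor."""
    import fasttext  # requires the `fasttext` package to be installed

    model = fasttext.load_model(model_path)

    def predict(text: str) -> Tuple[str, float]:
        # fastText's predict() handles one line at a time, so join the lines.
        labels, probs = model.predict(text.replace("\n", " "), k=1)
        return labels[0].replace("__label__", ""), float(probs[0])

    return predict


def filter_hungarian(
    texts: Iterable[Tuple[str, str]],
    predict: Predictor,
    min_prob: float = 0.5,  # hypothetical confidence threshold
) -> List[str]:
    """Return the ids of non-empty texts classified as Hungarian ('hu')."""
    kept = []
    for text_id, text in texts:
        if not text.strip():
            continue  # level1 also contains files without textual content
        lang, prob = predict(text)
        if lang == "hu" and prob >= min_prob:
            kept.append(text_id)
    return kept


if __name__ == "__main__":
    # Stub classifier standing in for the real model in this demo.
    def stub(text: str) -> Tuple[str, float]:
        return ("hu", 0.98) if "szeretlek" in text else ("en", 0.95)

    songs = [("a", "szeretlek"), ("b", "love you"), ("c", "   ")]
    print(filter_hungarian(songs, stub))  # prints ['a']
```

With the real model, `load_fasttext_predictor("lid.176.bin")` would replace the stub.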
The level2_hu folder contains songs annotated with lemmas, parts of speech, and morphosyntactic features. Grammatical features were annotated with the emtsv version of the e-magyar program, configured to output Universal Dependencies labels. The folder level3_hu adds annotations of certain sound devices (rhyme pattern, rhyme pairs, rhythm, meter, alliteration, and phonological features of words). These sound devices were annotated with the programs developed for the ELTE Poetry Corpus.
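This description does not state which TEI elements carry the grammatical annotation. The following is a minimal sketch for reading level2_hu files. It assumes that each token is a TEI `<w>` element using the standard TEI attributes `lemma`, `pos` and `msd`, with `pos` holding a UD part-of-speech tag and `msd` holding UD feature strings. These are assumptions, so the element path and attribute names may need adjusting to the actual files. `read_tokens`, `tokens_from_xml` and `pos_counts` are hypothetical helper names.

```python
"""Hypothetical sketch: read token annotations from a level2_hu TEI file.

Assumes <w> elements in the TEI namespace with lemma/pos/msd attributes;
the corpus's real markup may differ."""
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, NamedTuple

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str  # assumed UD part-of-speech tag, e.g. NOUN
    msd: str  # assumed UD features, e.g. Case=Nom|Number=Sing


def _tokens_from_root(root: ET.Element) -> List[Token]:
    tokens = []
    for w in root.iterfind(".//tei:w", TEI_NS):
        tokens.append(
            Token(
                form="".join(w.itertext()).strip(),
                lemma=w.get("lemma", ""),  # missing attributes become ""
                pos=w.get("pos", ""),
                msd=w.get("msd", ""),
            )
        )
    return tokens


def read_tokens(path: str) -> List[Token]:
    """Parse one TEI file from disk and return its tokens in document order."""
    return _tokens_from_root(ET.parse(path).getroot())


def tokens_from_xml(xml_text: str) -> List[Token]:
    """Same as read_tokens, but for an XML string (handy for testing)."""
    return _tokens_from_root(ET.fromstring(xml_text))


def pos_counts(tokens: List[Token]) -> Counter:
    """Frequency of each part-of-speech tag."""
    return Counter(t.pos for t in tokens)
```

For example, summing `len(read_tokens(p))` over all files would give a token count that could be compared with the size figure below, provided the markup matches these assumptions.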
Format: TEI XML
Size: 95 594 song lyrics, 18.3 million tokens
Funding: Building the corpus was supported by project No. K-137659 (Corpus-based cognitive poetic research on person marking constructions) of the National Research, Development and Innovation Office of Hungary.
Copyright:
- The content of the corpus is not public and cannot be disclosed due to copyright protection.
- The corpus is for research purposes only and may be used only by those who have received permission from the head of the Research Group of Stylistics or the head of the Department of Digital Humanities at Eötvös Loránd University.