Creation and Analysis of the Yugoslav Rock Song Lyrics Corpus from 1945 to 2003

doi:10.5281/zenodo.5717363

Published February 9, 2019 | Version v1

Thesis Open


Petkovic Ljudmila¹

1. University of Belgrade

The thesis examines, from both a theoretical and a practical perspective, the creation and processing of a corpus of rock song lyrics originating from the former Yugoslavia in the period 1945-2003. The lyrics are obtained from the LyricWiki website by web scraping with the Python library lyricsmaster. The collected texts are then merged into a single XML file and automatically annotated with the yattag Python tool. The data are subsequently preprocessed at both the formal and the content level. The XML document is then transformed into XHTML with an XSLT processor in order to generate basic corpus data. Diacritic restoration, carried out with the “Slovo Majstor” application and the morphological electronic dictionaries of Serbian in the LeXimir software package, is also automated. In the text mining stage, socio-political and patriotic topics are retrieved with the NLTK library in Python, while romantic and other topics are visualized with the TreeCloud and WordItOut software. The similarity between the authors represented in the corpus is measured with the stylo package in the R programming language. Finally, the thesis gives an overview of today's most relevant programming libraries in the field of natural language processing, which also serves as a guideline for future work.
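The merging and basic counting steps described above can be illustrated with a minimal sketch. It is not the thesis's code: it swaps in the standard-library `xml.etree.ElementTree` module in place of yattag, and the element name `song`, the attributes `artist`, `title`, `year`, and the sample records are all hypothetical placeholders rather than data from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical placeholder records; the actual lyrics were scraped from LyricWiki.
songs = [
    {"artist": "Artist A", "title": "Song One", "year": "1981",
     "lyrics": "placeholder text one one"},
    {"artist": "Artist B", "title": "Song Two", "year": "1985",
     "lyrics": "placeholder text two"},
]


def build_corpus(records):
    """Merge individual song records into a single annotated XML tree."""
    root = ET.Element("corpus")
    for rec in records:
        # Assumed schema: one <song> element per text, metadata as attributes.
        song = ET.SubElement(
            root, "song",
            artist=rec["artist"], title=rec["title"], year=rec["year"],
        )
        song.text = rec["lyrics"]
    return root


def token_counts(root):
    """Count lowercased, whitespace-separated tokens across all songs."""
    counts = Counter()
    for song in root.iter("song"):
        counts.update((song.text or "").lower().split())
    return counts


corpus = build_corpus(songs)
xml_string = ET.tostring(corpus, encoding="unicode")  # single XML document
counts = token_counts(corpus)
```

Keeping the metadata in attributes and the lyrics in element text is only one possible layout; it makes later steps such as an XSLT transformation or per-author grouping straightforward, but the schema actually used in the thesis may differ.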

Files

Name	Size
Création et analyse du corpus de textes de chansons rock yougoslaves de 1945 à 2003.pdf (md5:c6a96c69970d0bfb10c8cc3daf2ec29a)	1.4 MB

	All versions	This version
Views	82	82
Downloads	53	53
Data volume	81.2 MB	81.2 MB
