Presentation Open Access

PoetryLab: An Open Source Toolkit for the Analysis of Spanish Poetry Corpora

De La Rosa, Javier; Pérez Pozo, Álvaro; Ros, Salvador; González-Blanco, Elena

The transmission of text in poetic form is a quasi-universal aspect of oral tradition across cultures. The study of the poetic features of a text, especially its rhythmic structure when forming verses, belongs to each individual tradition, whose scholars established the rules that might govern its poetry. Within this context, the POSTDATA Project formalized a network of ontologies able to represent any poetic expression and its analysis at the European level, enabling scholars all over Europe to interchange their data using Linked Open Data. However, varied research interests result in corpora that might not share the same facets of analysis. To alleviate this concern and foster the completeness of the interchanged corpora, our team set out to build a software toolkit to assist in the analysis of poetry. This paper introduces PoetryLab, an extensible open-source toolkit for syllabification, scansion (extraction of stress patterns), enjambment detection (syntactic units split across two lines), rhyme detection, and historical named entity recognition for Spanish poetry. Our toolkit achieves state-of-the-art performance in the tasks for which reproducible alternatives exist.

Files (893.5 kB)
DH2020 - PoetryLab - presentation.pdf (893.5 kB)

