Published November 22, 2021 | Version v1.0.0
Dataset Open

Poesi.as dataset

Description

Collection of poems, mostly Spanish, from the 21st century and earlier.

Some stats:

  • Number of poems: 25,187
  • Number of words: 7,918,679

Two JSON files are provided:

An additional CSV file, authors.csv, provides reconciled information for authors from the 20th century and earlier. Identifiers (VIAF, BnF, BNE, LoC, ISNI), dates of birth and death, and gender are also included as they appear in Wikidata.
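As a first look at authors.csv, the sketch below counts how many rows have a non-empty value in each column, which gives a rough idea of how complete the reconciled identifiers and dates are. The column names of authors.csv are not documented here, so the function reads them from the header row instead of assuming them; the names in the in-memory sample (name, birth, death, gender, viaf) are hypothetical and used only for illustration.

```python
import csv
import io


def count_filled_columns(csv_file):
    """Return {column name: number of rows with a non-empty value}.

    Column names are taken from the header row of the file,
    so nothing about the real layout of authors.csv is assumed.
    """
    reader = csv.DictReader(csv_file)
    filled = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            # Extra cells beyond the header are stored under the key None; skip them.
            if name is not None and value and value.strip():
                filled[name] += 1
    return filled


# Hypothetical sample data; the real column names may differ.
sample = io.StringIO(
    "name,birth,death,gender,viaf\n"
    "Author A,1850,1910,female,\n"
    "Author B,1900,,male,12345\n"
)
print(count_filled_columns(sample))
```

To run it on the real file, one would pass an open file handle instead of the sample, for example `open("authors.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.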

This repo is a dump of the website www.poesi.as; we do not own the rights to any of the works published here.

For any copyright violation or infringement, please take the appropriate action with the original website.

Public Domain

The script extract.py generates a public domain corpus in JSON, extracted from the corpus in poesi.as. The number of years since an author's death required for a work to be considered in the public domain can be specified using -y YEARS (--years YEARS); it defaults to 80, as per Spanish copyright law.

    $ python extract.py > public_domain.json
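The public-domain rule described above can be sketched as follows. This is not the code of extract.py, only a minimal illustration of the idea under stated assumptions: the term is assumed to be counted in whole calendar years, so a work qualifies once more than `years` years separate the current year from the author's year of death, and authors with no known death year are treated as still protected. The function names, the `"author"` key, and the `death_years` mapping are all hypothetical.

```python
from datetime import date


def is_public_domain(death_year, years=80, current_year=None):
    """Return True if more than `years` calendar years have passed since death.

    Assumption: the term is counted in whole calendar years.
    An unknown death year (None) is treated as not public domain.
    """
    if death_year is None:
        return False
    if current_year is None:
        current_year = date.today().year
    return current_year - death_year > years


def filter_public_domain(poems, death_years, years=80, current_year=None):
    """Keep only poems whose author qualifies under is_public_domain.

    `poems` is a list of dicts with a hypothetical "author" key;
    `death_years` maps author name to year of death (hypothetical layout).
    """
    return [
        poem
        for poem in poems
        if is_public_domain(death_years.get(poem["author"]), years, current_year)
    ]


# Hypothetical sample data for illustration only.
poems = [
    {"author": "Author A", "title": "Poem 1"},
    {"author": "Author B", "title": "Poem 2"},
    {"author": "Author C", "title": "Poem 3"},
]
death_years = {"Author A": 1936, "Author B": 1999}

kept = filter_public_domain(poems, death_years, years=80, current_year=2021)
print([poem["title"] for poem in kept])
```

Lowering the threshold, as the -y option allows, widens the selection: with `years=20` and the same sample data, Author B's poem would be kept as well.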

Files

linhd-postdata/poesi.as-v1.0.0.zip

Files (33.3 MB)

md5:5e9506a0e9a8178d093ca245fb8160bf

Additional details

Funding

European Commission
POSTDATA - Poetry Standardization and Linked Open Data (grant 679528)