Corpus de novelas hispanoamericanas del siglo XIX (conha19) - TXM corpus
Description
This is a version of the corpus conha19 which has been prepared for the text analysis tool TXM (see http://textometrie.ens-lyon.fr/).
Conha19 consists of 256 novels written by Argentine, Cuban, and Mexican authors or published in the respective countries between 1830 and 1910. Of these novels, the 234 that are in the public domain are published in this TXM corpus. The corpus has been prepared primarily to allow for the analysis of subgenres, especially thematic subgenres (historical novel, sentimental novel, etc.) and literary currents (such as romantic, realist, and naturalistic novels), but it can be reused for other purposes (e.g. analysis of style by author, country, or period). For details about the contents of the corpus, see the metadata file (metadata.csv) below.
Conha19 was created for the dissertation "Genre Analysis and Corpus Design: 19th Century Spanish American Novels (1830-1910)", written by Ulrike Henny-Krahmer, which in turn was realized as part of the junior research group "Computational Literary Genres Stylistics" (CLiGS), a project funded by the German Federal Ministry of Education and Research (BMBF) and hosted at the University of Würzburg between 2015 and 2020.
For the version offered here, the corpus has been saved in a binary format (the file conha19.txm) that can be directly loaded into the text analysis tool TXM. It includes linguistic annotations which have been generated with the tool FreeLing (v4.0, see http://nlp.lsi.upc.edu/freeling) and the WordNet API of NLTK (see https://www.nltk.org/howto/wordnet.html).
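To get a first overview of the corpus contents before loading the binary into TXM, the metadata file can be inspected with a few lines of Python. This is a minimal sketch, not part of the corpus distribution: it assumes that metadata.csv is comma-delimited, UTF-8 encoded, and has a header row, none of which is stated on this page. The function name `summarize_metadata` is hypothetical.

```python
import csv
from pathlib import Path


def summarize_metadata(path, encoding="utf-8"):
    """Return (column names, number of data rows) of a CSV file.

    Assumption: comma-delimited file whose first row is a header.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first row, or [] for an empty file
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count


if __name__ == "__main__":
    metadata_path = Path("metadata.csv")  # assumed download location
    if metadata_path.is_file():
        columns, rows = summarize_metadata(metadata_path)
        print(f"{rows} rows, columns: {columns}")
```

Printing the column names first avoids guessing at field names that are not documented here.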
Files
Name | Size | Checksum |
---|---|---|
conha19.txm | 2.6 GB | md5:e270f1b17492cce4b0c70c3adddf1f50 |
metadata.csv | 35.5 kB | md5:898e05d612f8b09ec9ab9979d57f2d97 |
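Because the TXM binary is large, it can be worth checking the downloads against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library. It assumes that the first checksum belongs to conha19.txm (2.6 GB) and the second to metadata.csv (35.5 kB), and that both files sit in the current directory. The helper names `md5_of_file` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums as listed on this page. The pairing of file name and checksum
# is an assumption based on the file sizes.
EXPECTED_MD5 = {
    "conha19.txm": "e270f1b17492cce4b0c70c3adddf1f50",
    "metadata.csv": "898e05d612f8b09ec9ab9979d57f2d97",
}


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use constant, which matters for a file of several gigabytes.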