Published June 5, 2017 | Version v1
Dataset Open

Twelve Novels by Arthur Conan Doyle (TXM corpus)

Contributors

Data curator:

  • 1. University of Würzburg

Description

This is a sample dataset of twelve novels by Arthur Conan Doyle, all in the public domain. The novels have been prepared for use with the TXM text analysis software and have been lemmatised and part-of-speech tagged with TreeTagger. They belong to different subgenres: detective fiction (the Sherlock Holmes novels), adventure novels, historical novels, horror novels and other novels. See the metadata file (metadata.csv) for details. The corpus file (doyle.txm) is in a binary format that can be loaded directly into TXM (see http://textometrie.ens-lyon.fr/).

Files (82.9 MB)

Name          Size     Checksum
doyle.txm     82.9 MB  md5:a69ae428b46e319efecb0842dfcea332
metadata.csv  1.2 kB   md5:f6cffa048d866ec6fc38524d2a5be242
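
The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted. The following is a minimal Python sketch, not part of the dataset itself. It assumes both files were saved under the names given in the description (doyle.txm and metadata.csv) in the current directory, and that the 82.9 MB checksum belongs to the corpus file and the 1.2 kB checksum to the metadata file, which is inferred from the file sizes. The function names `md5_of` and `verify` are hypothetical helpers introduced for illustration.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing. The pairing of checksum to file
# name is an assumption inferred from the listed sizes (82.9 MB vs 1.2 kB).
EXPECTED_MD5 = {
    "doyle.txm": "a69ae428b46e319efecb0842dfcea332",
    "metadata.csv": "f6cffa048d866ec6fc38524d2a5be242",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Run from the download directory, the script prints one line per file. A mismatch usually points to an interrupted download, in which case fetching the file again is the simplest remedy.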

Additional details

Related works

Is cited by
10.5281/zenodo.10769 (DOI)