Published June 5, 2017 | Version v1
Dataset
Open
Twelve Novels by Arthur Conan Doyle (TXM corpus)
Creators
Description
This is a sample dataset of twelve novels by Arthur Conan Doyle, all in the public domain. The novels have been prepared for use with the TXM text analysis software, including lemmatisation and part-of-speech tagging with TreeTagger. They belong to different subgenres: detective fiction (the Sherlock Holmes novels), adventure novels, historical novels, horror novels and other novels. See the metadata file (metadata.csv) for details. The corpus file (doyle.txm) is in a binary format that can be loaded directly into TXM (see http://textometrie.ens-lyon.fr/).
Files
(82.9 MB)
Name | Size | Checksum |
---|---|---|
doyle.txm | 82.9 MB | md5:a69ae428b46e319efecb0842dfcea332 |
metadata.csv | 1.2 kB | md5:f6cffa048d866ec6fc38524d2a5be242 |
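The MD5 checksums listed for the two files can be used to confirm that a download is intact. The following is a minimal sketch in Python, not part of the dataset itself. It assumes that the 82.9 MB file is doyle.txm and the 1.2 kB file is metadata.csv (a pairing inferred from the description and the listed sizes), and that both files have been saved under those names in the current directory. The function names are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 digests as listed in the Files section of this record.
# Assumption: the pairing of file name and digest is inferred from the
# listed sizes (82.9 MB for the corpus, 1.2 kB for the metadata).
EXPECTED_MD5 = {
    "doyle.txm": "a69ae428b46e319efecb0842dfcea332",
    "metadata.csv": "f6cffa048d866ec6fc38524d2a5be242",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True, False, or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = (md5_of_file(path) == expected)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

Reading the file in chunks keeps memory use low for the larger corpus file. A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.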
Additional details
Related works
- Is cited by: 10.5281/zenodo.10769 (DOI)