Dataset Open Access

Textual features and metadata for DBNL novels 1800-2000

van Cranenburgh, Andreas; Veldhoen, Sara; De Gruijter, Michel

This dataset contains a corpus of 1346 novels from DBNL. Included are metadata, word counts, and syntactic features for the novels. The metadata includes variables related to canonicity: library information, secondary references, Wikipedia mentions, etc.
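
The tabular files in this record are distributed as tab-separated values (for example metadata.tsv and syntacticfeatures.tsv). Since the column names are not listed here, a cautious first step is to print the header and a few rows before relying on any particular field. The following is a minimal sketch using only the Python standard library; the helper name `peek_tsv` and the UTF-8 encoding are assumptions, not something this record specifies.

```python
import csv


def peek_tsv(path, limit=5):
    """Return (column names, first `limit` rows) of a tab-separated file.

    Assumes the file is UTF-8 encoded and has a header row; neither is
    confirmed by the dataset description.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # zip with range() stops after `limit` rows without reading the rest
        rows = [row for _, row in zip(range(limit), reader)]
        return reader.fieldnames, rows
```

Calling `peek_tsv("metadata.tsv")` on a downloaded copy returns the header and the first five records, which is enough to see which variables are available.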

The titles have been selected using the following criteria:

  • Novels and novellas
  • Originally written in Dutch
  • First published 1800-2000
  • Available as TEI on https://www.DBNL.org

Acknowledgements: Information from libraries was contributed by Trudie Stoutjesdijk and Eddie de Kok from Data Warehouse.

Files (3.1 GB)

Name                            Size        Checksum
bigrams.csv.gz                  7.0 MB      md5:be533b311ba582e3a203134c3861f918
dbnl-skipgram-embeddings.bin    1.9 GB      md5:8112c7d96c5ced9e61388e42f2daf544
dbnl-skipgram-embeddings.vec    361.5 MB    md5:c2c6395aaaf3e8cd73dc2d4628fbf62a
dbnlsecrefs.csv.gz              3.0 MB      md5:9dc7347c7d351c9afd66c80887fe2e95
langdetect.tsv                  80.6 kB     md5:25f1dac253afb04ed72753e0326f4c6b
metadata.tsv                    275.1 kB    md5:affce8e245102b94727da9921b7afa1c
pagelevelwordcounts.tar         720.5 MB    md5:6e1c03e398ea9d1f65c1e5de413b7b45
README.md                       3.2 kB      md5:7de9df719a3a9271436aae74f4602d01
sentiment.tar                   17.8 MB     md5:c87f40801ea67c8a3bb3908a113caf8c
similarity.tsv                  70.2 kB     md5:740b3432a0567312dea20c73098efb7a
syntacticfeatures.tsv           1.9 MB      md5:20cf1e32bbbaebdb01e632b8bef39c74
topicmodel.tar                  68.7 MB     md5:cdeb78d73304d4abd12e21c53233678e
trainembeddings.sh              181 Bytes   md5:263072523f99a910740e21002f139327
unigrams.csv                    3.5 MB      md5:bb01625c5503d2c3ba5cc6877ff44352
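
Each file in the listing is accompanied by an MD5 checksum, which can be used to confirm that a download completed without corruption (useful for the larger archives in particular). The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `matches_listing` and the local path are illustrative assumptions, not part of the dataset.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum as listed, e.g. 'md5:7de9...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Assumed location of a downloaded copy; adjust to where the file was saved.
    local = Path("README.md")
    if local.exists():
        print(matches_listing(local, "md5:7de9df719a3a9271436aae74f4602d01"))
```

Reading in chunks keeps memory use constant, so the same function works for the 1.9 GB embeddings file as well as the small TSV files.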