Dataset (Open Access)
van Cranenburgh, Andreas; Veldhoen, Sara; De Gruijter, Michel
This dataset contains a corpus of 1346 novels from the DBNL (Digitale Bibliotheek voor de Nederlandse Letteren, the Digital Library for Dutch Literature). It includes metadata, word counts, and syntactic features for the novels. The metadata includes variables related to canonicity, such as library information, secondary references, and Wikipedia mentions.
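Several of the files (metadata.tsv, syntacticfeatures.tsv, langdetect.tsv, similarity.tsv) are tab-separated. The following is a minimal sketch for reading one of them with Python's standard library. It assumes each file starts with a header row and is UTF-8 encoded. Neither the encoding nor the column names are documented on this page, and `load_tsv` is a hypothetical helper, not part of the dataset.

```python
import csv


def load_tsv(path):
    """Read a tab-separated file into a list of dicts, one per row.

    Assumption: the first row holds column names and the file is UTF-8.
    The actual columns of metadata.tsv etc. are not documented here.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


if __name__ == "__main__":
    rows = load_tsv("metadata.tsv")  # hypothetical local path
    print(len(rows), "rows; columns:", list(rows[0].keys()) if rows else [])
```

Using `csv.DictReader` keeps rows addressable by column name, so the metadata and feature files can later be joined on whatever identifier column they share.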
The titles have been selected using the following criteria:
Acknowledgements: Information from libraries was contributed by Trudie Stoutjesdijk and Eddie de Kok from Data Warehouse.
Name | MD5 checksum | Size
---|---|---
bigrams.csv.gz | be533b311ba582e3a203134c3861f918 | 7.0 MB
dbnl-skipgram-embeddings.bin | 8112c7d96c5ced9e61388e42f2daf544 | 1.9 GB
dbnl-skipgram-embeddings.vec | c2c6395aaaf3e8cd73dc2d4628fbf62a | 361.5 MB
dbnlsecrefs.csv.gz | 9dc7347c7d351c9afd66c80887fe2e95 | 3.0 MB
langdetect.tsv | 25f1dac253afb04ed72753e0326f4c6b | 80.6 kB
metadata.tsv | affce8e245102b94727da9921b7afa1c | 275.1 kB
pagelevelwordcounts.tar | 6e1c03e398ea9d1f65c1e5de413b7b45 | 720.5 MB
README.md | 7de9df719a3a9271436aae74f4602d01 | 3.2 kB
sentiment.tar | c87f40801ea67c8a3bb3908a113caf8c | 17.8 MB
similarity.tsv | 740b3432a0567312dea20c73098efb7a | 70.2 kB
syntacticfeatures.tsv | 20cf1e32bbbaebdb01e632b8bef39c74 | 1.9 MB
topicmodel.tar | cdeb78d73304d4abd12e21c53233678e | 68.7 MB
trainembeddings.sh | 263072523f99a910740e21002f139327 | 181 bytes
unigrams.csv | bb01625c5503d2c3ba5cc6877ff44352 | 3.5 MB
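Each file in the table above is listed with an MD5 checksum, so a download can be checked for corruption before use. The following is a minimal sketch using Python's `hashlib`. The helper names `md5sum` and `verify` are hypothetical, and it assumes the files were saved under their original names in one directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file table above.
EXPECTED_MD5 = {
    "bigrams.csv.gz": "be533b311ba582e3a203134c3861f918",
    "dbnl-skipgram-embeddings.bin": "8112c7d96c5ced9e61388e42f2daf544",
    "dbnl-skipgram-embeddings.vec": "c2c6395aaaf3e8cd73dc2d4628fbf62a",
    "dbnlsecrefs.csv.gz": "9dc7347c7d351c9afd66c80887fe2e95",
    "langdetect.tsv": "25f1dac253afb04ed72753e0326f4c6b",
    "metadata.tsv": "affce8e245102b94727da9921b7afa1c",
    "pagelevelwordcounts.tar": "6e1c03e398ea9d1f65c1e5de413b7b45",
    "README.md": "7de9df719a3a9271436aae74f4602d01",
    "sentiment.tar": "c87f40801ea67c8a3bb3908a113caf8c",
    "similarity.tsv": "740b3432a0567312dea20c73098efb7a",
    "syntacticfeatures.tsv": "20cf1e32bbbaebdb01e632b8bef39c74",
    "topicmodel.tar": "cdeb78d73304d4abd12e21c53233678e",
    "trainembeddings.sh": "263072523f99a910740e21002f139327",
    "unigrams.csv": "bb01625c5503d2c3ba5cc6877ff44352",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks so that
    large files such as the 1.9 GB .bin embeddings need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory="."):
    """Map each expected file name to True (match), False (mismatch),
    or None (file not present in `directory`)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        p = Path(directory) / name
        results[name] = (md5sum(p) == expected) if p.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'missing' if ok is None else 'OK' if ok else 'MISMATCH'}")
```

Reporting missing files separately from mismatches lets you download only a subset (for example, skipping the large embedding and word-count archives) and still check what you have.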