Dataset Open Access
Martin Gerlach; Francesc Font-Clos
Standardized Project Gutenberg Corpus
version: SPGC-2018-07-18
number of books: 55905
uncompressed size: 3 GB (counts) + 18 GB (tokens)
Publication
https://arxiv.org/abs/1812.08092
[ journal link ]
Project Site
https://pgcorpus.github.io/
Github
https://github.com/pgcorpus/gutenberg
Name | MD5 | Size |
---|---|---|
SPGC-counts-2018-07-18.zip | bccfbdf00caa906d84344cf335cc96ee | 1.5 GB |
SPGC-metadata-2018-07-18.csv | a2d5f325f13846cbec2fd21d982b4ef4 | 10.0 MB |
SPGC-tokens-2018-07-18.zip | 13e16ae2c8350a0b7407a8f7a51e8a7e | 6.4 GB |
Metric | All versions | This version |
---|---|---|
Views | 2,706 | 2,706 |
Downloads | 1,991 | 1,990 |
Data volume | 2.5 TB | 2.5 TB |
Unique views | 2,448 | 2,448 |
Unique downloads | 1,500 | 1,499 |