Dataset Open Access

Standardized Project Gutenberg Corpus

Martin Gerlach; Francesc Font-Clos


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.2422561", 
  "title": "Standardized Project Gutenberg Corpus", 
  "issued": {
    "date-parts": [
      [
        2018, 
        12, 
        19
      ]
    ]
  }, 
  "abstract": "<p><strong>Standardized Project Gutenberg Corpus</strong><br>\nversion: SPGC-2018-07-18<br>\nnumber of books: 55905<br>\nuncompressed size: 3GB (counts) +&nbsp;18GB (tokens)</p>\n\n<p><strong>Publication</strong><br>\n<a href=\"https://arxiv.org/abs/1812.08092\">https://arxiv.org/abs/1812.08092</a><br>\n[ journal link ]</p>\n\n<p><strong>Project Site</strong><br>\n<a href=\"https://pgcorpus.github.io/\">https://pgcorpus.github.io/</a></p>\n\n<p><strong>Github</strong><br>\n<a href=\"https://github.com/pgcorpus/gutenberg\">https://github.com/pgcorpus/gutenberg</a></p>", 
  "author": [
    {
      "family": "Gerlach", 
      "given": "Martin"
    }, 
    {
      "family": "Font-Clos", 
      "given": "Francesc"
    }
  ], 
  "version": "SPGC-2018-07-18", 
  "type": "dataset", 
  "id": "2422561"
}
                  All versions  This version
Views             1,698         1,698
Downloads         530           530
Data volume       899.4 GB      899.4 GB
Unique views      1,594         1,594
Unique downloads  380           380
