Dataset Open Access

Standardized Project Gutenberg Corpus

Martin Gerlach; Francesc Font-Clos


JSON-LD (schema.org) Export

{
  "description": "<p><strong>Standardized Project Gutenberg Corpus</strong><br>\nversion: SPGC-2018-07-18<br>\nnumber of books: 55905<br>\nuncompressed size: 3GB (counts) +&nbsp;18GB (tokens)</p>\n\n<p><strong>Publication</strong><br>\n<a href=\"https://arxiv.org/abs/1812.08092\">https://arxiv.org/abs/1812.08092</a><br>\n[ journal link ]</p>\n\n<p><strong>Project Site</strong><br>\n<a href=\"https://pgcorpus.github.io/\">https://pgcorpus.github.io/</a></p>\n\n<p><strong>Github</strong><br>\n<a href=\"https://github.com/pgcorpus/gutenberg\">https://github.com/pgcorpus/gutenberg</a></p>", 
  "license": "http://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "Department of Chemical and Biological Engineering, Northwestern University", 
      "@type": "Person", 
      "name": "Martin Gerlach"
    }, 
    {
      "affiliation": "Center for Complexity and Biosystems, Department of Physics, University of Milan", 
      "@type": "Person", 
      "name": "Francesc Font-Clos"
    }
  ], 
  "url": "https://zenodo.org/record/2422561", 
  "datePublished": "2018-12-19", 
  "version": "SPGC-2018-07-18", 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/0cbe787b-82b8-4e8b-a232-dabd70bb4e7d/SPGC-counts-2018-07-18.zip", 
      "@type": "DataDownload", 
      "fileFormat": "zip"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/0cbe787b-82b8-4e8b-a232-dabd70bb4e7d/SPGC-metadata-2018-07-18.csv", 
      "@type": "DataDownload", 
      "fileFormat": "csv"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/0cbe787b-82b8-4e8b-a232-dabd70bb4e7d/SPGC-tokens-2018-07-18.zip", 
      "@type": "DataDownload", 
      "fileFormat": "zip"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.2422561", 
  "@id": "https://doi.org/10.5281/zenodo.2422561", 
  "@type": "Dataset", 
  "name": "Standardized Project Gutenberg Corpus"
}
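
The record above can be read with any JSON parser. The following is a minimal sketch, assuming only Python's standard `json` module, of how the `distribution` entries might be listed by file name and format. The names `RECORD_JSON` and `list_downloads` are hypothetical and not part of the record. The embedded JSON is a trimmed copy of the export, and no network access is performed.

```python
import json

# Trimmed copy of the JSON-LD export above (assumption: only the fields
# used in this sketch are kept; the values are copied from the record).
RECORD_JSON = """
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Standardized Project Gutenberg Corpus",
  "version": "SPGC-2018-07-18",
  "identifier": "https://doi.org/10.5281/zenodo.2422561",
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/0cbe787b-82b8-4e8b-a232-dabd70bb4e7d/SPGC-counts-2018-07-18.zip",
      "@type": "DataDownload",
      "fileFormat": "zip"
    },
    {
      "contentUrl": "https://zenodo.org/api/files/0cbe787b-82b8-4e8b-a232-dabd70bb4e7d/SPGC-metadata-2018-07-18.csv",
      "@type": "DataDownload",
      "fileFormat": "csv"
    },
    {
      "contentUrl": "https://zenodo.org/api/files/0cbe787b-82b8-4e8b-a232-dabd70bb4e7d/SPGC-tokens-2018-07-18.zip",
      "@type": "DataDownload",
      "fileFormat": "zip"
    }
  ]
}
"""


def list_downloads(record):
    """Return a (file name, format, URL) tuple for each DataDownload entry."""
    downloads = []
    for item in record.get("distribution", []):
        # Skip anything that is not declared as a downloadable file.
        if item.get("@type") != "DataDownload":
            continue
        url = item["contentUrl"]
        # The file name is the last path segment of the URL.
        file_name = url.rsplit("/", 1)[-1]
        downloads.append((file_name, item.get("fileFormat"), url))
    return downloads


record = json.loads(RECORD_JSON)
downloads = list_downloads(record)

print(record["name"], record["version"])
for file_name, file_format, _url in downloads:
    print(f"{file_format:>4}  {file_name}")
```

Filtering on `@type` keeps the sketch tolerant of records whose `distribution` list mixes download entries with other node types. Whether the listed URLs still resolve is not checked here.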

Usage statistics

                   All versions   This version
Views              1,576          1,576
Downloads          276            276
Data volume        834.7 GB       834.7 GB
Unique views       1,488          1,488
Unique downloads   147            147
