Dataset Open Access

Standardized Project Gutenberg Corpus

Martin Gerlach; Francesc Font-Clos


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Martin Gerlach</dc:creator>
  <dc:creator>Francesc Font-Clos</dc:creator>
  <dc:date>2018-12-19</dc:date>
  <dc:description>Standardized Project Gutenberg Corpus
version: SPGC-2018-07-18
number of books: 55905
uncompressed size: 3GB (counts) + 18GB (tokens)

Publication
https://arxiv.org/abs/1812.08092
[ journal link ]

Project Site
https://pgcorpus.github.io/

Github
https://github.com/pgcorpus/gutenberg</dc:description>
  <dc:identifier>https://zenodo.org/record/2422561</dc:identifier>
  <dc:identifier>10.5281/zenodo.2422561</dc:identifier>
  <dc:identifier>oai:zenodo.org:2422561</dc:identifier>
  <dc:relation>doi:10.5281/zenodo.2422560</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>http://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:title>Standardized Project Gutenberg Corpus</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
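
As a minimal sketch of how a record like the export above could be read programmatically, the following Python snippet collects the Dublin Core elements into a dictionary of lists using only the standard library. The names `RECORD`, `DC_NS`, and `parse_dc` are illustrative assumptions and not part of Zenodo or the corpus tooling, and the embedded record is an abbreviated copy of the export with a shortened description.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace URI of the Dublin Core elements, as declared in the export.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Abbreviated copy of the export above (description shortened for brevity).
RECORD = """<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
  <dc:creator>Martin Gerlach</dc:creator>
  <dc:creator>Francesc Font-Clos</dc:creator>
  <dc:date>2018-12-19</dc:date>
  <dc:description>Standardized Project Gutenberg Corpus
version: SPGC-2018-07-18
number of books: 55905</dc:description>
  <dc:identifier>https://zenodo.org/record/2422561</dc:identifier>
  <dc:identifier>10.5281/zenodo.2422561</dc:identifier>
  <dc:identifier>oai:zenodo.org:2422561</dc:identifier>
  <dc:relation>doi:10.5281/zenodo.2422560</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>http://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:title>Standardized Project Gutenberg Corpus</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>"""


def parse_dc(xml_text):
    """Return a dict mapping each Dublin Core element name to its list of values."""
    # Parse from bytes so the encoding declaration in the prolog is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    fields = defaultdict(list)
    prefix = "{" + DC_NS + "}"
    for element in root:
        # ElementTree exposes qualified tags as "{namespace-uri}localname".
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            fields[name].append((element.text or "").strip())
    return dict(fields)


record = parse_dc(RECORD)
print(record["title"][0])
print(record["creator"])
print(record["identifier"])
```

Repeated elements such as `dc:creator` and `dc:identifier` are kept as lists because Dublin Core allows any element to occur more than once, so a one-value-per-key mapping would silently drop data.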
                   All versions   This version
Views              1,705          1,705
Downloads          540            540
Data volume        901.0 GB       901.0 GB
Unique views       1,599          1,599
Unique downloads   389            389
