Dataset Open Access

Webis-Simple-Sentences-17 Corpus

Kiesel, Johannes; Stein, Benno; Lucks, Stefan


JSON-LD (schema.org) Export

{
  "description": "<p>A corpus of 471,085,690 English sentences extracted from the ClueWeb12 Web Crawl. The sentences were sampled from a larger corpus to achieve a level of sentence complexity similar to that of sentences that humans make up as memory aids for remembering passwords. Sentence complexity was determined by syllables per word.</p>\n\n<p>The corpus is split into a training set and a test set, as used in the associated publication. The test set is extracted from part 00 of ClueWeb12, while the training set is extracted from the other parts.</p>\n\n<p>More information on the corpus can be found on the corpus web page at our university (listed under documented by).</p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "Bauhaus-Universit\u00e4t Weimar", 
      "@id": "https://orcid.org/0000-0002-1617-6508", 
      "@type": "Person", 
      "name": "Kiesel, Johannes"
    }, 
    {
      "affiliation": "Bauhaus-Universit\u00e4t Weimar", 
      "@id": "https://orcid.org/0000-0001-9033-2217", 
      "@type": "Person", 
      "name": "Stein, Benno"
    }, 
    {
      "affiliation": "Bauhaus-Universit\u00e4t Weimar", 
      "@type": "Person", 
      "name": "Lucks, Stefan"
    }
  ], 
  "url": "https://zenodo.org/record/205950", 
  "datePublished": "2017-02-27", 
  "@type": "Dataset", 
  "keywords": [
    "Web Crawl", 
    "Sentence", 
    "Readability", 
    "Password", 
    "Password Mnemonic", 
    "Mnemonic", 
    "Web"
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/e7ec33e3-6d4e-43cc-b199-c2d2a3a9e0f4/webis-simple-sentences-17-corpus-test.txt.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/e7ec33e3-6d4e-43cc-b199-c2d2a3a9e0f4/webis-simple-sentences-17-corpus-training.txt.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.205950", 
  "@id": "https://doi.org/10.5281/zenodo.205950", 
  "workFeatured": {
    "url": "http://www.internetsociety.org/events/ndss-symposium/ndss-symposium-2017", 
    "alternateName": "NDSS 2017", 
    "location": "San Diego, California", 
    "@type": "Event", 
    "name": "Network and Distributed System Security Symposium 2017"
  }, 
  "name": "Webis-Simple-Sentences-17 Corpus"
}
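
The record above can be consumed programmatically. The following is a minimal sketch, not part of the original record: it parses a trimmed copy of the JSON-LD export, collects the download URLs from the "distribution" entries, and shows one way to stream sentences from a downloaded file. The helper names download_urls and iter_sentences are hypothetical, and the assumption that the .txt.gz files hold one UTF-8 sentence per line is not confirmed by the metadata; check the corpus web page before relying on it.

```python
import gzip
import json
from typing import Iterator

# Trimmed copy of the JSON-LD record; only the fields used below are kept.
RECORD_JSON = """
{
  "@type": "Dataset",
  "name": "Webis-Simple-Sentences-17 Corpus",
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/e7ec33e3-6d4e-43cc-b199-c2d2a3a9e0f4/webis-simple-sentences-17-corpus-test.txt.gz",
      "encodingFormat": "gz",
      "@type": "DataDownload"
    },
    {
      "contentUrl": "https://zenodo.org/api/files/e7ec33e3-6d4e-43cc-b199-c2d2a3a9e0f4/webis-simple-sentences-17-corpus-training.txt.gz",
      "encodingFormat": "gz",
      "@type": "DataDownload"
    }
  ]
}
"""


def download_urls(record: dict) -> dict:
    """Map each distributed file name to its contentUrl (hypothetical helper)."""
    urls = {}
    for item in record.get("distribution", []):
        if item.get("@type") == "DataDownload":
            url = item["contentUrl"]
            # The file name is the last path segment of the URL.
            urls[url.rsplit("/", 1)[-1]] = url
    return urls


def iter_sentences(path: str) -> Iterator[str]:
    """Stream non-empty lines from a gzip-compressed text file.

    Assumption: one sentence per line, UTF-8 encoded.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line


record = json.loads(RECORD_JSON)
urls = download_urls(record)
for name, url in urls.items():
    print(name, url)
```

Because the files are large, reading them lazily with a generator, as iter_sentences does, avoids loading a whole file into memory.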
                    All versions    This version
Views               579             580
Downloads           324             324
Data volume         2.0 TB          2.0 TB
Unique views        529             530
Unique downloads    208             208
