Dataset Open Access

Webis-Simple-Sentences-17 Corpus

Kiesel, Johannes; Stein, Benno; Lucks, Stefan

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.205950", 
  "title": "Webis-Simple-Sentences-17 Corpus", 
  "issued": {
    "date-parts": [
  "abstract": "<p>A corpus of 471,085,690 English sentences extracted from the ClueWeb12 web crawl. The sentences were sampled from a larger corpus to achieve a level of sentence complexity similar to that of sentences that humans make up as a memory aid for remembering passwords. Sentence complexity was determined by syllables per word.</p>\n\n<p>The corpus is split into a training set and a test set, as used in the associated publication. The test set is extracted from part 00 of ClueWeb12, while the training set is extracted from the other parts.</p>\n\n<p>More information on the corpus can be found on the corpus web page at our university (listed under documented by).</p>", 
  "author": [
      "family": "Kiesel, Johannes"
      "family": "Stein, Benno"
      "family": "Lucks, Stefan"
  "id": "205950", 
  "event-place": "San Diego, California.", 
  "type": "dataset", 
  "event": "Network and Distributed System Security Symposium 2017 (NDSS 2017)"
                  All versions  This version
Views             579           580
Downloads         324           324
Data volume       2.0 TB        2.0 TB
Unique views      529           530
Unique downloads  208           208


Cite as