
# Webis-Simple-Sentences-17 Corpus

Kiesel, Johannes; Stein, Benno; Lucks, Stefan

### Citation Style Language JSON Export

{
"publisher": "Zenodo",
"DOI": "10.5281/zenodo.205950",
"title": "Webis-Simple-Sentences-17 Corpus",
"issued": {
"date-parts": [
[
2017,
2,
27
]
]
},
"abstract": "<p>A corpus of 471,085,690 English sentences extracted from the ClueWeb12 Web Crawl. The sentences were sampled from a larger corpus to achieve a level of sentence complexity similar to that of sentences that humans make up as a memory aid for remembering passwords. Sentence complexity was determined by syllables per word.</p>\n\n<p>The corpus is split into a training set and a test set, as used in the associated publication. The test set is extracted from part 00 of the ClueWeb12, while the training set is extracted from the other parts.</p>\n\n<p>More information on the corpus can be found on the corpus web page at our university (listed under documented by).</p>",
"author": [
{
"family": "Kiesel",
"given": "Johannes"
},
{
"family": "Stein",
"given": "Benno"
},
{
"family": "Lucks",
"given": "Stefan"
}
],
"id": "205950",
"event-place": "San Diego, California.",
"type": "dataset",
"event": "Network and Distributed System Security Symposium 2017 (NDSS 2017)"
}