Published August 5, 2019 | Version v1
Dataset Open

Text files from Gutenberg database

Authors/Creators

  • 1. Tampere University

Description

Text files of different sizes and structures, selected at random from the Gutenberg dataset. This artefact contains five datasets of random text files (i.e. e-books in .txt format) from the Gutenberg database. The datasets range in total size from 184MB to 1.7GB. More precisely, the following datasets can be found in this package:

  1. 184MB
  2. 357MB
  3. 670MB
  4. 1GB
  5. 1.7GB

In our case, we used these datasets to perform extensive experiments on the performance of a Symmetric Searchable Encryption scheme. However, they can be used to measure the performance of any algorithm that parses documents, extracts keywords, creates dictionaries, etc.
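As an illustration of the kind of workload mentioned in the description (parsing documents, extracting keywords, creating dictionaries), here is a minimal Python sketch. It is not the Symmetric Searchable Encryption scheme used in the experiments. The function names (`extract_keywords`, `build_index`, `timed_build`) and the tokenisation rule (lowercase runs of the letters a-z) are assumptions made for this example only.

```python
import re
import time
from collections import defaultdict
from pathlib import Path

# Assumption: a "keyword" is any run of ASCII letters, compared case-insensitively.
WORD_RE = re.compile(r"[a-z]+")


def extract_keywords(text):
    """Return the set of distinct lowercase alphabetic tokens in text."""
    return set(WORD_RE.findall(text.lower()))


def build_index(paths):
    """Map each keyword to the sorted list of file names that contain it."""
    index = defaultdict(set)
    for path in paths:
        # errors="replace" keeps the run going if a file is not valid UTF-8.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for word in extract_keywords(text):
            index[word].add(Path(path).name)
    return {word: sorted(names) for word, names in index.items()}


def timed_build(directory):
    """Index every .txt file under directory; return (index, file count, seconds)."""
    paths = sorted(Path(directory).rglob("*.txt"))
    start = time.perf_counter()
    index = build_index(paths)
    elapsed = time.perf_counter() - start
    return index, len(paths), elapsed
```

Calling `timed_build` on the directory where one of the five archives was unpacked returns the keyword dictionary together with the number of files and the elapsed time. Repeating the call for each dataset gives a rough picture of how running time grows with input size.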

Files

D1.7GB.zip

Files (1.4 GB)

Checksum (MD5)                          Size
md5:c4fb5421d358ab25d3a17f8c9fa48ff1    563.4 MB
md5:b400e353aff56022035cbebc328878cb    68.1 MB
md5:58493488f94d43ae114399014edc1ae4    356.1 MB
md5:3ce8872ec280123aea826766a11d89f2    134.2 MB
md5:6bb3d099581bebc1f13b8dca33592a80    251.2 MB
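
The MD5 checksums listed above can be used to confirm that an archive was downloaded intact. Below is a minimal Python sketch using only the standard library. The helper names (`md5_of_file`, `matches_checksum`) are hypothetical, and the listing does not say which checksum belongs to which archive, so that pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(archive_path, "md5:c4fb5421d358ab25d3a17f8c9fa48ff1")` returns `True` only if the file at `archive_path` (a placeholder for wherever the matching archive was saved) has exactly that digest.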