Published August 5, 2019 | Version v1
Dataset
Open
Text files from Gutenberg database
Description
Text files of different sizes and structures, selected at random from the Gutenberg dataset.
This artefact contains five datasets of randomly selected text files (i.e. e-books in .txt format) from the Gutenberg database. The datasets range in total size from 184MB to 1.7GB.
The following datasets can be found in this package:
1. 184MB
2. 357MB
3. 670MB
4. 1GB
5. 1.7GB
In our case, we used these datasets to perform extensive experiments on the performance of a Symmetric Searchable Encryption scheme. However, they can be used to measure the performance of any algorithm that parses documents, extracts keywords, creates dictionaries, etc.
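As an illustration of the kind of measurement described above, the following minimal sketch times keyword extraction and dictionary (inverted index) construction over a directory of unpacked .txt files. It is not the authors' experimental code: the function names `extract_keywords`, `build_index`, and `timed_build`, the tokenisation rule (lowercase runs of ASCII letters), and the directory layout are all assumptions made for this example.

```python
import re
import time
from collections import defaultdict
from pathlib import Path

# Assumption: a "keyword" is any run of ASCII letters, compared case-insensitively.
WORD_RE = re.compile(r"[a-z]+")


def extract_keywords(text):
    """Return the set of distinct lowercase alphabetic tokens in text."""
    return set(WORD_RE.findall(text.lower()))


def build_index(paths):
    """Map each keyword to the sorted list of file names that contain it."""
    index = defaultdict(set)
    for path in paths:
        # errors="replace" keeps the run going if a file is not valid UTF-8.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for word in extract_keywords(text):
            index[word].add(Path(path).name)
    return {word: sorted(names) for word, names in index.items()}


def timed_build(directory):
    """Build the index for every .txt file under directory and time it."""
    paths = sorted(Path(directory).rglob("*.txt"))
    start = time.perf_counter()
    index = build_index(paths)
    elapsed = time.perf_counter() - start
    return index, elapsed
```

Calling `timed_build` on each of the five unpacked datasets in turn would give one timing per dataset size, which is one simple way to see how an indexing step scales with input size.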
Files
D1.7GB.zip
Files (1.4 GB in total)
| MD5 checksum | Size |
|---|---|
| c4fb5421d358ab25d3a17f8c9fa48ff1 | 563.4 MB |
| b400e353aff56022035cbebc328878cb | 68.1 MB |
| 58493488f94d43ae114399014edc1ae4 | 356.1 MB |
| 3ce8872ec280123aea826766a11d89f2 | 134.2 MB |
| 6bb3d099581bebc1f13b8dca33592a80 | 251.2 MB |
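The MD5 checksums listed with the files can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib` module and reads the file in chunks, so large archives do not need to fit in memory. The helper names `md5_of_file` and `verify` are hypothetical, and the listing does not show which checksum belongs to which archive, so the expected value has to be taken from the record page for the file in question.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at path, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches expected_md5 (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify("D1.7GB.zip", expected)` returns `True` only when `expected` is the checksum published for that archive and the local copy is byte-for-byte identical.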