Published September 4, 2019 | Version v1
Dataset
Open
Near Duplicate Study Crawls
Creators
Description
Contains 3 parts:
GroundTruths contains the crawls used to create the SubjectSet SS.db
RQ3Crawls contains all 5-minute and 30-minute crawls for all subjects used for RQ3
DS_Crawls contains the 1065 crawls that were used to create the dataset DS.db
This archive holds all the crawls used for the Near Duplicate Study. Each crawl was performed by Crawljax with the latest Google Chrome browser. The configuration of a crawl comprises the State Abstraction Function (SAF) used, the threshold for the SAF, and the time allotted for the crawl. The name of each crawl folder contains all three configuration parameters, which can also be found in result.json and index.html.
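Because the configuration of each crawl is also recorded in result.json, the crawl folders can be inventoried programmatically. The following is a minimal sketch, not part of the dataset: it assumes the archive has been extracted to a local directory and that each crawl folder holds a file named result.json somewhere beneath it. The function name `load_crawl_results` is hypothetical, and no assumption is made about the keys inside result.json; the parsed content is returned as-is.

```python
import json
from pathlib import Path


def load_crawl_results(root):
    """Map each folder containing a result.json (relative to root) to its parsed content.

    Assumption: the exact nesting of crawl folders is unknown, so the
    search is recursive and the JSON is returned without interpretation.
    """
    root = Path(root)
    results = {}
    for result_file in sorted(root.rglob("result.json")):
        # Use the folder path relative to the extraction root as the key,
        # since the folder name carries the SAF, threshold, and crawl time.
        folder = result_file.parent.relative_to(root).as_posix()
        with result_file.open(encoding="utf-8") as fh:
            results[folder] = json.load(fh)
    return results
```

Calling `load_crawl_results("DS_Crawls")` on an extracted copy would return one entry per crawl folder found, which can then be inspected to see how the configuration parameters are stored.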
Files
(36.9 GB)
Name | Size |
---|---|
md5:1b8da0b6953620c64da799574e24ce38 | 36.9 GB |
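Given the size of the download, it may be worth checking the file against the MD5 checksum listed above before extracting it. A minimal sketch using Python's standard `hashlib` module follows; the function names and the file path passed to them are hypothetical, since the archive's file name is not shown on this page.

```python
import hashlib

# Checksum as listed in the Files table of this record.
EXPECTED_MD5 = "1b8da0b6953620c64da799574e24ce38"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use constant for a multi-gigabyte file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("path/to/downloaded_archive")` returns `True` only when the local file matches the published checksum; a `False` result suggests an incomplete or corrupted download.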