hadoop-14TB-part1

Kirk Rodrigues; Yu Luo; Ding Yuan

doi:10.5281/zenodo.7114847

Published September 27, 2022 | Version v1

Dataset | Open access

Affiliation: University of Toronto & YScope Inc.

This record contains the logs from five data nodes (cluster 1, workers 1–5) of a larger 14 TB dataset of Hadoop logs. The full dataset was generated by three Hadoop clusters, each with 48 data nodes, running workloads from the HiBench benchmark suite for one month. It was first used in the evaluation of "CLP: Efficient and Scalable Search on Compressed Text Logs."

Files (21.8 GB)

Name	MD5	Size
hadoop-14TB-cluster1-worker1.tar	0843aef28388c3695947d6e5c78c87dc	4.3 GB
hadoop-14TB-cluster1-worker2.tar	4be10819a06383e97cdcfc1b1190510f	4.4 GB
hadoop-14TB-cluster1-worker3.tar	3c723706422219484d29932c6676b414	4.4 GB
hadoop-14TB-cluster1-worker4.tar	d655e4ff9dd7b88224dca004220e72c5	4.3 GB
hadoop-14TB-cluster1-worker5.tar	d6ddf7f16761fe3d7db8808342e95e23	4.4 GB
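Because each archive is several gigabytes, it is worth checking the downloads against the MD5 checksums listed above. The script below is a minimal sketch, assuming the five .tar files have been saved under their original names in a single directory; the function names md5_of_file and verify are illustrative and not part of any tool shipped with this dataset.

```python
import hashlib
import sys
from pathlib import Path

# MD5 checksums as listed in the Files table of this record.
EXPECTED_MD5 = {
    "hadoop-14TB-cluster1-worker1.tar": "0843aef28388c3695947d6e5c78c87dc",
    "hadoop-14TB-cluster1-worker2.tar": "4be10819a06383e97cdcfc1b1190510f",
    "hadoop-14TB-cluster1-worker3.tar": "3c723706422219484d29932c6676b414",
    "hadoop-14TB-cluster1-worker4.tar": "d655e4ff9dd7b88224dca004220e72c5",
    "hadoop-14TB-cluster1-worker5.tar": "d6ddf7f16761fe3d7db8808342e95e23",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks
    so multi-gigabyte archives never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return a dict mapping each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Optional first argument: the directory holding the downloaded archives.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, status in verify(directory).items():
        print(f"{name}: {status}")
```

Reading in 1 MiB chunks keeps memory use constant regardless of archive size; a file reported as "mismatch" should be downloaded again before use.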

Additional details

References

S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang. The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. In 26th International Conference on Data Engineering Workshops, ICDEW '10, pages 41–51. IEEE Computer Society, 2010.
K. Rodrigues, Y. Luo, and D. Yuan. CLP: Efficient and scalable search on compressed text logs. In Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, OSDI '21, pages 183–198. USENIX Association, 2021.

Usage statistics	All versions	This version
Views	578	578
Downloads	580	580
Data volume downloaded	4.4 TB	4.4 TB