Published September 27, 2022 | Version v1
Dataset | Open

hadoop-14TB-part1

  • Affiliation: University of Toronto & YScope Inc.

Description

Logs from five data nodes, taken from a larger 14 TB dataset of Hadoop logs. The full dataset was generated by three Hadoop clusters, each containing 48 data nodes, running workloads from the HiBench Benchmark Suite for a month. It was first used in the evaluation of "CLP: Efficient and Scalable Search on Compressed Text Logs."

Files (21.8 GB)

Checksum                                Size
md5:0843aef28388c3695947d6e5c78c87dc    4.3 GB
md5:4be10819a06383e97cdcfc1b1190510f    4.4 GB
md5:3c723706422219484d29932c6676b414    4.4 GB
md5:d655e4ff9dd7b88224dca004220e72c5    4.3 GB
md5:d6ddf7f16761fe3d7db8808342e95e23    4.4 GB
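
The MD5 digests in the file listing above can be used to check a download for corruption. The following is a minimal sketch, not part of the original record: the function names `md5_of_file` and `is_listed` are hypothetical, and because the listing does not preserve file names, the check only tests whether a file's digest matches any of the five listed values.

```python
import hashlib

# MD5 digests copied from the file listing above.
EXPECTED_MD5 = {
    "0843aef28388c3695947d6e5c78c87dc",
    "4be10819a06383e97cdcfc1b1190510f",
    "3c723706422219484d29932c6676b414",
    "d655e4ff9dd7b88224dca004220e72c5",
    "d6ddf7f16761fe3d7db8808342e95e23",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte files are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 digest matches one of the listed digests."""
    return md5_of_file(path) in EXPECTED_MD5
```

Calling `is_listed(path)` on each downloaded file (the local path is whatever name the file was saved under) should return `True` for an intact download. MD5 is adequate here for detecting accidental corruption, but it is not a defense against deliberate tampering.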

Additional details

References

  • S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang. The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. In 26th International Conference on Data Engineering Workshops, ICDEW '10, pages 41–51. IEEE Computer Society, 2010.
  • K. Rodrigues, Y. Luo, and D. Yuan. CLP: Efficient and scalable search on compressed text logs. In 15th USENIX Symposium on Operating Systems Design and Implementation, OSDI '21, pages 183–198. USENIX Association, 2021.