Published September 27, 2022 | Version v1
Dataset
Open
hadoop-14TB-part1
Description
Five data nodes' worth of logs from a larger 14 TB dataset of Hadoop logs. The logs were generated by three Hadoop clusters, each containing 48 data nodes, running workloads from the HiBench Benchmark Suite for a month. This dataset was first used in the evaluation of "CLP: Efficient and Scalable Search on Compressed Text Logs."
Files
(21.8 GB)
Additional details
References
- S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang. The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. In 26th International Conference on Data Engineering Workshops, ICDEW '10, pages 41–51. IEEE Computer Society, 2010.
- K. Rodrigues, Y. Luo, and D. Yuan. CLP: Efficient and Scalable Search on Compressed Text Logs. In 15th USENIX Symposium on Operating Systems Design and Implementation, OSDI '21, pages 183–198. USENIX Association, 2021.