RCAEval: A Benchmark for Root Cause Analysis of Microservice Systems

Pham, Luan

doi:10.5281/zenodo.14504481

Published December 17, 2024 | Version 0.0.1

Dataset Open

Pham, Luan (Data manager)¹

1. RMIT University

The RCAEval benchmark includes three datasets, RE1, RE2, and RE3, designed to support comprehensive benchmarking of root cause analysis (RCA) in microservice systems. Together, the three datasets feature 735 failure cases collected from three microservice systems (Online Boutique, Sock Shop, and Train Ticket) and cover 11 fault types. Each failure case is annotated with its root cause service and root cause indicator (e.g., the specific metric or log indicating the root cause). The statistics of the datasets are presented below.

File Name	Dataset	Systems	Fault Types	Cases	Metrics	Logs (millions)	Traces (millions)
RE1.zip	RE1	3	3 Resource, 2 Network	375	49-212	N/A	N/A
RE2.zip	RE2	3	4 Resource, 2 Network	270	77-376	8.6-26.9	39.6-76.7
RE3.zip	RE3	3	5 Code-level	90	68-322	1.7-2.7	4.5-4.7

RE1 Dataset. The RE1 dataset, introduced in our prior work on metric-based RCA [14], contains 375 failure cases collected from three microservice systems (125 cases per system). For each system, the cases combine five fault types across five services, with five repetitions per fault-service pair. The RE1 dataset contains metrics data only, supporting the development of metric-based RCA methods; it does not include logs or traces. The fault types in RE1 are CPU, MEM, DISK, DELAY, and LOSS. The number of metrics ranges from 49 to 212, depending on the system size, with the smaller systems (Online Boutique, Sock Shop) having fewer metrics than the larger system (Train Ticket).

RE2 Dataset. The RE2 dataset, newly collected for this study, supports the development of multi-source RCA methods. It includes 270 failure cases collected from three microservice systems (90 cases per system), combining six fault types across five services, with three repetitions per fault-service pair. RE2 provides multi-source telemetry data, including metrics, logs, and traces. The number of metrics ranges from 77 to 376 per failure case. Each system generates a substantial volume of logs (8.6 to 26.9 million lines) and traces (39.6 to 76.7 million traces). The fault types include those in RE1 and an additional SOCKET fault.
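The per-system case counts for RE1 and RE2 follow from the design described above (fault types × services × repetitions per fault-service pair). The arithmetic can be checked with a minimal Python sketch; the names `DESIGN` and `cases_per_system` are illustrative helpers and are not part of the RCAEval framework.

```python
# Illustrative check of the case counts stated for RE1 and RE2.
# DESIGN and cases_per_system are hypothetical names, not RCAEval APIs.
SYSTEMS = 3  # Online Boutique, Sock Shop, Train Ticket

DESIGN = {
    # dataset: (fault types, services, repetitions per fault-service pair)
    "RE1": (5, 5, 5),
    "RE2": (6, 5, 3),
}


def cases_per_system(dataset: str) -> int:
    """Return the number of failure cases collected for one system."""
    fault_types, services, repetitions = DESIGN[dataset]
    return fault_types * services * repetitions


for name in DESIGN:
    per_system = cases_per_system(name)
    print(f"{name}: {per_system} cases per system, {per_system * SYSTEMS} in total")
```

Under these figures the sketch yields 125 cases per system (375 in total) for RE1 and 90 cases per system (270 in total) for RE2, matching the counts reported in the statistics table.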

RE3 Dataset. The RE3 dataset, also newly collected, supports multi-source RCA methods that can diagnose code-level faults. It has 90 failure cases (30 per system), all involving code-level faults. The fault types in RE3 are F1, F2, F3, F4, and F5. Like RE2, RE3 includes multi-source telemetry data (metrics, logs, and traces). This dataset emphasizes diagnosing code-level faults through telemetry data, e.g., leveraging stack traces in logs or response codes in traces to pinpoint root causes, which makes it valuable for advancing multi-source RCA methods.
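The headline totals (11 fault types, 735 failure cases) can be reproduced from the per-dataset descriptions. The following Python sketch uses only the fault names and case counts listed above; the variable names are illustrative.

```python
# Fault types as named in the dataset descriptions; variable names are illustrative.
RE1_FAULTS = {"CPU", "MEM", "DISK", "DELAY", "LOSS"}
RE2_FAULTS = RE1_FAULTS | {"SOCKET"}  # RE1 fault types plus SOCKET
RE3_FAULTS = {"F1", "F2", "F3", "F4", "F5"}  # code-level faults

# Failure cases per dataset, as listed in the statistics table.
CASES = {"RE1": 375, "RE2": 270, "RE3": 90}

# Distinct fault types across the benchmark (shared types counted once).
all_faults = RE1_FAULTS | RE2_FAULTS | RE3_FAULTS
total_cases = sum(CASES.values())

print(f"distinct fault types: {len(all_faults)}")
print(f"total failure cases: {total_cases}")
```

Because RE2 reuses the five RE1 fault types, the union contains six resource and network faults plus five code-level faults.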

The benchmark evaluation framework that supports these datasets is available in this GitHub repository: https://github.com/phamquiluan/RCAEval

Files

Files (5.1 GB)

Name	MD5	Size
RE1.zip	53376fb562ea030c49c5ff31f604323d	389.7 MB
RE2.zip	6f84a83c4fe1c0d7eb08e131775687bf	4.2 GB
RE3.zip	fd921c32149e4e29b2f3c0d342f89662	534.0 MB
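Since each archive is published with an MD5 checksum, a download can be checked before extraction. Below is a minimal sketch using Python's standard `hashlib` module; the function names and the assumption that the archives sit in a single local directory are illustrative, not part of the RCAEval tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums as published on this record.
EXPECTED_MD5 = {
    "RE1.zip": "53376fb562ea030c49c5ff31f604323d",
    "RE2.zip": "6f84a83c4fe1c0d7eb08e131775687bf",
    "RE3.zip": "fd921c32149e4e29b2f3c0d342f89662",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive found in `directory` to True if its checksum matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        archive = Path(directory) / name
        if archive.is_file():  # archives that were not downloaded are skipped
            results[name] = md5_of_file(archive) == expected
    return results
```

Calling `verify_downloads("path/to/downloads")` (a hypothetical path) returns a dictionary such as `{"RE1.zip": True}` for the archives present; a `False` value suggests a corrupted or incomplete download. Reading in chunks keeps memory use low for the 4.2 GB RE2 archive.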

Additional details

Repository URL: https://github.com/phamquiluan/RCAEval
Development Status: Active

	All versions	This version
Views	2,249	151
Downloads	5,254	495
Data volume	5.2 TB	617.4 GB
