RCAEval: A Benchmark for Root Cause Analysis of Microservice Systems
Description
The RCAEval benchmark includes three datasets, RE1, RE2, and RE3, designed to comprehensively support benchmarking root cause analysis (RCA) in microservice systems. Together, our three datasets feature 735 failure cases collected from three microservice systems (Online Boutique, Sock Shop, and Train Ticket) and cover 11 fault types. Each failure case is annotated with its root cause service and root cause indicator (e.g., the specific metric or log indicating the root cause). The statistics of the datasets are presented below.
| File Name | Dataset | Systems | Fault Types | Cases | Metrics | Logs (millions) | Traces (millions) |
|---|---|---|---|---|---|---|---|
| RE1.zip | RE1 | 3 | 3 Resource, 2 Network | 375 | 49-212 | N/A | N/A |
| RE2.zip | RE2 | 3 | 4 Resource, 2 Network | 270 | 77-376 | 8.6-26.9 | 39.6-76.7 |
| RE3.zip | RE3 | 3 | 5 Code-level | 90 | 68-322 | 1.7-2.7 | 4.5-4.7 |
RE1 Dataset. The RE1 dataset, introduced in our prior work on metric-based RCA [14], contains 375 failure cases collected from three microservice systems (125 cases per system). For each system, the cases combine five fault types with five services, with five repetitions per fault-service pair. The fault types in RE1 are CPU, MEM, DISK, DELAY, and LOSS. RE1 contains metrics data only, with no logs or traces, and so supports the development of metric-based RCA methods. The number of metrics ranges from 49 to 212, depending on the system size: the smaller systems (Online Boutique and Sock Shop) have fewer metrics than the larger system (Train Ticket).
RE2 Dataset. The RE2 dataset, newly collected for this study, supports the development of multi-source RCA methods. It includes 270 failure cases collected from three microservice systems (90 cases per system). For each system, the cases combine six fault types with five services, with three repetitions per fault-service pair. RE2 provides multi-source telemetry data, including metrics, logs, and traces. The number of metrics ranges from 77 to 376 per failure case. Each system generates a substantial volume of logs (8.6 to 26.9 million lines) and traces (39.6 to 76.7 million traces). The fault types include those in RE1 plus an additional SOCKET fault.
RE3 Dataset. The RE3 dataset, also newly collected, focuses on supporting multi-source RCA methods that can diagnose code-level faults. It has 90 failure cases (30 per system), all involving code-level faults. The fault types in RE3 are F1, F2, F3, F4, and F5. Like RE2, RE3 includes multi-source telemetry data (metrics, logs, and traces). This dataset emphasizes diagnosing code-level faults through telemetry data, e.g., leveraging stack traces in logs or response codes in traces to pinpoint root causes, making it invaluable for advancing multi-source RCA methods.
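The case counts quoted above follow directly from the collection design (systems × fault types × services × repetitions). As a quick consistency check, here is a minimal Python sketch that recomputes them from the numbers stated in the descriptions. The dictionary layout and function names are illustrative assumptions, not part of the RCAEval framework, and the RE3 count is taken as given (30 cases per system) because no per-service breakdown is stated here.

```python
# Consistency check for the dataset statistics, using only the numbers
# stated in the descriptions. All names here are illustrative assumptions.
SYSTEMS = 3  # Online Boutique, Sock Shop, Train Ticket

# Designs stated as fault types x services x repetitions, per system.
DESIGN = {
    "RE1": {"fault_types": ["CPU", "MEM", "DISK", "DELAY", "LOSS"],
            "services": 5, "repetitions": 5},
    "RE2": {"fault_types": ["CPU", "MEM", "DISK", "DELAY", "LOSS", "SOCKET"],
            "services": 5, "repetitions": 3},
}

# RE3 is described only by its per-system total and its fault type names.
RE3_FAULT_TYPES = ["F1", "F2", "F3", "F4", "F5"]
RE3_CASES_PER_SYSTEM = 30


def cases_per_system(design):
    """Failure cases collected on one system for a given design."""
    return len(design["fault_types"]) * design["services"] * design["repetitions"]


def case_counts():
    """Total failure cases per dataset across all three systems."""
    counts = {name: SYSTEMS * cases_per_system(d) for name, d in DESIGN.items()}
    counts["RE3"] = SYSTEMS * RE3_CASES_PER_SYSTEM
    return counts


def distinct_fault_types():
    """Union of fault type names over RE1, RE2, and RE3."""
    names = set(RE3_FAULT_TYPES)
    for design in DESIGN.values():
        names.update(design["fault_types"])
    return names


if __name__ == "__main__":
    counts = case_counts()
    print(counts)
    print("total cases:", sum(counts.values()))
    print("distinct fault types:", len(distinct_fault_types()))
```

The totals agree with the statistics table: 375, 270, and 90 cases, 735 overall, and 11 distinct fault types (the six resource and network faults plus the five code-level faults).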
The benchmark evaluation framework that supports these datasets is available in this GitHub repository: https://github.com/phamquiluan/RCAEval
Files
RE1.zip
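The internal layout of the archive is not described on this page. A minimal sketch for inspecting a downloaded copy with Python's standard `zipfile` module is shown below. It assumes `RE1.zip` has already been saved to the working directory; the function name `summarize_archive` is a hypothetical helper, not part of the RCAEval framework.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count the files in a zip archive, grouped by top-level directory.

    Files stored at the archive root are grouped under ".".
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            parts = PurePosixPath(info.filename).parts
            top_level = parts[0] if len(parts) > 1 else "."
            counts[top_level] += 1
    return dict(counts)


if __name__ == "__main__":
    # Assumes RE1.zip was downloaded into the current directory.
    for directory, n_files in sorted(summarize_archive("RE1.zip").items()):
        print(f"{directory}: {n_files} files")
```

Listing the members first avoids extracting a large archive just to learn how it is organized.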
Additional details
Software
- Repository URL: https://github.com/phamquiluan/RCAEval
- Development Status: Active