Published August 12, 2024 | Version 0.0.1
Dataset Open

Dataset Artifact for paper "Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?"

  • 1. RMIT University
  • 2. Chongqing University

Description

Artifacts for the paper titled "Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?"

This artifact repository contains 9 compressed folders, as follows: 

| ID | File Name | Description |
|----|-----------|-------------|
| 1 | syn_circa.zip | CIRCA10 and CIRCA50 datasets for Causal Discovery |
| 2 | syn_rcd.zip | RCD10 and RCD50 datasets for Causal Discovery |
| 3 | syn_causil.zip | CausIL10 and CausIL50 datasets for Causal Discovery |
| 4 | rca_circa.zip | CIRCA10 and CIRCA50 datasets for RCA |
| 5 | rca_rcd.zip | RCD10 and RCD50 datasets for RCA |
| 6 | online-boutique.zip | Online Boutique dataset for RCA |
| 7 | sock-shop-1.zip | Sock Shop 1 dataset for RCA |
| 8 | sock-shop-2.zip | Sock Shop 2 dataset for RCA |
| 9 | train-ticket.zip | Train Ticket dataset for RCA |

Each zip file contains the generated/collected data from the corresponding data generator or microservice benchmark systems (e.g., online-boutique.zip contains metrics data collected from the Online Boutique system). 

Details about the generation of our datasets

1. Synthetic datasets

We use three different synthetic data generators from three previous RCA studies [15, 25, 28] to create the synthetic datasets: CIRCA, RCD, and CausIL data generators. Their mechanisms are as follows:

1. CIRCA data generator [28] generates a random causal directed acyclic graph (DAG) based on a given number of nodes and edges. From this DAG, time series data for each node is generated using a vector auto-regression (VAR) model. A fault is injected into a node by altering the noise term in the VAR model for two timestamps.

2. RCD data generator [25] uses the pyAgrum package [3] to generate a random DAG based on a given number of nodes, subsequently generating discrete time series data for each node, with values ranging from 0 to 5. A fault is introduced into a node by changing its conditional probability distribution.

3. CausIL data generator [15] generates causal graphs and time series data that simulate the behavior of microservice systems. It first constructs a DAG of services and metrics based on domain knowledge, then generates metric data for each node of the DAG using regressors trained on real metrics data. Unlike the CIRCA and RCD data generators, the CausIL data generator does not have the capability to inject faults.
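The CIRCA-style mechanism described in item 1 can be illustrated with a minimal sketch. This is not the CIRCA generator's actual code: the function names (`random_dag`, `simulate`), the lag-1 model, the fixed edge weight, and the additive noise shift used as the fault are all assumptions chosen for illustration.

```python
import random


def random_dag(n_nodes, n_edges, rng):
    """Sample a random DAG as (parent, child) pairs with parent < child,
    which guarantees the graph has no cycles."""
    candidates = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    return set(rng.sample(candidates, n_edges))


def simulate(n_nodes, edges, length, rng, fault_node=None, fault_start=None,
             fault_shift=10.0, weight=0.5, noise_std=1.0):
    """Generate lag-1 VAR data over the DAG. If a fault is requested, the
    noise term of `fault_node` is shifted for two consecutive timestamps."""
    parents = {j: [i for (i, k) in edges if k == j] for j in range(n_nodes)}
    data = [[0.0] * n_nodes]
    for t in range(1, length):
        prev = data[-1]
        row = []
        for j in range(n_nodes):
            noise = rng.gauss(0.0, noise_std)
            in_fault_window = (
                fault_node == j
                and fault_start is not None
                and fault_start <= t < fault_start + 2
            )
            if in_fault_window:
                noise += fault_shift  # assumed fault model: shifted noise
            value = weight * prev[j] + sum(weight * prev[i] for i in parents[j]) + noise
            row.append(value)
        data.append(row)
    return data


# Same DAG and same noise seed, with and without a fault on node 2 at t = 100.
edges = random_dag(5, 6, random.Random(0))
normal = simulate(5, edges, 200, random.Random(1))
faulty = simulate(5, edges, 200, random.Random(1), fault_node=2, fault_start=100)
```

Because both runs share the noise seed, the two series are identical up to the injection point and diverge afterwards, first at the faulty node and then at its descendants in the DAG.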

To create our synthetic datasets, we first generate 10 DAGs, with node counts ranging from 10 to 50, for each of the synthetic data generators. Next, we generate fault-free datasets from these DAGs using different random seeds, resulting in 100 cases for the CIRCA and RCD generators and 10 cases for the CausIL generator. We then create faulty datasets by introducing ten faults into each DAG and generating the corresponding faulty data, yielding 100 cases for the CIRCA and RCD data generators. The fault-free datasets (e.g., `syn_rcd`, `syn_circa`) are used to evaluate causal discovery methods, while the faulty datasets (e.g., `rca_rcd`, `rca_circa`) are used to assess RCA methods.

2. Data collected from benchmark microservice systems 

We deploy three popular benchmark microservice systems, Sock Shop [6], Online Boutique [4], and Train Ticket [8], on a four-node Kubernetes cluster hosted on AWS. We then use the Istio service mesh [2] with Prometheus [5] and cAdvisor [1] to monitor and collect resource-level and service-level metrics of all services, as in previous works [25, 39, 59]. To generate traffic, we use the load generators provided by these systems and customise them to exercise all services with 100 to 200 concurrent users. We then inject five common faults (CPU hog, memory leak, disk IO stress, network delay, and packet loss) into five different services within each system. Finally, we collect metrics data before and after the fault injection. An overview of our setup is presented in the figure below.
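Since each case contains metrics from before and after the fault injection, a natural first step is to split a case at the injection time and compare the two periods. The sketch below assumes a hypothetical in-memory layout (a list of dicts with a `time` key and one key per metric) and hypothetical metric names. The actual file layout inside the archives is not described here, and the mean-shift score is only a simple illustrative baseline, not one of the methods evaluated in the paper.

```python
from statistics import mean, pstdev


def split_at_injection(rows, inject_time):
    """Split metric rows into the periods before and after fault injection."""
    before = [r for r in rows if r["time"] < inject_time]
    after = [r for r in rows if r["time"] >= inject_time]
    return before, after


def shift_scores(rows, inject_time):
    """Score each metric by how far its post-injection mean moves away from
    its pre-injection mean, in units of the pre-injection std deviation."""
    before, after = split_at_injection(rows, inject_time)
    metrics = [k for k in rows[0] if k != "time"]
    scores = {}
    for m in metrics:
        base = [r[m] for r in before]
        mu, sigma = mean(base), pstdev(base)
        scores[m] = abs(mean(r[m] for r in after) - mu) / (sigma or 1.0)
    return scores


# Toy data with hypothetical metric names; the fault is injected at time 5.
cpu = [1.0, 1.1, 0.9, 1.0, 1.0, 5.0, 5.1, 4.9, 5.0, 5.0]
mem = [2.0, 2.1, 1.9, 2.0, 2.0, 2.0, 2.1, 1.9, 2.0, 2.0]
rows = [{"time": t, "cart_cpu": c, "cart_mem": m}
        for t, (c, m) in enumerate(zip(cpu, mem))]
scores = shift_scores(rows, inject_time=5)
ranked = sorted(scores, key=scores.get, reverse=True)
```

On this toy data the metric that shifts after injection ranks first. Real cases would need the injection timestamp and column names taken from the dataset itself.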

Code

The code to reproduce the experimental results in the paper is available at https://github.com/phamquiluan/RCAEval.

References

As in our paper.

Files

Files (606.2 MB)

| Checksum | Size |
|----------|------|
| md5:ed1c9989b57365ea78b72eb93d17bff9 | 31.0 MB |
| md5:68054cdd94458437712d939a5edd3be5 | 52.7 MB |
| md5:a7fdff392b82aa61ab1ef0bedbd360a6 | 22.7 MB |
| md5:697e5ad6109a4c63d10d7b6d16617fdd | 3.5 MB |
| md5:75c8c577be9671692330eb9dc220a1b3 | 79.1 MB |
| md5:c094efb7bbb51fe069efcf7764fb1383 | 73.8 MB |
| md5:dc6519ca9a774faf8cd9d491a874e079 | 52.5 MB |
| md5:c54c6a5f147ff03e99a7bca4a5d1cc97 | 11.2 MB |
| md5:d585407d184e8da50bf9d01cd0fc928b | 279.7 MB |
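The MD5 checksums listed above can be used to check a downloaded archive for corruption. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and because this listing does not show which checksum belongs to which archive, the pairing must be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.strip().lower()
```

For example, `matches_checksum("some-archive.zip", "md5:<hex digest from the record page>")` returns `True` only if the local file is byte-identical to the published one.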

Additional details

Software

Repository URL: https://github.com/phamquiluan/RCAEval
Programming language: Python
Development Status: Active