GHTraffic: A Dataset for Reproducible Research in Service-Oriented Computing
Description
This is the latest version of the GHTraffic project. The main aim is to model a variety of transaction sequences to reflect more complex service behaviour.
It has two editions, Small (S) and Large (L), whose records were created by selecting the same repositories as the original Small and Large datasets. The new S dataset contains records from the google/guava repository. The L dataset contains records from eight repositories: twbs/bootstrap, symfony/symfony, docker/docker, Homebrew/homebrew, rust-lang/rust, kubernetes/kubernetes, rails/rails, and angular/angular.js.
The data generation process closely follows the original GHTraffic design, with minor changes to the synthetic data generation step: for every HTTP method, the synthetic request and response are assigned a random date after the resource has been successfully created (POSTed). This version also adds a further subset of unsuccessful transactions, consisting of requests issued before the resource has been successfully created.
This results in a far more dynamic series of transactions to named resources.
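To make the generation rule above concrete, here is a minimal sketch for a single resource. It is not the GHTraffic construction script: the `Transaction` type, the `synthesize` function, the `horizon_days`/`n_after`/`n_before` parameters, the example path, and the status codes (201 for creation, 200 for later successes, 404 for requests made before creation) are all hypothetical choices made for illustration. It only shows the idea of dating successful requests randomly after the POST and dating failed requests before it.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Transaction:
    method: str
    path: str
    timestamp: datetime
    status: int


def synthesize(path, created_at, horizon_days=30, n_after=3, n_before=2, seed=None):
    """Hypothetical sketch: build a transaction sequence for one resource."""
    rng = random.Random(seed)
    horizon_s = timedelta(days=horizon_days).total_seconds()
    # The successful creation request anchors the sequence (201 is assumed).
    txs = [Transaction("POST", path, created_at, 201)]
    # Successful requests get a random date after creation. DELETE is omitted
    # here so every later request in this sketch still finds the resource.
    for _ in range(n_after):
        method = rng.choice(["GET", "PUT", "PATCH", "HEAD"])
        when = created_at + timedelta(seconds=rng.uniform(1, horizon_s))
        txs.append(Transaction(method, path, when, 200))
    # Unsuccessful requests are dated before creation; 404 is an assumed status.
    for _ in range(n_before):
        method = rng.choice(["GET", "PUT", "PATCH", "DELETE"])
        when = created_at - timedelta(seconds=rng.uniform(1, horizon_s))
        txs.append(Transaction(method, path, when, 404))
    # Return the sequence in chronological order.
    return sorted(txs, key=lambda t: t.timestamp)


if __name__ == "__main__":
    for tx in synthesize("/repos/google/guava/issues/1", datetime(2016, 1, 1), seed=7):
        print(tx.timestamp.isoformat(), tx.method, tx.status)
```

Sorting the combined list interleaves the pre-creation failures and post-creation successes around the POST, which is what yields a more varied sequence of transactions per named resource.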
Scripts used for dataset construction are available from the repository.
Files (447.3 MB)
ghtraffic-L-2.0.0.zip
| MD5 checksum | Size |
|---|---|
| 2393164286467c2243d43c8056fec921 | 441.3 MB |
| cf00cf3774dc280939b992a650f4870f | 6.1 MB |
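The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` are hypothetical, and the file is read in chunks so that a several-hundred-megabyte archive is never held in memory at once.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 matches the expected hex checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify("ghtraffic-L-2.0.0.zip", expected)` returns `True` when `expected` is the checksum listed for that file.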