Published March 9, 2024 | Version v5
Dataset Open

E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects

Description

ABSTRACT 
End-to-end (E2E) testing is a software validation approach that simulates realistic user scenarios throughout the entire workflow of an application. In the context of web applications, E2E testing involves two activities: Graphical User Interface (GUI) testing, which simulates user interactions with the web app’s GUI through web browsers, and performance testing, which evaluates system workload handling. Despite its recognized importance in delivering high-quality web applications, the availability of large-scale datasets featuring real-world E2E web tests remains limited, hindering research in the field.
To address this gap, we present E2EGit, a comprehensive dataset of non-trivial open-source web projects collected on GitHub that adopt E2E testing. By analyzing over 5,000 web repositories across popular programming languages (JAVA, JAVASCRIPT, TYPESCRIPT, and PYTHON), we identified 472 repositories implementing 43,670 automated Web GUI tests with popular browser automation frameworks (SELENIUM, PLAYWRIGHT, CYPRESS, PUPPETEER), and 84 repositories that featured 271 automated performance tests implemented leveraging the most popular open-source tools (JMETER, LOCUST). Among these, 13 repositories implemented both types of testing for a total of 786 Web GUI tests and 61 performance tests.


DATASET DESCRIPTION 
The dataset is provided as an SQLite database consisting of five tables, each serving a specific purpose. Its structure is illustrated in Figure 3 of the paper.
The repository table contains information on 1.5 million repositories collected using the SEART tool on May 4. It includes 34 fields detailing repository characteristics.
The non_trivial_repository table is a subset of the repository table, listing the repositories that passed the two filtering stages described in the pipeline. For each repository, it specifies whether it is a web repository using JAVA, JAVASCRIPT, TYPESCRIPT, or PYTHON frameworks. A repository may use multiple frameworks, in which case each corresponding field (e.g., is_web_java) is set to true, and the web_dependencies field lists the detected web frameworks.
For Web GUI testing, the dataset includes two additional tables. The first, gui_testing_test_details, has one row per test file and provides the file path, the browser automation framework used, the test engine employed, and the number of tests implemented in the file. The second, gui_testing_repo_details, aggregates the data from gui_testing_test_details at the repository level: each of the 472 repositories has a row summarizing the number of test files using frameworks like SELENIUM or PLAYWRIGHT, test engines like JUNIT, and the total number of tests identified.
For performance testing, the performance_testing_test_details table contains 410 rows, one for each test identified. Each row includes the file path, whether the test uses JMETER or LOCUST, and extracted details such as the number of thread groups, concurrent users, and requests. Notably, some fields may be absent, for instance when external files (e.g., CSVs defining workloads) were unavailable, or in the case of LOCUST tests, where parameters like duration and concurrent users are specified via the command line.
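As a starting point for exploring the database described above, the following minimal sketch lists every table with its row count and prints the column names of one table, using only Python's standard sqlite3 module. The file name e2egit.sqlite is a hypothetical placeholder (use the name of the downloaded database file), and the helper names summarize_tables and column_names are illustrative. The sketch deliberately discovers column names at run time rather than assuming them.

```python
import sqlite3


def _quote(identifier):
    """Quote an SQL identifier so unusual table names are handled safely."""
    return '"' + identifier.replace('"', '""') + '"'


def summarize_tables(conn):
    """Return {table_name: row_count} for every user table in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return {
        name: conn.execute(f"SELECT COUNT(*) FROM {_quote(name)}").fetchone()[0]
        for (name,) in rows
    }


def column_names(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({_quote(table)})")]


if __name__ == "__main__":
    # Hypothetical file name; opened read-only so a typo in the path
    # raises an error instead of silently creating an empty database.
    conn = sqlite3.connect("file:e2egit.sqlite?mode=ro", uri=True)
    for table, count in summarize_tables(conn).items():
        print(f"{table}: {count} rows")
    # Table name taken from the description above.
    print(column_names(conn, "gui_testing_repo_details"))
    conn.close()
```

Inspecting the schema first is useful here because the description names the tables but only some of their fields. Once the actual column names are known, per-framework or per-repository aggregations can be written as ordinary SQL queries.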

To cite this work, please use the following BibTeX entry:

@inproceedings{di2025e2egit,
  title={E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects},
  author={Di Meglio, Sergio and Starace, Luigi Libero Lucio and Pontillo, Valeria and Opdebeeck, Ruben and De Roover, Coen and Di Martino, Sergio},
  booktitle={2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR)},
  pages={10--15},
  year={2025},
  organization={IEEE/ACM}
}

This work has been partially supported by the Italian PNRR MUR project PE0000013-FAIR. 

Files

2024_MSR25_DataShowCase_E2ETesting.pdf

Files (4.2 GB)

Checksum  Size
md5:ca4b5de3bdf3dedbdbe5920b0881a7d6  225.8 kB
md5:01e69c6956bb747d4142cfcd9d5eebb2  2.4 GB
md5:73659d11ea456e5ae05bc0808fa8607f  1.8 GB
md5:5062706bf53c75bee6ee5dd59e59512f  236 Bytes
md5:5cb03ec135c0472d24414f9da012040e  335 Bytes
md5:7bc4d1ae4cb26553eed7f802452f6a42  203 Bytes
md5:7f56643b92fed177fb6c915b0f7aa79d  182 Bytes
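
Because two of the files are multi-gigabyte downloads, it can be worth checking a downloaded file against the md5 value listed above. The following is a minimal sketch using Python's standard hashlib module. The function names md5_of_file and matches_listing are illustrative, and the local file path in the usage comment is a hypothetical placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use constant, even for multi-GB files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (hypothetical local path; checksum copied from the listing):
# matches_listing("downloads/database_file", "md5:01e69c6956bb747d4142cfcd9d5eebb2")
```

A mismatch usually indicates an interrupted or corrupted download, in which case the file should be fetched again before use.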