RESTORE: Regression testing tool for datasets
Description
RESTORE stands for REgreSsion Testing tool fOR datasEts. When an old version of a dataset is replaced by a new version, we analyze the differences between the two versions using a number of tests, such as
- Distribution test: Kolmogorov-Smirnov test;
- Correlation tests: Pearson correlation coefficient and Spearman's correlation;
- Different variables and records;
- Magnitude comparison;
- Mean relative errors;
- The difference between two hierarchical pairs in Spearman's test;
- Features that have NA values;
- Hybrid tests, which show features that appear in the Kolmogorov-Smirnov test, mean relative error test, and correlation tests;
- Ranking, which shows the ranking of variables that appear in the Kolmogorov-Smirnov test, mean relative error test, and correlation tests.
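As an illustration of the statistical tests listed above, here is a minimal Python sketch of the Kolmogorov-Smirnov statistic, the Pearson and Spearman correlations, and a mean relative error. It is not RESTORE's own code. The function names are hypothetical, and the definition of mean relative error used here (relative change of a feature's mean between the old and the new version) is an assumption, since the description does not spell it out.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(old, new):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(old), sorted(new)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def mean_relative_error(old, new):
    """Assumed definition: |mean(new) - mean(old)| / |mean(old)|."""
    mean_old = sum(old) / len(old)
    mean_new = sum(new) / len(new)
    return abs(mean_new - mean_old) / abs(mean_old)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical feature values from the old and the new version of a dataset.
old_income = [10.0, 20.0, 30.0, 40.0]
new_income = [11.0, 22.0, 33.0, 44.0]
print(ks_statistic(old_income, new_income))
print(mean_relative_error(old_income, new_income))
```

For the correlation tests, one plausible use is to compute `pearson` or `spearman` for the same pair of features in each version and compare the two coefficients. A large change would suggest that the relationship between the features has shifted.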
These tests help us assess whether discrepancies between two versions of the dataset are attributable to the natural evolution of the data or to errors in data transformations. The former is expected and can be ignored, while the latter is a defect and should be corrected. RESTORE generates a report that helps an analyst differentiate between these two cases. Our study suggests that RESTORE enables fast and efficient detection of errors in the data and decreases the cost of testing.
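The structural checks from the list above (different variables and records, features that have NA values) could be sketched in Python as follows. This is an illustrative sketch only and does not reflect RESTORE's actual implementation or report layout. The function names, the representation of a dataset as a list of dictionaries, and the `key` column used to identify records are all assumptions.

```python
from math import isnan


def is_na(value):
    """Treat None and float NaN as missing values (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and isnan(value))


def na_features(rows):
    """Names of the columns that contain at least one missing value."""
    return sorted({col for row in rows for col, val in row.items() if is_na(val)})


def compare_structure(old_rows, new_rows, key):
    """Compare two datasets given as lists of dicts that share a `key` column."""
    old_vars = set().union(*(row.keys() for row in old_rows))
    new_vars = set().union(*(row.keys() for row in new_rows))
    old_keys = {row[key] for row in old_rows}
    new_keys = {row[key] for row in new_rows}
    return {
        "variables_only_in_old": sorted(old_vars - new_vars),
        "variables_only_in_new": sorted(new_vars - old_vars),
        "records_only_in_old": sorted(old_keys - new_keys),
        "records_only_in_new": sorted(new_keys - old_keys),
        "na_features_old": na_features(old_rows),
        "na_features_new": na_features(new_rows),
    }


# Hypothetical old and new versions of a small dataset.
old = [
    {"id": 1, "age": 30, "income": 50.0},
    {"id": 2, "age": None, "income": 60.0},
]
new = [
    {"id": 2, "age": 41, "income": float("nan"), "region": "N"},
    {"id": 3, "age": 25, "income": 55.0, "region": "S"},
]
report = compare_structure(old, new, key="id")
for name, values in report.items():
    print(name, values)
```

A variable or record that appears in only one version may reflect natural evolution of the data or a transformation error, so the output is meant as a prompt for an analyst rather than a verdict.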
Files
miranska/restore-v1.0.zip (745.8 kB)
md5:6664cea91efedef3621ae25aa0d16d8d
Additional details
Related works
- Is documented by: Preprint, https://arxiv.org/abs/1903.03676
- Is supplement to: https://github.com/miranska/restore/tree/v1.0