Published March 12, 2021 | Version v1.0
Software Open

RESTORE: Regression testing tool for datasets

Description

RESTORE stands for REgreSsion Testing tool fOR datasEts. When an old version of a dataset is replaced by a new version, we analyze the difference between the two versions using a number of tests, such as

  1. Distribution test: Kolmogorov-Smirnov test;
  2. Correlation tests: Pearson correlation coefficient and Spearman's correlation;
  3. Different variables and records;
  4. Magnitude comparison;
  5. Mean relative errors;
  6. The difference between two hierarchical pairs in Spearman's test;
  7. Features that have NA values;
  8. Hybrid tests, which show features that appear in the Kolmogorov-Smirnov test, the mean relative error test, and the correlation tests;
  9. Ranking, which shows the ranking of variables that appear in the Kolmogorov-Smirnov test, mean relative error test, and correlation tests.
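As a rough illustration of tests 1, 2, and 5, the sketch below computes a two-sample Kolmogorov-Smirnov statistic, a mean relative error, and the shift in Pearson and Spearman correlation between two versions of a dataset. It is written in plain Python and is not taken from the RESTORE code base. The function names, the sample data, and the definition of mean relative error used here (relative change of the mean) are assumptions made for the example.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(old, new):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(old), sorted(new)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def mean_relative_error(old, new):
    """Relative change of the mean (assumed definition; old mean must be non-zero)."""
    mean_old = sum(old) / len(old)
    mean_new = sum(new) / len(new)
    return abs(mean_new - mean_old) / abs(mean_old)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: the same two variables in the old and new dataset versions.
old = {"price": [10.0, 12.0, 14.0, 16.0], "qty": [1.0, 2.0, 3.0, 4.0]}
new = {"price": [10.0, 12.0, 14.0, 160.0], "qty": [1.0, 2.0, 3.0, 4.0]}

ks_price = ks_statistic(old["price"], new["price"])
mre_price = mean_relative_error(old["price"], new["price"])
pearson_shift = pearson(new["price"], new["qty"]) - pearson(old["price"], old["qty"])
spearman_shift = spearman(new["price"], new["qty"]) - spearman(old["price"], old["qty"])
```

In this made-up example a single inflated value leaves the rank correlation unchanged while the mean and the Pearson correlation move noticeably, which illustrates how the individual tests can respond to different kinds of change.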

These tests help us assess whether discrepancies between the two versions of the dataset are attributable to the natural evolution of the data or to errors in data transformations. The former is expected and can be ignored, while the latter is a defect and should be corrected. RESTORE generates a report that helps an analyst differentiate between these two cases. Our study suggests that RESTORE enables fast and efficient detection of errors in the data and decreases the cost of testing.
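One hypothetical way to combine per-test results into the hybrid and ranking views listed above is to count how many tests flag each variable. The sketch below assumes each test has already produced a set of flagged variable names. The dictionary layout, the function names, and the count-based ordering are illustrative assumptions, not a description of how RESTORE builds its report.

```python
from collections import Counter

# Hypothetical per-test results: the variables each test flagged as changed.
flagged = {
    "ks": {"price", "age"},
    "mean_relative_error": {"price", "income"},
    "correlation": {"price", "age"},
}


def hybrid(flagged):
    """Variables flagged by every test (expects at least one test)."""
    return set.intersection(*flagged.values())


def ranking(flagged):
    """Variables ordered by number of flagging tests, ties broken by name."""
    counts = Counter(name for names in flagged.values() for name in names)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


hybrid_vars = hybrid(flagged)
ranked = ranking(flagged)
```

Under these assumptions, variables at the top of `ranked` are the ones an analyst might inspect first when deciding whether a change reflects natural evolution of the data or a transformation error.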

Files

miranska/restore-v1.0.zip (745.8 kB)
md5:6664cea91efedef3621ae25aa0d16d8d

Additional details

Related works

Is documented by
Preprint: https://arxiv.org/abs/1903.03676
Is supplement to
https://github.com/miranska/restore/tree/v1.0