Software Open Access

RESTORE: Regression testing tool for datasets

Howard, Sean; Mahajan, Krittika; Miranskyy, Andriy; Montpool, Tom; Moore, Jessica; Zhang, Lei

RESTORE stands for REgreSsion Testing tool fOR datasEts. When an old version of a dataset is replaced by a new version, RESTORE analyzes the differences between the two using a number of tests, such as

  1. Distribution test: Kolmogorov-Smirnov test;
  2. Correlation tests: Pearson correlation coefficient and Spearman's correlation;
  3. Variables and records that differ between the two versions;
  4. Magnitude comparison;
  5. Mean relative errors;
  6. The difference between two hierarchical pairs in Spearman's test;
  7. Features that have NA values;
  8. Hybrid tests, which show features that appear in the Kolmogorov-Smirnov test, mean relative error test, and correlation tests;
  9. Ranking, which shows the ranking of variables that appear in the Kolmogorov-Smirnov test, mean relative error test, and correlation tests.
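
To make a few of the tests listed above concrete, here is a minimal sketch in plain Python of items 1, 2 (Pearson only), 3 (variables only), and 5. It is not RESTORE's own code: the function names, the dictionary-of-lists data layout, and the sample values are all assumptions chosen for illustration, and the record does not say how RESTORE computes, thresholds, or reports these quantities.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean


def ks_statistic(old, new):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(old), sorted(new)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def mean_relative_error(old, new):
    """Mean of |new - old| / |old| over paired records, skipping pairs where old is 0."""
    errors = [abs(n - o) / abs(o) for o, n in zip(old, new) if o != 0]
    return mean(errors) if errors else 0.0


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = sqrt(sum((p - mx) ** 2 for p in x))
    sy = sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy)


def compare_versions(old, new):
    """Compare two dataset versions given as {variable name: list of numbers}."""
    shared = sorted(set(old) & set(new))
    report = {
        "only_in_old": sorted(set(old) - set(new)),
        "only_in_new": sorted(set(new) - set(old)),
        "ks": {v: ks_statistic(old[v], new[v]) for v in shared},
        "mre": {v: mean_relative_error(old[v], new[v]) for v in shared},
        "pearson_shift": {},
    }
    # How far does each pairwise correlation move between the two versions?
    for i, u in enumerate(shared):
        for v in shared[i + 1:]:
            shift = abs(pearson(old[u], old[v]) - pearson(new[u], new[v]))
            report["pearson_shift"][(u, v)] = shift
    return report


# Hypothetical data: "qty" was scaled by 10 and "legacy" was dropped in the new version.
old = {"price": [1.0, 2.0, 3.0, 4.0], "qty": [2.0, 4.0, 6.0, 8.0], "legacy": [0.0, 1.0, 0.0, 1.0]}
new = {"price": [1.0, 2.0, 3.0, 4.0], "qty": [20.0, 40.0, 60.0, 80.0]}
report = compare_versions(old, new)
```

In this toy case the distribution and relative-error checks both flag `qty`, while the correlation between `price` and `qty` is unchanged. A rescaling error of this kind is why it can help to combine several tests rather than rely on correlation alone.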

These tests help assess whether discrepancies between two versions of a dataset are attributable to the natural evolution of the data or to errors in data transformations. The former is expected and can be ignored, while the latter is a defect and should be corrected. RESTORE generates a report that helps an analyst differentiate between these two cases. Our study suggests that RESTORE enables fast and efficient detection of errors in the data and decreases the cost of testing.

Files (745.8 kB)
Name: miranska/restore-v1.0.zip
Size: 745.8 kB
md5: 6664cea91efedef3621ae25aa0d16d8d
