Software Open Access
Howard, Sean; Mahajan, Krittika; Miranskyy, Andriy; Montpool, Tom; Moore, Jessica; Zhang, Lei
RESTORE stands for REgreSsion Testing tool fOR datasEts. When an old version of a dataset is replaced by a new version, RESTORE analyzes the differences between the two versions using a number of tests, such as
These tests help us assess whether discrepancies between the two versions of the dataset are attributable to the natural evolution of the data or to errors in data transformations. The former is expected and can be ignored, while the latter is a defect and should be corrected. RESTORE generates a report that helps an analyst differentiate between these two cases. Our study suggests that RESTORE enables fast and efficient detection of errors in the data and decreases the cost of testing.
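
To illustrate the general idea of comparing two versions of a dataset, here is a minimal Python sketch. It is not RESTORE's implementation or API. The function name `compare_versions`, the `rel_tol` threshold, and the three checks (schema change, row count change, shift in column mean) are assumptions chosen for illustration only, and they are not claimed to match the tests RESTORE actually runs.

```python
from statistics import mean


def compare_versions(old_rows, new_rows, rel_tol=0.1):
    """Return human-readable findings about two versions of a dataset.

    Hypothetical helper, not part of RESTORE. Each version is a list of
    dicts (one dict per row). The checks are illustrative only.
    """
    findings = []
    if not old_rows or not new_rows:
        return ["one of the versions is empty"]

    # Check 1: columns that appeared or disappeared (schema taken from first row).
    old_cols, new_cols = set(old_rows[0]), set(new_rows[0])
    for col in sorted(old_cols - new_cols):
        findings.append(f"column removed: {col}")
    for col in sorted(new_cols - old_cols):
        findings.append(f"column added: {col}")

    # Check 2: change in the number of rows.
    if len(old_rows) != len(new_rows):
        findings.append(f"row count changed: {len(old_rows)} -> {len(new_rows)}")

    # Check 3: relative shift in the mean of each shared numeric column.
    for col in sorted(old_cols & new_cols):
        old_vals = [r.get(col) for r in old_rows
                    if isinstance(r.get(col), (int, float))]
        new_vals = [r.get(col) for r in new_rows
                    if isinstance(r.get(col), (int, float))]
        if old_vals and new_vals:
            old_mean, new_mean = mean(old_vals), mean(new_vals)
            scale = max(abs(old_mean), 1e-12)  # avoid division by zero
            if abs(new_mean - old_mean) / scale > rel_tol:
                findings.append(
                    f"mean of {col} shifted: {old_mean:.2f} -> {new_mean:.2f}"
                )
    return findings
```

A report like this does not decide on its own whether a finding is a defect. In the sketch, a changed row count may well be natural evolution of the data, whereas a hundredfold jump in a column mean could point to a transformation error such as a unit change; the analyst makes that call.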