Published February 1, 2022 | Version 0.0.1
Dataset Open

Multi-Domain Outlier Detection Dataset

  • 1. University of Maryland College Park
  • 2. Jet Propulsion Laboratory, California Institute of Technology

Contributors

Data collector:

  • 1. University of Maryland College Park

Description

The Multi-Domain Outlier Detection Dataset contains datasets for conducting outlier detection experiments in four application domains:

  1. Astrophysics: detecting anomalous observations in the Dark Energy Survey (DES) catalog (data type: feature vectors)
  2. Planetary science: selecting novel geologic targets for follow-up observation onboard the Mars Science Laboratory (MSL) rover (data type: grayscale images)
  3. Earth science: detecting anomalous samples in satellite time series corresponding to ground-truth observations of maize crops (data type: time series/feature vectors)
  4. Fashion-MNIST/MNIST: benchmark task to detect anomalous MNIST images among Fashion-MNIST images (data type: grayscale images)

Each dataset contains a "fit" set (used for fitting or training outlier detection models), a "score" set (the samples that a fitted model scores in order to evaluate its performance, analogous to a test set), and a label set (indicating whether each sample in the score set is considered an outlier in that dataset's domain).
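The fit/score/label split can be illustrated with a minimal Python sketch. It is not the DORA pipeline and does not read files from the archive, whose internal layout is not described here. The toy in-memory arrays, the z-score distance detector, and the function names (fit_model, outlier_score, precision_at_n) are all assumptions chosen only to show how the three parts relate.

```python
import math


def fit_model(fit_rows):
    """Estimate per-feature mean and standard deviation from the "fit" set."""
    n = len(fit_rows)
    dims = len(fit_rows[0])
    means = [sum(row[j] for row in fit_rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((row[j] - means[j]) ** 2 for row in fit_rows) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 if variance is zero
    return means, stds


def outlier_score(row, model):
    """Euclidean norm of the z-scored features; larger means more anomalous."""
    means, stds = model
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds)))


def precision_at_n(scores, labels, n):
    """Fraction of the n highest-scoring samples that are labeled outliers."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:n]) / n


# Hypothetical stand-ins for one dataset's three parts (feature-vector case).
fit_set = [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1], [0.1, 0.9]]
score_set = [[0.05, 1.0], [3.0, -2.0], [0.0, 0.95], [-2.5, 4.0]]
labels = [0, 1, 0, 1]  # 1 = outlier, 0 = inlier, aligned with score_set

model = fit_model(fit_set)                            # uses only the fit set
scores = [outlier_score(r, model) for r in score_set]  # scores the score set
p_at_2 = precision_at_n(scores, labels, 2)            # labels used only here
print(f"precision at 2: {p_at_2:.2f}")
```

The labels are never passed to the fitting or scoring steps; they are used only to evaluate the ranking that the scores produce.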

To read more about the datasets and how they are used for outlier detection, or to cite this dataset in your own work, please see the following citation:

Kerner, H. R., Rebbapragada, U., Wagstaff, K. L., Lu, S., Dubayah, B., Huff, E., Lee, J., Raman, V., and Kulshrestha, S. (2022). Domain-agnostic Outlier Ranking Algorithms (DORA) - A Configurable Pipeline for Facilitating Outlier Detection in Scientific Datasets. Under review for Frontiers in Astronomy and Space Sciences.

Files

all_datasets_dora.zip (56.5 MB)
md5:173ce0631377e902a1755c823e5367a3
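
The listed MD5 checksum can be used to confirm that a download of all_datasets_dora.zip is intact. The following is a minimal sketch using Python's standard hashlib module; the local path in archive_path is an assumption, and the helper name md5_of_file is chosen here for illustration.

```python
import hashlib
import os

# Checksum copied from the file listing above.
EXPECTED_MD5 = "173ce0631377e902a1755c823e5367a3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed location of the downloaded archive; adjust to the actual path.
archive_path = "all_datasets_dora.zip"

if os.path.exists(archive_path):
    actual = md5_of_file(archive_path)
    if actual == EXPECTED_MD5:
        print("checksum OK")
    else:
        print(f"checksum mismatch: {actual}")
else:
    print(f"{archive_path} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of archive size.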