MER Opportunity and Spirit Rovers Pancam Images Labeled Data Set
Creators
- 1. Duke University
- 2. Space Science Institute
- 3. Jet Propulsion Laboratory
Description
Introduction
The data set is based on 3,004 images collected by the Pancam instruments mounted on the Opportunity and Spirit rovers of NASA's Mars Exploration Rovers (MER) mission. We used rotation, skewing, and shearing augmentation methods to increase the total collection to 70,864 images (see the Augmentation section for more information). Based on the MER Data Catalog User Survey [1], we identified 25 classes of both scientific (e.g., soil trench, float rocks) and engineering (e.g., rover deck, Pancam calibration target) interest (see the Classes section for more information). The 3,004 images were labeled on the Zooniverse platform, and each image may be assigned multiple labels. The images are either 512 x 512 or 1024 x 1024 pixels in size (see the Image Sampling section for more information).
Classes
There are 25 classes in this data set. The list below gives class names, counts, and percentages (each percentage is the class count divided by 3,004). Note that the counts do not sum to 3,004 and the percentages do not sum to 100% because each image may be assigned more than one class.
- Class name, count, percentage of dataset
- Rover Deck, 222, 7.39%
- Pancam Calibration Target, 14, 0.47%
- Arm Hardware, 4, 0.13%
- Other Hardware, 116, 3.86%
- Rover Tracks, 301, 10.02%
- Soil Trench, 34, 1.13%
- RAT Brushed Target, 17, 0.57%
- RAT Hole, 30, 1.00%
- Rock Outcrop, 1915, 63.75%
- Float Rocks, 860, 28.63%
- Clasts, 1676, 55.79%
- Rocks (misc), 249, 8.29%
- Bright Soil, 122, 4.06%
- Dunes/Ripples, 1000, 33.29%
- Rock (Linear Features), 943, 31.39%
- Rock (Round Features), 219, 7.29%
- Soil, 2891, 96.24%
- Astronomy, 12, 0.40%
- Spherules, 868, 28.89%
- Distant Vista, 903, 30.23%
- Sky, 954, 31.76%
- Close-up Rock, 23, 0.77%
- Nearby Surface, 2006, 66.78%
- Rover Parts, 301, 10.02%
- Artifacts, 28, 0.93%
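As a quick illustration of how the percentages in the list are derived, the sketch below recomputes three of them from their counts. The variable names are illustrative only and are not part of the data set.

```python
# Total number of labeled (non-augmented) images, from the description above.
TOTAL_IMAGES = 3004

# A few class counts copied from the list above.
counts = {"Rover Deck": 222, "Soil": 2891, "Astronomy": 12}

# Percentage of the data set = count / 3,004, shown with two decimals.
percentages = {name: round(100 * n / TOTAL_IMAGES, 2) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```

Because an image can carry several labels, summing such percentages over all classes is expected to exceed 100%.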
Image Sampling
Images in the MER rover Pancam archive are of sizes ranging from 64x64 to 1024x1024 pixels. The largest size, 1024x1024, was by far the most common size in the archive. For the deep learning dataset, we elected to sample only 1024x1024 and 512x512 images as the higher resolution would be beneficial to feature extraction.
To ensure that the data set is representative of the total image archive of 4.3 million images, we elected to sample by "site code". Each Pancam image has a corresponding two-character alphanumeric site code, which is used to track the rover's location throughout its mission. Since each site code corresponds to a different general location, sampling a fixed proportion of the images taken at each site ensures that the data set contains some images from each location. In this way, we aimed to ensure that a model performing well on this data set would generalize well to the unlabeled archive data as a whole. We randomly sampled 20% of the images at each site within the subset of Pancam data fitting all other image criteria, applying a floor function to non-whole-number sample sizes, resulting in a data set of 3,004 images.
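The per-site sampling described above can be sketched as follows. This is a minimal illustration, not the code used to build the data set: the function name `sample_by_site`, the `image_sites` mapping (image name to site code), and the fixed seed are all assumptions introduced here.

```python
import math
import random
from collections import defaultdict

def sample_by_site(image_sites, fraction=0.2, seed=0):
    """Randomly sample floor(fraction * n) images from each site.

    image_sites: dict mapping image name -> site code (hypothetical input).
    """
    # Group image names by their site code.
    by_site = defaultdict(list)
    for name, site in image_sites.items():
        by_site[site].append(name)

    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    sampled = []
    for site in sorted(by_site):
        names = sorted(by_site[site])
        k = math.floor(len(names) * fraction)  # floor of non-whole sample sizes
        sampled.extend(rng.sample(names, k))
    return sampled

# Hypothetical example: 10 images at site "A0" and 7 images at site "B1".
example = {f"a{i}": "A0" for i in range(10)}
example.update({f"b{i}": "B1" for i in range(7)})
picked = sample_by_site(example)
```

In this example the floor yields 2 images from the first site (20% of 10) and 1 from the second (20% of 7 is 1.4, rounded down).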
Train/validation/test sets split
The 3,004 images were split into train, validation, and test data sets. The split was done so that roughly 60, 15, and 25 percent of the 3,004 images would end up in the train, validation, and test data sets respectively, while ensuring that images from a given site are not split between the train, validation, and test data sets. This resulted in 1,806 train images, 456 validation images, and 742 test images.
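One way to produce a split that keeps each site intact is sketched below. The description does not say how sites were assigned to splits, so the greedy rule used here (give each site to the split that is furthest below its target size) is an assumption, as are the function name `split_by_site` and the `image_sites` mapping.

```python
from collections import defaultdict

def split_by_site(image_sites,
                  targets=(("train", 0.60), ("val", 0.15), ("test", 0.25))):
    """Assign whole sites to splits so no site is shared between splits.

    image_sites: dict mapping image name -> site code (hypothetical input).
    """
    by_site = defaultdict(list)
    for name, site in image_sites.items():
        by_site[site].append(name)

    total = len(image_sites)
    splits = {split_name: [] for split_name, _ in targets}

    # Visit the largest sites first, then greedily fill the neediest split.
    for site in sorted(by_site, key=lambda s: (-len(by_site[s]), s)):
        neediest = max(targets,
                       key=lambda t: t[1] * total - len(splits[t[0]]))[0]
        splits[neediest].extend(by_site[site])
    return splits

# Hypothetical example: 20 sites with 5 images each.
example = {f"s{s:02d}_img{i}": f"s{s:02d}" for s in range(20) for i in range(5)}
splits = split_by_site(example)
site_sets = {k: {example[n] for n in v} for k, v in splits.items()}
```

With real site sizes the resulting proportions would only approximate the targets, which matches the "roughly 60, 15, and 25 percent" wording above.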
Augmentation
To augment the images in the train and validation data sets (images in the test data set were not augmented), we chose three augmentation methods that best represent transformations that could realistically be seen in Pancam images: rotation, skew, and shear. The augmentation methods were applied with random magnitude, followed by a random horizontal flip, to create 30 augmented images for each image. Since each transformation is followed by a square crop to keep the input shape consistent, we had to constrain the magnitude limits of each augmentation to avoid cropping out important features at the edges of input images. Thus, rotations were limited to 15 degrees in either direction, the 3-dimensional skew was limited to 45 degrees in any direction, and shearing was limited to 10 degrees in either direction.
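The image totals quoted in this description are consistent with 30 augmented copies per train and validation image. The short check below reproduces the arithmetic; the variable names are illustrative.

```python
# Split sizes and augmentation factor, as stated in the description.
train, val, test = 1806, 456, 742
augmented_per_image = 30

originals = train + val + test                  # 3,004 original images
augmented = (train + val) * augmented_per_image # test images are not augmented
total = originals + augmented

print(originals, augmented, total)
```

This gives 67,860 augmented images and 70,864 images overall.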
Directory Contents
- images: contains all 70,864 images
- train-set-v1.1.0.txt: label file for the training data set
- val-set-v1.1.0.txt: label file for the validation data set
- test-set-v1.1.0.txt: label file for the testing data set
Images with relatively short file names (e.g., 1p128287181mrd0000p2303l2m1.img.jpg) are original images, and images with long file names (e.g., 1p128287181mrd0000p2303l2m1.img.jpg_04140167-5781-49bd-a913-6d4d0a61dab1.jpg) are augmented images. The label files are formatted as "Image name, Class1, Class2, ..., ClassN".
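A minimal sketch for reading one line of a label file and for telling original and augmented images apart is shown below. It assumes that fields are separated by commas exactly as in the format string above and that class names themselves contain no commas; the helper names `parse_label_line` and `is_augmented` and the sample line are hypothetical.

```python
def parse_label_line(line):
    """Split 'Image name, Class1, ..., ClassN' into (name, [classes])."""
    fields = [field.strip() for field in line.strip().split(",")]
    return fields[0], fields[1:]

def is_augmented(image_name):
    """Augmented names append '_<id>.jpg' to the original '...img.jpg' name."""
    return ".img.jpg_" in image_name

# Hypothetical label line; the actual class spellings in the files may differ.
sample = "1p128287181mrd0000p2303l2m1.img.jpg, Soil, Rock Outcrop"
name, classes = parse_label_line(sample)

augmented_name = ("1p128287181mrd0000p2303l2m1.img.jpg"
                  "_04140167-5781-49bd-a913-6d4d0a61dab1.jpg")
```

Checking for the `.img.jpg_` infix rather than the file name length avoids picking an arbitrary length threshold.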
Reference
[1] S.B. Cole, J.C. Aubele, B.A. Cohen, S.M. Milkovich, and S.A. Shields, Identifying Community Needs for a Mars Exploration Rovers (MER) Data Catalog, 51st Lunar and Planetary Science Conference (LPSC), 2020.
Files
- Total size: 7.2 GB
- md5:22f4da45fc003a660e7b637352b6eeb8