Please note that the file msl-labeled-data-set-v2.1.zip below contains the latest images and labels associated with this data set.
Data Set Description
The data set consists of 6,820 images collected by the Mars Science Laboratory (MSL) Curiosity rover using three instruments: (1) the Mast Camera (Mastcam) Left Eye; (2) the Mastcam Right Eye; and (3) the Mars Hand Lens Imager (MAHLI). With help from Dr. Raymond Francis, a member of the MSL operations team, we identified 19 classes of science and engineering interest (see the "Classes" section for more information), and each image is assigned exactly one class label. We split the data set into training, validation, and test sets in order to train and evaluate machine learning algorithms. The training set contains 5,920 images (including augmented images; see the "Image Augmentation" section for more information), the validation set contains 300 images, and the test set contains 600 images. Training set images were randomly sampled from sols (Martian days) 1 to 948, validation set images from sols 949 to 1920, and test set images from sols 1921 to 2224. All images are resized to 227 x 227 pixels without preserving the original height/width aspect ratio.
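Because the three splits are defined by non-overlapping sol ranges, the split an image belongs to follows directly from the sol on which it was taken. A minimal sketch of that lookup is given here; the helper name `split_for_sol` and the dictionary `SPLIT_SOL_RANGES` are hypothetical and not part of the data set, and only the sol boundaries come from the description above.

```python
# Sol ranges as stated in the data set description (inclusive on both ends).
SPLIT_SOL_RANGES = {
    "train": (1, 948),
    "validation": (949, 1920),
    "test": (1921, 2224),
}


def split_for_sol(sol):
    """Return the name of the split whose sol range contains `sol`.

    Hypothetical helper for illustration; it is not shipped with the data set.
    """
    for name, (first, last) in SPLIT_SOL_RANGES.items():
        if first <= sol <= last:
            return name
    raise ValueError(f"sol {sol} is outside the ranges covered by the data set")
```

Splitting by sol rather than by random image keeps images from the same Martian day out of more than one split, so evaluation is always on later sols than those used for training.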
The label files are formatted as below:
Each image was labeled with help from three different volunteers (see the Contributor list). The final labels were determined using the following process:
Classes
There are 19 classes identified in this data set. To simplify our training and evaluation algorithms, we mapped the class names from string to integer representations. The class names, string-to-integer mappings, and per-split image counts are shown below:
Class name, counts (training set), counts (validation set), counts (test set), integer representation
Arm cover, 10, 1, 4, 0
Other rover part, 190, 11, 10, 1
Artifact, 680, 62, 132, 2
Nearby surface, 1554, 74, 187, 3
Close-up rock, 1422, 50, 84, 4
DRT, 8, 4, 6, 5
DRT spot, 214, 1, 7, 6
Distant landscape, 342, 14, 34, 7
Drill hole, 252, 5, 12, 8
Night sky, 40, 3, 4, 9
Float, 190, 5, 1, 10
Layers, 182, 21, 17, 11
Light-toned veins, 42, 4, 27, 12
Mastcam cal target, 122, 12, 29, 13
Sand, 228, 19, 16, 14
Sun, 182, 5, 19, 15
Wheel, 212, 5, 5, 16
Wheel joint, 62, 1, 5, 17
Wheel tracks, 26, 3, 1, 18
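The table above can be turned into lookup structures with a few lines of code. The following is a sketch only: the names `CLASS_TABLE`, `CLASS_TO_INT`, `INT_TO_CLASS`, and `class_fractions` are hypothetical, and the counts are copied verbatim from the table. It also makes the class imbalance easy to inspect (for example, DRT has 8 training images while Nearby surface has 1,554).

```python
# (class name, training count, validation count, test count), listed in
# integer-representation order exactly as in the table above.
CLASS_TABLE = [
    ("Arm cover", 10, 1, 4),
    ("Other rover part", 190, 11, 10),
    ("Artifact", 680, 62, 132),
    ("Nearby surface", 1554, 74, 187),
    ("Close-up rock", 1422, 50, 84),
    ("DRT", 8, 4, 6),
    ("DRT spot", 214, 1, 7),
    ("Distant landscape", 342, 14, 34),
    ("Drill hole", 252, 5, 12),
    ("Night sky", 40, 3, 4),
    ("Float", 190, 5, 1),
    ("Layers", 182, 21, 17),
    ("Light-toned veins", 42, 4, 27),
    ("Mastcam cal target", 122, 12, 29),
    ("Sand", 228, 19, 16),
    ("Sun", 182, 5, 19),
    ("Wheel", 212, 5, 5),
    ("Wheel joint", 62, 1, 5),
    ("Wheel tracks", 26, 3, 1),
]

# String-to-integer mapping and its inverse.
CLASS_TO_INT = {row[0]: index for index, row in enumerate(CLASS_TABLE)}
INT_TO_CLASS = {index: name for name, index in CLASS_TO_INT.items()}


def class_fractions(split):
    """Return {class name: fraction of images} for 'train', 'validation', or 'test'."""
    column = {"train": 1, "validation": 2, "test": 3}[split]
    total = sum(row[column] for row in CLASS_TABLE)
    return {row[0]: row[column] / total for row in CLASS_TABLE}
```

For instance, `INT_TO_CLASS[3]` gives the class name for integer label 3, and `class_fractions("test")` gives the share of each class in the test set.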
Image Augmentation
Only the training set contains augmented images: 3,920 of its 5,920 images are augmented versions of the remaining 2,000 original training images. Images taken by different instruments were augmented differently. As shown below, we employed five augmentation methods. Images taken by the Mastcam left and right eye cameras were augmented using horizontal flipping, and images taken by MAHLI were augmented using all five methods. Note that one can filter on the file names listed in the train-set.txt file to obtain the set of non-augmented images.
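To illustrate the horizontal flipping mentioned above, here is a minimal, library-free sketch. It assumes an image is represented as a list of rows, each row a list of pixel values; this representation and the function name `flip_horizontal` are assumptions made for illustration, and the authors' actual augmentation code is not described in this document. In practice an image library would typically be used instead.

```python
def flip_horizontal(image):
    """Mirror an image left to right.

    `image` is assumed to be a list of rows, each row a list of pixel
    values (hypothetical representation chosen for this sketch).
    A new image is returned; the input is left unchanged.
    """
    return [list(reversed(row)) for row in image]


# A tiny 2 x 3 "image" for demonstration.
original = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = flip_horizontal(original)
```

Flipping twice restores the original image, which is a quick sanity check for any flip implementation.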
The authors would like to thank the volunteers (see the Contributor list) who provided annotations for this data set. We would also like to thank the PDS Imaging Node for its continuous support of this work.