Published June 29, 2021 | Version 1.0
Dataset Open

PixBox Landsat 8 pixel collection for CMIX


The PixBox-L8-CMIX dataset was used as a validation reference in the first Cloud Masking Inter-comparison eXercise (CMIX), conducted in 2019 within the Committee on Earth Observation Satellites (CEOS) Working Group on Calibration & Validation (WGCV). The PixBox-L8-CMIX pixel collection existed prior to CMIX; it was collected in 2015.

The overarching idea of PixBox is a quantitative assessment of the quality of a pixel classification which is the result of an automated algorithm/procedure. Pixel classification is defined as assigning a certain number of attributes to an image pixel, such as cloud, clear sky, water, land, inland water, flooded, snow etc. Such pixel classification attributes are typically used to further guide higher level processing.
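As a rough illustration of such a quantitative assessment, the sketch below compares expert reference labels with the output of an automated classifier, using a confusion matrix and overall accuracy. The class names, the sample labels and the choice of metrics are assumptions made for illustration; the dataset description does not prescribe specific metrics.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Count (reference, predicted) label pairs for equally long label lists."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    return Counter(zip(reference, predicted))


def overall_accuracy(matrix):
    """Fraction of pixels where the algorithm agrees with the expert label."""
    total = sum(matrix.values())
    agree = sum(n for (ref, pred), n in matrix.items() if ref == pred)
    return agree / total if total else 0.0


# Hypothetical labels for six pixels (not taken from the dataset).
expert = ["cloud", "cloud", "clear", "clear", "snow", "cloud"]
algorithm = ["cloud", "clear", "clear", "clear", "cloud", "cloud"]

matrix = confusion_matrix(expert, algorithm)
print(overall_accuracy(matrix))
```

The off-diagonal entries of the matrix, for example `matrix[("snow", "cloud")]`, show which classes the algorithm confuses, which is usually more informative than the single accuracy figure.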

PixBox datasets are produced as follows: one or more trained, experienced experts manually classify pixels of a sensor image into a predefined, detailed set of classes. These typically cover different cloud transparencies, cloud shadow, and the condition of the underlying surface (for example “semi-transparent clouds over snow” or “clouds over bright scattering water”). A typical collection comprises several tens of thousands of pixels, because it has to be representative of all classes and of various observation and environmental conditions, such as climate zones and sun illumination. Quality control of the collected pixels is important in order to detect misclassifications and systematic errors; an auto-associative neural network is trained for this purpose.

The PixBox-L8-CMIX dataset is a pixel collection containing 18,830 pixels manually collected from 11 Landsat 8 Level 1 products. The dataset is temporally well distributed. Spatially it is focused on coastal areas, mainly in Europe. Thematically it is focused on coastal zones, but it still represents land and water surfaces.


PixBox-L8-CMIX dataset

The PixBox-L8-CMIX dataset consists of two main ZIP files, one holding the pixel collection and its description, and another one with all Landsat 8 L1 data used. The dataset is structured as follows:

    • The collected features (CSV file).
    • A description of all categories and classes, including links to the Landsat 8 L1 products used.
    • 11 zipped Landsat 8 Level 1 products [1], used to produce the dataset.


pixbox_landsat8_cmix_20150527.csv - This file contains all collected pixel information in CSV format. All collected classes are stored as integer values. The categories and the mapping from integers to class names are given in the additional description file.

pixbox_landsat8_cmix_20150527_description.txt - This file describes the categories and classes. It can be used to convert the class ID numbers stored in the CSV to class strings. Additionally, it links the satellite product ID given in the CSV to the Landsat 8 L1 product names.

11 Landsat 8 L1 products in ZIP format.
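A minimal sketch of how the integer class IDs in the CSV might be converted to class strings is shown below. The column name, the delimiter and the ID-to-name mapping are hypothetical; the actual categories and IDs have to be taken from pixbox_landsat8_cmix_20150527_description.txt.

```python
import csv
import io

# Hypothetical mapping; the real IDs and names are listed in
# pixbox_landsat8_cmix_20150527_description.txt.
CLASS_NAMES = {0: "clear", 1: "cloud"}


def decode_column(rows, column, class_names):
    """Return the class name for each row's integer ID in the given column."""
    decoded = []
    for row in rows:
        class_id = int(row[column])
        # Keep unmapped IDs visible instead of silently dropping them.
        decoded.append(class_names.get(class_id, f"unknown({class_id})"))
    return decoded


# Stand-in for the real CSV; the column name and delimiter are assumptions.
sample = io.StringIO("pixel_id,cloud_class\n1,0\n2,1\n3,7\n")
rows = list(csv.DictReader(sample))
print(decode_column(rows, "cloud_class", CLASS_NAMES))
```

To use it on the real file, the `io.StringIO` stand-in would be replaced by an opened pixbox_landsat8_cmix_20150527.csv, with the delimiter and column names adjusted to what the file actually contains.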



[1] Landsat 8 products courtesy of the U.S. Geological Survey


Files (9.9 GB)

File sizes: 9.9 GB, 1.6 MB

Additional details

Related works

Is supplemented by
Dataset: 10.5281/zenodo.5036991 (DOI)