Published October 16, 2023 | Version v1.0
Journal article | Open access

STARCOP dataset: Semantic Segmentation of Methane Plumes with Hyperspectral Machine Learning Models

  • 1. University of Oxford
  • 2. Trillium Technologies
  • 3. University of Valencia
  • 4. University of Cambridge
  • 5. Polytechnic University of Valencia
  • 6. Environmental Defense Fund

Description

Task:

Methane is the second most important greenhouse gas contributing to climate change. At the same time, because of its short atmospheric lifetime, reducing methane emissions has been identified as one of the fastest pathways to limiting temperature growth. In particular, mitigating active point sources associated with the fossil fuel industry offers strong, cost-effective mitigation potential. Methane plumes can be detected in remote sensing data, but existing approaches exhibit high false positive rates and require manual intervention. Machine learning research in this area has been limited by the lack of large, real-world annotated datasets.

Dataset:

In this work, we publicly release a machine-learning-ready dataset with manually refined annotations of methane plumes. We present labelled hyperspectral data from the AVIRIS-NG sensor and provide simulated multispectral WorldView-3 views of the same data, allowing models to be benchmarked across hyperspectral and multispectral sensors.

Models (in the paper):

We propose sensor-agnostic machine learning architectures that use classical methane enhancement products as input features. Our HyperSTARCOP model outperforms a strong matched filter baseline by over 25% in F1 score, while reducing its false positive rate per classified tile by over 41.83%. Additionally, we demonstrate zero-shot generalisation of our trained model to data from the EMIT hyperspectral instrument, despite the differences in spectral and spatial resolution between the two sensors: on an annotated subset of EMIT images, HyperSTARCOP achieves a 40% gain in F1 score over the baseline.
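For readers unfamiliar with the baseline, a classical matched filter estimates a per-pixel methane enhancement by projecting each pixel's deviation from the background mean onto a target signature, whitened by the background covariance: α(x) = (x − μ)ᵀ Σ⁻¹ t / (tᵀ Σ⁻¹ t), with t = μ ⊙ k, where k is a methane unit absorption spectrum. The NumPy sketch below implements this textbook form under stated assumptions: radiances are arranged as a pixels × bands array, `unit_absorption` (k) is supplied by the user, and background statistics are estimated from the whole scene. It is not the STARCOP code or the exact baseline configuration used in the paper, and the synthetic scene and absorption shape are hypothetical.

```python
import numpy as np


def matched_filter(radiance, unit_absorption):
    """Classical (clutter) matched filter, textbook form.

    radiance:        array of shape (n_pixels, n_bands)
    unit_absorption: array of shape (n_bands,), an assumed methane
                     unit absorption spectrum k
    Returns one enhancement estimate alpha per pixel.
    """
    mu = radiance.mean(axis=0)               # background mean spectrum
    centered = radiance - mu                 # deviation of each pixel from the mean
    cov = np.cov(centered, rowvar=False)     # background covariance, (bands, bands)
    cov_inv = np.linalg.pinv(cov)            # pseudo-inverse for numerical safety
    target = mu * unit_absorption            # target signature t = mu * k
    numerator = centered @ cov_inv @ target  # (x - mu)^T Sigma^-1 t for every pixel
    denominator = target @ cov_inv @ target  # t^T Sigma^-1 t (a scalar)
    return numerator / denominator


# Synthetic demo (illustrative only): a flat, noisy background with a
# known enhancement injected into pixel 0.
rng = np.random.default_rng(0)
n_pixels, n_bands = 500, 20
k = -np.linspace(0.0, 1.0, n_bands)          # hypothetical absorption shape
scene = 1.0 + 0.01 * rng.standard_normal((n_pixels, n_bands))
scene[0] *= 1.0 + 0.5 * k                    # linearised plume model: x ≈ mu * (1 + alpha * k)
alpha = matched_filter(scene, k)
```

Because the injected signal follows the target signature, the estimate for pixel 0 should come out close to the injected 0.5 while background pixels stay near zero. Real processing chains typically estimate the background statistics more carefully than this whole-scene sketch does.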

 

Instructions:

For the most up-to-date instructions on how to load the data, please see our associated GitHub page: https://github.com/spaceml-org/STARCOP. Here we give only a brief description.

Download the *.zip files you want to use (STARCOP_test.zip, STARCOP_train_easy.zip, STARCOP_train_remaining_partX.zip) and unzip them into one folder. For example, if you only want to run evaluation, downloading STARCOP_test.zip is enough; if you want to experiment with a fast training demo, STARCOP_train_easy.zip alone is enough. Place all the *.csv files into the same folder, then update the data path in the code to point to this folder.
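The download-and-unzip step can be sketched with Python's standard library. This is a minimal sketch, not the loader from the STARCOP repository: the `extract_starcop_archives` helper and the folder names are hypothetical, and it assumes the downloaded archives and CSV files sit together in one download folder. It makes no assumption about the internal layout of each archive and simply extracts it as-is.

```python
import shutil
import zipfile
from pathlib import Path


def extract_starcop_archives(download_dir, target_dir):
    """Unzip every STARCOP_*.zip from download_dir into target_dir and
    copy any *.csv files alongside them. Returns the archive names used."""
    download_dir = Path(download_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(download_dir.glob("STARCOP_*.zip")):
        # Extract each archive into the same folder, keeping its internal paths.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
        extracted.append(archive.name)

    # The CSV metadata files must live in the same folder as the extracted data.
    for csv_path in sorted(download_dir.glob("*.csv")):
        destination = target_dir / csv_path.name
        if csv_path.resolve() != destination.resolve():
            shutil.copy2(csv_path, destination)
    return extracted
```

For example, `extract_starcop_archives("downloads", "starcop_data")` would leave everything in `starcop_data`, which is then the folder to point the code at.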

The zip files separate the evaluation set (STARCOP_test.zip) from the training set, which is further split by difficulty level (STARCOP_train_easy.zip and the STARCOP_train_remaining_partX.zip files). "Easy" marks plume events with qplume > 1000 (large plumes), while the remaining set, which can be considered hard, has qplume < 1000 (small plumes). The qplume value is taken from the source data (Cusworth, D. H. et al., 2021).
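The easy/hard split is already encoded in the archives, but the same rule can be reproduced from tabular metadata. A minimal sketch, assuming a CSV with a numeric column named `qplume` (the column name and the helper functions are hypothetical; check the actual CSV headers). Rows exactly at the threshold are placed in the hard set, since the description only states the strict inequalities.

```python
import csv

# Threshold quoted in the dataset description (units as in the source data).
QPLUME_THRESHOLD = 1000.0


def split_by_difficulty(rows, column="qplume", threshold=QPLUME_THRESHOLD):
    """Split metadata rows into 'easy' (qplume > threshold) and 'hard' (the rest).

    `column` is an assumed header name; every row must hold a numeric value.
    """
    easy, hard = [], []
    for row in rows:
        value = float(row[column])
        (easy if value > threshold else hard).append(row)
    return easy, hard


def load_rows(csv_path):
    """Read a metadata CSV into a list of dicts keyed by the header row."""
    with open(csv_path, newline="") as fh:
        return list(csv.DictReader(fh))
```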

Note on mirror hosting: Some of the smaller files were also uploaded to Google Drive to allow easier data loading, for example in Google Colab; please check the main code repository for instructions. The files STARCOP_train_easy.zip and STARCOP_test.zip on Google Drive are identical to the files with the same names here on Zenodo.

 

Citation:

If you find our dataset useful, please cite our work:

@article{ruzicka_starcop_2023,
    title = {Semantic segmentation of methane plumes with hyperspectral machine learning models},
    volume = {13},
    issn = {2045-2322},
    url = {https://www.nature.com/articles/s41598-023-44918-6},
    doi = {10.1038/s41598-023-44918-6},
    number = {1},
    journal = {Scientific Reports},
    author = {Růžička, Vít and Mateo-Garcia, Gonzalo and Gómez-Chova, Luis and Vaughan, Anna and Guanter, Luis and Markham, Andrew},
    month = nov,
    year = {2023},
    pages = {19999},
}

Files


Files (60.8 GB)

md5 checksum                        Size
c789cba97907070bae0c59ae36dafc1b    5.6 GB
aa3f5674d0951f43b386dde1c59f08b5    9.0 GB
ddb419cac0423013abd8039b5d1acea4    9.2 GB
ec00238df6c079f082d84bb77f42cf76    9.0 GB
e563421e499dfb7ec5e068d51fdd8b24    9.3 GB
b9127b7e466b22ff95e22c7e35cbb019    9.2 GB
5f1b02b5e26767417450b8a263569edb    9.4 GB
6bc8305de45f282651d3d5de6d055e3c    102.9 kB
02f56b8e9759d01a1ee039f2eeaf4ebc    1.0 MB
57875c817e0ee9046bafb8214ff46430    168.8 kB
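The md5 checksums listed for the files can be used to confirm that each multi-gigabyte download completed intact. A minimal sketch using Python's standard `hashlib`, reading the file in chunks so a 9 GB archive is never held in memory at once. The helper names are hypothetical, and matching each archive to its expected checksum (as paired on the Zenodo record) is left to the user.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or bare '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(Path(path)) == expected
```

For example, `verify_md5("STARCOP_test.zip", "md5:...")` with the checksum shown for that file on the record should return `True` for a complete download.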

Additional details

Related works

Is cited by
Preprint: 10.21203/rs.3.rs-2899370/v1 (DOI)
Publication: 10.1038/s41598-023-44918-6 (DOI)

Dates

Available
2023-11-17

Software

Repository URL
https://github.com/spaceml-org/STARCOP
Programming language
Python

References

  • Růžička, V., Mateo-Garcia, G., Gómez-Chova, L. et al. Semantic segmentation of methane plumes with hyperspectral machine learning models. Sci Rep 13, 19999 (2023). https://doi.org/10.1038/s41598-023-44918-6