STARCOP dataset: Semantic Segmentation of Methane Plumes with Hyperspectral Machine Learning Models
Description
Task:
Methane is the second-largest greenhouse gas contributor to climate change; at the same time, reducing its emissions has been identified as one of the fastest pathways to curbing temperature rise, owing to its short atmospheric lifetime. In particular, mitigating active point sources associated with the fossil fuel industry has strong and cost-effective potential. Methane plumes can be detected in remote sensing data, but existing approaches exhibit high false positive rates and require manual intervention. Machine learning research in this area has been limited by the lack of large, annotated real-world datasets.
Dataset:
In this work, we publicly release a machine-learning-ready dataset with manually refined annotations of methane plumes. We provide labelled hyperspectral data from the AVIRIS-NG sensor together with simulated multispectral WorldView-3 views of the same data, allowing models to be benchmarked across hyperspectral and multispectral sensors.
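To illustrate what a simulated multispectral view is, the sketch below collapses a hyperspectral cube into a few broad bands by weighting each hyperspectral band with a spectral response function (SRF) and normalising. This is a generic band-integration technique, not necessarily the procedure used to build the WorldView-3 views in this dataset; the Gaussian SRF shape, the band centres and widths, and both function names are hypothetical placeholders that real WorldView-3 SRFs would replace.

```python
import numpy as np

def gaussian_srf(wavelengths, centers, fwhms):
    """Hypothetical Gaussian SRFs, returned as an (n_bands_out, n_bands_in) matrix."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    centers = np.asarray(centers, dtype=float)[:, None]
    # Convert full width at half maximum to a Gaussian standard deviation.
    sigmas = np.asarray(fwhms, dtype=float)[:, None] / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wavelengths[None, :] - centers) / sigmas) ** 2)

def simulate_multispectral(cube, srf):
    """Integrate an (H, W, B) hyperspectral cube into (H, W, M) bands using an (M, B) SRF matrix."""
    srf = np.asarray(srf, dtype=float)
    # Normalise each output band's weights so they sum to 1 over the input bands.
    weights = srf / srf.sum(axis=1, keepdims=True)
    return np.asarray(cube, dtype=float) @ weights.T

# Usage with made-up numbers: hypothetical spectral sampling (nm) and two hypothetical SWIR bands.
wavelengths = np.linspace(400, 2500, 425)
srf = gaussian_srf(wavelengths, centers=[2165, 2330], fwhms=[40, 60])
cube = np.random.default_rng(0).random((64, 64, wavelengths.size))
print(simulate_multispectral(cube, srf).shape)  # (64, 64, 2)
```

Normalising the weights means a spectrally flat input keeps its value in every simulated band, which keeps the simulated radiances on the same scale as the hyperspectral ones.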
Models (in the paper):
We propose sensor-agnostic machine learning architectures that use classical methane enhancement products as input features. Our HyperSTARCOP model outperforms a strong matched filter baseline by over 25% in F1 score, while reducing its false positive rate per classified tile by over 41.83%. Additionally, we demonstrate zero-shot generalisation of the trained model to data from the EMIT hyperspectral instrument, despite differences in spectral and spatial resolution between the two sensors: on an annotated subset of EMIT images, HyperSTARCOP achieves a 40% gain in F1 score over the baseline.
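The classical methane enhancement product used as a baseline is a matched filter. The sketch below shows its textbook form, alpha(x) = (x - mu)^T C^-1 t / (t^T C^-1 t), with background mean mu and covariance C estimated from the scene and a target signature t. It is an illustration under assumptions, not the authors' implementation: in practice t is usually derived from a methane unit absorption spectrum scaled by the background mean, and operational filters add refinements omitted here. The function name and arguments are hypothetical.

```python
import numpy as np

def matched_filter(pixels, target, mean=None, cov=None):
    """Per-pixel matched filter score alpha = (x - mu)^T C^-1 t / (t^T C^-1 t).

    pixels: (N, B) spectra; target: (B,) target signature.
    mean and cov default to statistics of `pixels` themselves (the scene background).
    """
    pixels = np.asarray(pixels, dtype=float)
    target = np.asarray(target, dtype=float)
    if mean is None:
        mean = pixels.mean(axis=0)
    if cov is None:
        cov = np.cov(pixels, rowvar=False)
    # Pseudo-inverse guards against a singular covariance (e.g. strongly correlated bands).
    cov_inv_t = np.linalg.pinv(cov) @ target
    return (pixels - mean) @ cov_inv_t / (target @ cov_inv_t)

# Usage on synthetic data: Gaussian background plus one pixel with 3x the target added.
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 8))
target = np.linspace(0.1, 1.0, 8)  # made-up signature, not a real methane spectrum
mu, cov = background.mean(axis=0), np.cov(background, rowvar=False)
plume_pixel = mu + 3.0 * target
print(matched_filter(plume_pixel[None, :], target, mean=mu, cov=cov))  # approximately [3.]
```

Because the filter is linear in (x - mu), the score recovers the scale of the added target exactly when the background statistics are known, which is what makes it a natural per-pixel enhancement feature.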
Instructions:
For the most up-to-date instructions on how to load the data, please see the associated GitHub page: https://github.com/spaceml-org/STARCOP. Only a brief description is given here:
Download all the *.zip files you want to use (STARCOP_test.zip, STARCOP_train_easy.zip, STARCOP_train_remaining_partX.zip) and unzip them into one folder. For example, if you only want to run evaluation, downloading STARCOP_test.zip is enough; if you want to try a quick training demo, STARCOP_train_easy.zip alone is enough. Place all the *.csv files into the same folder, then point the data path in the code to this folder.
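The unzip-into-one-folder step can be scripted with Python's standard library. This is a minimal sketch, not the repository's loader: the function name and folder paths are hypothetical, and it assumes the archives and *.csv files have already been downloaded into a single directory.

```python
import shutil
import zipfile
from pathlib import Path

def extract_starcop(download_dir, data_dir):
    """Unzip every STARCOP_*.zip in download_dir into data_dir and copy the *.csv files alongside."""
    download_dir = Path(download_dir).expanduser()
    data_dir = Path(data_dir).expanduser()
    data_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(download_dir.glob("STARCOP_*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)  # all archives share one target folder
        extracted.append(archive.name)
    for csv_file in sorted(download_dir.glob("*.csv")):
        destination = data_dir / csv_file.name
        if csv_file.resolve() != destination.resolve():  # skip if already in place
            shutil.copy2(csv_file, destination)
    return extracted

# Hypothetical folder names; afterwards, point the data path in the code at data_dir.
# extract_starcop("~/Downloads/STARCOP", "~/data/STARCOP")
```

Only the archives that are present get extracted, so the same call works whether you downloaded just STARCOP_test.zip or the full training set.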
The zip files separate the evaluation set (STARCOP_test.zip) from the different difficulty levels of the training set (STARCOP_train_easy.zip and the STARCOP_train_remaining_partX.zip files). "Easy" marks plume events with qplume > 1000 (large plumes), while the remaining set, which can be considered hard, has qplume < 1000 (small plumes). The qplume value is taken from the source data (Cusworth, D. H. et al., 2021).
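Since the easy/hard distinction is a threshold on qplume, it can be reproduced from tabular metadata. The sketch below assumes a CSV with a column literally named qplume (check the actual *.csv files for the real column names); the function name is hypothetical, and placing rows with qplume exactly 1000 in "hard" and rows without a value in a separate list are choices the description leaves open.

```python
import csv
import io

EASY_MIN_QPLUME = 1000  # threshold quoted in the dataset description

def split_by_difficulty(csv_file, column="qplume"):
    """Split rows of a metadata CSV into easy (qplume > 1000), hard, and unlabelled lists."""
    easy, hard, unlabelled = [], [], []
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if not value:
            unlabelled.append(row)  # rows with no qplume value
        elif float(value) > EASY_MIN_QPLUME:
            easy.append(row)
        else:
            hard.append(row)  # assumption: exactly 1000 counts as hard
    return easy, hard, unlabelled

# Usage with an in-memory, made-up CSV (tile names and values are illustrative only).
sample = io.StringIO("tile,qplume\nA,1500\nB,800\nC,\n")
easy, hard, unlabelled = split_by_difficulty(sample)
print([r["tile"] for r in easy], [r["tile"] for r in hard], [r["tile"] for r in unlabelled])
# ['A'] ['B'] ['C']
```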
Note on mirror hosting: some of the smaller files were also uploaded to Google Drive to make data loading easier, for example in Google Colab; please check the main code repository for instructions. The files STARCOP_train_easy.zip and STARCOP_test.zip on Google Drive are identical to the files with the same names on Zenodo (here).
Citation:
If you find our dataset useful, please cite our work:
@article{ruzicka_starcop_2023,
title = {Semantic segmentation of methane plumes with hyperspectral machine learning models},
volume = {13},
issn = {2045-2322},
url = {https://www.nature.com/articles/s41598-023-44918-6},
doi = {10.1038/s41598-023-44918-6},
number = {1},
journal = {Scientific Reports},
author = {Růžička, Vít and Mateo-Garcia, Gonzalo and Gómez-Chova, Luis and Vaughan, Anna and Guanter, Luis and Markham, Andrew},
month = nov,
year = {2023},
pages = {19999},
}
Files
Total size: 60.8 GB

MD5 checksum | Size |
---|---|
c789cba97907070bae0c59ae36dafc1b | 5.6 GB |
aa3f5674d0951f43b386dde1c59f08b5 | 9.0 GB |
ddb419cac0423013abd8039b5d1acea4 | 9.2 GB |
ec00238df6c079f082d84bb77f42cf76 | 9.0 GB |
e563421e499dfb7ec5e068d51fdd8b24 | 9.3 GB |
b9127b7e466b22ff95e22c7e35cbb019 | 9.2 GB |
5f1b02b5e26767417450b8a263569edb | 9.4 GB |
6bc8305de45f282651d3d5de6d055e3c | 102.9 kB |
02f56b8e9759d01a1ee039f2eeaf4ebc | 1.0 MB |
57875c817e0ee9046bafb8214ff46430 | 168.8 kB |
Additional details
Related works
- Is cited by:
  - Preprint: 10.21203/rs.3.rs-2899370/v1 (DOI)
  - Publication: 10.1038/s41598-023-44918-6 (DOI)
Dates
- Available: 2023-11-17
Software
- Repository URL: https://github.com/spaceml-org/STARCOP
- Programming language: Python
References
- Růžička, V., Mateo-Garcia, G., Gómez-Chova, L. et al. Semantic segmentation of methane plumes with hyperspectral machine learning models. Sci Rep 13, 19999 (2023). https://doi.org/10.1038/s41598-023-44918-6