There is a newer version of the record available.

Published May 15, 2025 | Version v4
Dataset Open

Fire-D: Analysis and ML-Ready NASA-Centric Remote Sensing of Wildfire and Smoke

Description

Earth science remote sensing imagery is rich in structural and spectral information, making such data an ideal platform for benchmarking a broad range of machine learning (ML) tasks, from pattern retrieval to physics-informed classification to anomaly detection to transfer learning. Nevertheless, the utility of Earth science remote sensing data remains largely unexplored by the broader ML community. Our goal is to bridge this gap and bring a rich variety of multi-source, multi-resolution Earth image data to a wider range of ML researchers who are non-experts in remote sensing, thereby increasing the utility and societal impact of such data products. In particular, motivated by the emerging wildfire crisis, we present radiometrically and geometrically calibrated radiance data from airborne and orbital instruments from the National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), and the Korea Meteorological Administration (KMA).

Given the scarce occurrence of wildfires and complex spatio-temporal dependencies in radiance data, these datasets are especially well suited for benchmarking unsupervised and self-supervised learning tasks both on images and non-Euclidean objects. Our experiments on these datasets indicate that contrastive learning and transfer learning algorithms can capture the structures of views and scenes, map pixel space of multi-sensor imagery to a high-level embedding space for further downstream tasks, and facilitate more cohesive integration of the state-of-the-art ML approaches into wildfire risk analytics.

All NASA-based observations are freely usable under the Creative Commons Zero License. There are also no restrictions on the use of GOES data. GK2A data are also open data without any restrictions on their use.

For the Planet data, we cannot share the radiances, but all masks within this dataset are freely usable with no restrictions.

 

Use:

An example of programmatic data access and usage can be found in the dataset's associated GitHub repository.

 

The input data are geometrically and radiometrically calibrated radiances pulled from various NASA, NOAA, Planet, and KMA archives. For instruments whose spectral bands have different spatial resolutions (GOES and GK2A), all bands have been resampled to the lowest collective spatial resolution.
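The record does not state which resampling method was used. As a minimal sketch of what bringing a finer band onto a coarser shared grid can look like, the example below uses plain block averaging with NumPy. The function name `block_mean` and the choice of averaging are assumptions made for illustration, not the authors' method. For context, GOES ABI bands have nominal resolutions of 0.5 km, 1 km, and 2 km, so a 0.5 km band would need a factor of 4 to reach a 2 km grid.

```python
import numpy as np


def block_mean(band, factor):
    """Downsample a 2-D band by averaging non-overlapping factor x factor blocks.

    Hypothetical helper: block averaging is only one of several possible
    resampling choices and is assumed here purely for illustration.
    """
    rows, cols = band.shape
    # Trim edge rows/columns that do not fill a complete block.
    rows -= rows % factor
    cols -= cols % factor
    trimmed = band[:rows, :cols]
    # Split each axis into (blocks, factor) and average over the factor axes.
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))
```

Calling `block_mean(fine_band, 4)` on a 0.5 km array would yield an array on a grid four times coarser in each dimension. The files in this dataset are already resampled, so this step is not needed to use them.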

Geometric and radiometric calibration was performed by the science data processing pipelines of the respective missions and does not need to be repeated by anyone curating the same data. Further information for each instrument can be found in its publicly available Level-1 algorithm theoretical basis document (ATBD).

All input and label data are stored in GeoTiff format. Each spectral band is stored as a separate raster band, and each scene is stored as a separate GeoTiff file. Input files and label files are packaged in separate, correspondingly labeled tar files. File names match between inputs and labels, except that label file names and subfolders carry an additional .fire or .smoke.
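The naming rule above can be used to pair each input scene with its fire and smoke labels. The sketch below assumes that a label name is the input name with a single `.fire` or `.smoke` token inserted somewhere in it. The exact position of the token is not stated in this record, and the function names and the example file names are hypothetical.

```python
from pathlib import PurePath

# Tokens that, per the description, distinguish label files from input files.
LABEL_TOKENS = (".fire", ".smoke")


def label_key(label_name):
    """Return (input_file_name, label_kind) for a label path, or None.

    Assumption: removing the first ".fire" or ".smoke" token from the label's
    file name recovers the matching input file name.
    """
    name = PurePath(label_name).name  # ignore subfolders such as "labels.fire/"
    for token in LABEL_TOKENS:
        if token in name:
            return name.replace(token, "", 1), token.lstrip(".")
    return None


def pair_inputs_with_labels(input_names, label_names):
    """Map each input file name to a dict of {"fire": path, "smoke": path}."""
    pairs = {PurePath(n).name: {} for n in input_names}
    for label in label_names:
        key = label_key(label)
        if key is None:
            continue
        input_name, kind = key
        if input_name in pairs:
            pairs[input_name][kind] = label
    return pairs
```

With hypothetical names, `pair_inputs_with_labels(["scene_a.tif"], ["scene_a.fire.tif", "scene_a.smoke.tif"])` groups both labels under `"scene_a.tif"`. If the real archives place the token differently, only `label_key` needs to change.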

The GeoTiff format stores geolocation metadata internally and can be read with the C/C++/Python GDAL packages or with Python packages that wrap GDAL, such as rasterio and rioxarray. The documentation for SIT-FUSE, the package with which the labels were generated, also has examples of how to read and work with various data formats, including GeoTiffs. Lastly, the data can be opened in Geographic Information Systems (GIS) software, such as the free and open-source QGIS.
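As a minimal sketch of the rasterio route mentioned above, the example below reads every raster band of one scene into a NumPy array together with its geolocation metadata. The helper names `read_geotiff` and `band_stats` are hypothetical and are not part of this dataset's repository or of SIT-FUSE. Whether the files declare a nodata value is also an assumption, so the code treats it as optional.

```python
import numpy as np


def read_geotiff(path):
    """Read all bands of a GeoTiff; returns (array, metadata).

    The array has shape (bands, rows, cols). rasterio is imported inside the
    function so that band_stats below can be used without GDAL installed.
    """
    import rasterio

    with rasterio.open(path) as src:
        data = src.read()  # every raster band, in file order
        meta = {
            "crs": src.crs,              # coordinate reference system
            "transform": src.transform,  # affine pixel-to-map transform
            "nodata": src.nodata,        # may be None if not declared
            "count": src.count,          # number of raster bands
        }
    return data, meta


def band_stats(data, nodata=None):
    """Return a (min, max) tuple per band, ignoring nodata pixels if given."""
    arr = data.astype("float64")
    if nodata is not None:
        arr[arr == nodata] = np.nan
    return [(float(np.nanmin(band)), float(np.nanmax(band))) for band in arr]
```

A typical call would be `data, meta = read_geotiff("some_scene.tif")` followed by `band_stats(data, meta["nodata"])`, where the file name stands in for any scene extracted from the tar archives.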

Timing information can be found in the file names, which all use the standard formats from the various instruments' L1B datasets.
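For GOES ABI L1b products, the standard file name carries a scan start token of the form `_sYYYYDDDHHMMSSt` (year, day of year, hour, minute, second, tenth of a second, in UTC). The sketch below parses that token. It assumes the GOES files in this dataset keep the token unchanged, and it does not cover the other instruments, whose conventions differ. The function name and the example file name are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Scan start token used in GOES ABI L1b file names: _sYYYYDDDHHMMSSt
_GOES_START = re.compile(r"_s(\d{4})(\d{3})(\d{2})(\d{2})(\d{2})(\d)")


def goes_scan_start(file_name):
    """Return the scan start time (UTC) encoded in a GOES-style file name.

    Returns None when the name has no start token. Assumes the dataset's
    GOES files preserve the standard L1b token.
    """
    match = _GOES_START.search(file_name)
    if match is None:
        return None
    year, doy, hour, minute, second, tenth = match.groups()
    # %j parses the three-digit day of year.
    start = datetime.strptime(year + doy + hour + minute + second, "%Y%j%H%M%S")
    start += timedelta(milliseconds=100 * int(tenth))
    return start.replace(tzinfo=timezone.utc)
```

For a name containing `_s20250071801174`, this returns 7 January 2025 at 18:01:17.4 UTC.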

V2 includes additional GOES-18 radiance data and associated smoke and fire labels for the recent LA fires (Palisades and Eaton fires in January of 2025).

V3 reorganizes all data and adds improved and additional data from airborne and satellite platforms in 2019, associated with this study: https://arxiv.org/pdf/2501.15343.

Current fire coverage includes:

  • 2019: Williams Flats, Sheridan, Horsefly, and Mosquito (US)
  • 2022: Uljin Forest Fire (S. Korea; largest fire on record in S. Korea)
  • 2025: Palisades and Eaton Fires (US)

Additional data for the 2025 Palisades and Eaton fires from the TEMPO instrument are currently being validated and will be released in a forthcoming version.

 

Validation:

These labels have been extensively validated; further information can be found in the associated publications:
https://doi.org/10.3390/rs13122364
https://doi.org/10.3390/rs17071267

 

Files (25.8 GB)

MD5 checksum                          Size
md5:87cd8bbd12995b3dd9856721912172ac  736.6 kB
md5:090987764b685074f853a5ea8792d012  791.8 MB
md5:483d52e91dcfdd41469d8112e64425c5  1.1 MB
md5:4f6ffbc74edaabe09647217d9dec3667  8.5 GB
md5:2c204350e665ad895c8c3210227da0e9  4.2 GB
md5:9fef8b995480e179f05ad6fd20aad38d  1.1 MB
md5:03c78378a6ad8b70e5a4a5db6b3d254f  8.3 GB
md5:7517f06e751406a218b3519048ada9c5  57.1 kB
md5:949656318077178f83d1fc499819136c  254.1 MB
md5:72882cb4d480160ca0ed25badb310c1b  4.8 kB
md5:45335ae6ec855838599444218afe6d1d  1.2 GB
md5:90a1900e8ade2bd8e0d419e02a2cc5ec  102.5 kB
md5:83a4ea70620eb0051385e8d67cf12c6a  84.7 MB
md5:e3a812c4ddfff0c8d4aee1111429aa35  241.1 kB
md5:1b088c5ded2cb55cd70d9acdb7b9ab39  2.4 GB
md5:846c206822e083673062265e47391ec2  9.7 MB
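
The MD5 checksums listed above can be used to confirm that a download completed intact. The sketch below streams a file through Python's standard `hashlib` so that multi-gigabyte archives do not have to fit in memory. The function names are hypothetical, and the file names that belong to each checksum are not shown in this listing, so the path passed in is whatever name the archive was saved under.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:87cd...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return file_md5(path) == expected
```

A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again.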

Additional details

Related works

Is part of
Journal article: 10.3390/rs17071267 (DOI)
Book chapter: 10.1016/B978-0-44-319077-3.00013-4 (DOI)