FireSR: A Dataset for Super-Resolution and Segmentation of Burned Areas

Brune, Eric

doi:10.5281/zenodo.13384289

Published August 28, 2024 | Version v2

Dataset Open

Brune, Eric (Contact person)¹

1. KTH Royal Institute of Technology

Contributors

Data collector:

Brune, Eric

# FireSR Dataset

## Overview

**FireSR** is a dataset designed for the super-resolution and segmentation of wildfire-burned areas. It includes data for all wildfire events in Canada from 2017 to 2023 that exceed 2000 hectares in size, as reported by the National Burned Area Composite (NBAC). The dataset aims to support high-resolution daily monitoring and improve wildfire management using machine learning techniques.

## Dataset Structure

The dataset is organized into several directories, each containing data relevant to different aspects of wildfire monitoring:

- **S2**: Contains Sentinel-2 images.
  - **pre**: Pre-fire Sentinel-2 images (high resolution).
  - **post**: Post-fire Sentinel-2 images (high resolution).

- **mask**: Contains NBAC polygons, which serve as ground truth masks for the burned areas.
  - **pre**: Burned area labels from the year before the fire, using the same spatial bounds as the fire events of the current year.
  - **post**: Burned area labels corresponding to post-fire conditions.

- **MODIS**: Contains post-fire MODIS images (lower resolution).

- **LULC**: Contains land use/land cover data from ESRI Sentinel-2 10-Meter Land Use/Land Cover (2017-2023).

- **Daymet**: Contains weather data from Daymet V4: Daily Surface Weather and Climatological Summaries.

### File Naming Convention

Each GeoTIFF (.tif) file is named according to the format: `CA_<year>_<province>_<id>.tif`, where:
- `CA` stands for Canada.
- `<year>` is the year of the wildfire event.
- `<province>` is the province code (e.g., AB for Alberta, BC for British Columbia).
- `<id>` is a unique identifier for the wildfire event.
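
The naming convention above can be parsed mechanically, which helps when grouping files by year or province. The following is a minimal sketch, not part of the dataset's own tooling: the names `FIRE_NAME`, `FireEvent`, and `parse_fire_name` are hypothetical, and the pattern assumes a two-letter province code and a purely numeric `<id>`, as in the example file names.

```python
import re
from typing import NamedTuple

# Pattern for CA_<year>_<province>_<id>.tif. A two-letter province code and a
# numeric id are assumptions based on the example file names.
FIRE_NAME = re.compile(r"^CA_(?P<year>\d{4})_(?P<province>[A-Z]{2})_(?P<id>\d+)\.tif$")


class FireEvent(NamedTuple):
    year: int
    province: str
    event_id: str


def parse_fire_name(filename: str) -> FireEvent:
    """Split a FireSR file name into year, province code, and event id."""
    match = FIRE_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    return FireEvent(int(match["year"]), match["province"], match["id"])


print(parse_fire_name("CA_2017_AB_204.tif"))
```

The event id is kept as a string so that it can be reused verbatim when rebuilding file names.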

### Directory Structure

The dataset is organized as follows:

```
FireSR/
│
├── dataset/
│ ├── S2/
│ │ ├── post/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ │ ├── pre/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ ├── mask/
│ │ ├── post/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ │ ├── pre/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ ├── MODIS/
│ │ ├── CA_2017_AB_204.tif
│ │ ├── CA_2017_AB_2418.tif
│ │ └── ...
│ ├── LULC/
│ │ ├── CA_2017_AB_204.tif
│ │ ├── CA_2017_AB_2418.tif
│ │ └── ...
│ ├── Daymet/
│ │ ├── CA_2017_AB_204.tif
│ │ ├── CA_2017_AB_2418.tif
│ │ └── ...
```
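
Because every directory uses the same file name for a given event, the paths for one event can be derived from its name. The sketch below assumes that layout; `MODALITIES`, `event_paths`, and `missing_modalities` are hypothetical helper names, and the source does not state that every event has a file in every directory, so the second helper checks for gaps.

```python
from pathlib import Path

# Sub-directories under FireSR/dataset/, as listed in the tree above.
MODALITIES = {
    "s2_pre": "S2/pre",
    "s2_post": "S2/post",
    "mask_pre": "mask/pre",
    "mask_post": "mask/post",
    "modis": "MODIS",
    "lulc": "LULC",
    "daymet": "Daymet",
}


def event_paths(dataset_root, event_name):
    """Map each modality to the expected GeoTIFF path for one event."""
    root = Path(dataset_root)
    return {key: root / sub / f"{event_name}.tif" for key, sub in MODALITIES.items()}


def missing_modalities(dataset_root, event_name):
    """Return the modalities whose file is not present on disk."""
    paths = event_paths(dataset_root, event_name)
    return [key for key, path in paths.items() if not path.is_file()]


paths = event_paths("FireSR/dataset", "CA_2017_AB_204")
print(paths["s2_post"])
```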

### Spatial Resolution and Channels

- **Sentinel-2 (S2) Images**: 20 meters (Bands: B12, B8, B4)
- **MODIS Images**: 250 meters (Bands: B7, B2, B1)
- **NBAC Burned Area Labels**: 20 meters (1 channel, binary classification: burned/unburned)
- **Daymet Weather Data**: 1000 meters (7 channels: dayl, prcp, srad, swe, tmax, tmin, vp)
- **ESRI Land Use/Land Cover Data**: 10 meters (1 channel with 9 classes: water, trees, flooded vegetation, crops, built area, bare ground, snow/ice, clouds, rangeland)
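
The resolutions listed above determine how far apart the layers are in scale; for example, a MODIS pixel is 250 / 20 = 12.5 times wider than a Sentinel-2 pixel. The sketch below only does this arithmetic. It assumes the nominal resolutions as listed, and the names `RESOLUTION_M`, `scale_factor`, and `pixels_covered` are hypothetical.

```python
# Nominal ground sampling distance in meters, taken from the list above.
RESOLUTION_M = {"S2": 20, "MODIS": 250, "NBAC": 20, "Daymet": 1000, "LULC": 10}


def scale_factor(source, target="S2"):
    """Linear factor by which `source` pixels are larger than `target` pixels."""
    return RESOLUTION_M[source] / RESOLUTION_M[target]


def pixels_covered(source, target="S2"):
    """Number of `target` pixels that fit inside one `source` pixel."""
    return scale_factor(source, target) ** 2


for name in RESOLUTION_M:
    print(
        f"{name}: {scale_factor(name):g}x the S2 pixel size, "
        f"{pixels_covered(name):g} S2 pixels per {name} pixel"
    )
```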

**Daymet Weather Data**: The Daymet dataset includes seven channels that provide various weather-related parameters, which are crucial for understanding and modeling wildfire conditions:

| Name | Units | Min | Max | Description |
|------|-------|-----|-----|-------------|
| dayl | seconds | 0 | 86400 | Duration of the daylight period. |
| prcp | mm | 0 | 544 | Daily total precipitation, sum of all forms converted to water-equivalent. |
| srad | W/m^2 | 0 | 1051 | Incident shortwave radiation flux density, averaged over the daylight period of the day. |
| swe | kg/m^2 | 0 | 13931 | Snow water equivalent, representing the amount of water contained within the snowpack. |
| tmax | °C | -60 | 60 | Daily maximum 2-meter air temperature. |
| tmin | °C | -60 | 42 | Daily minimum 2-meter air temperature. |
| vp | Pa | 0 | 8230 | Daily average partial pressure of water vapor. |
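
The value ranges above differ by several orders of magnitude between channels, so some form of per-channel scaling is usually applied before training. The following is a minimal sketch of min-max scaling, not a preprocessing step prescribed by the dataset. It assumes the channels are stored in the listed order (dayl, prcp, srad, swe, tmax, tmin, vp) and bounds dayl by the 86400 seconds in a day; `DAYMET_RANGES` and `normalize_daymet` are hypothetical names.

```python
import numpy as np

# (min, max) per channel, in the listed channel order (assumed to match the
# band order in the files). dayl is bounded by the 86400 seconds in a day.
DAYMET_RANGES = {
    "dayl": (0.0, 86400.0),
    "prcp": (0.0, 544.0),
    "srad": (0.0, 1051.0),
    "swe": (0.0, 13931.0),
    "tmax": (-60.0, 60.0),
    "tmin": (-60.0, 42.0),
    "vp": (0.0, 8230.0),
}


def normalize_daymet(array):
    """Min-max scale a (7, H, W) Daymet array to [0, 1], channel by channel."""
    lo = np.array([r[0] for r in DAYMET_RANGES.values()], dtype=np.float32)
    hi = np.array([r[1] for r in DAYMET_RANGES.values()], dtype=np.float32)
    scaled = (array.astype(np.float32) - lo[:, None, None]) / (hi - lo)[:, None, None]
    # Clip in case a file contains values outside the nominal range.
    return np.clip(scaled, 0.0, 1.0)


example = np.zeros((7, 2, 2), dtype=np.float32)
example[4] = 30.0  # a tmax of 30 °C
print(normalize_daymet(example)[4, 0, 0])  # 0.75
```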

**ESRI Land Use/Land Cover Data**: The ESRI 10m Annual Land Cover dataset provides a time series of global maps of land use and land cover (LULC) from 2017 to 2023 at a 10-meter resolution. These maps are derived from ESA Sentinel-2 imagery and are generated by Impact Observatory using a deep learning model trained on billions of human-labeled pixels. Each map is a composite of LULC predictions for 9 classes throughout the year, offering a representative snapshot of each year.

| Class Value | Land Cover Class |
|-------------|------------------|
| 1 | Water |
| 2 | Trees |
| 4 | Flooded Vegetation |
| 5 | Crops |
| 7 | Built Area |
| 8 | Bare Ground |
| 9 | Snow/Ice |
| 10 | Clouds |
| 11 | Rangeland |
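
The class values in the table are not contiguous (3 and 6 are unused), whereas most loss functions and embedding layers expect indices 0 to 8. The sketch below remaps raw values with a lookup table. It is an illustration under assumptions: raster values are taken to fit in 8 bits, any value not in the table is mapped to a hypothetical `IGNORE` index of 255, and `LULC_CLASSES`, `LOOKUP`, and `remap_lulc` are made-up names.

```python
import numpy as np

# Class values and names from the table above.
LULC_CLASSES = {
    1: "Water", 2: "Trees", 4: "Flooded Vegetation", 5: "Crops", 7: "Built Area",
    8: "Bare Ground", 9: "Snow/Ice", 10: "Clouds", 11: "Rangeland",
}

# Lookup table from raw value to a contiguous index in 0..8.
# Values absent from the table (for example 0, 3, 6) map to IGNORE (assumption).
IGNORE = 255
LOOKUP = np.full(256, IGNORE, dtype=np.uint8)
for index, value in enumerate(sorted(LULC_CLASSES)):
    LOOKUP[value] = index


def remap_lulc(raster):
    """Convert raw LULC values (assumed to lie in 0..255) to contiguous indices."""
    return LOOKUP[np.asarray(raster, dtype=np.uint8)]


raw = np.array([[1, 2, 4], [7, 11, 0]])
print(remap_lulc(raw))
```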

## Usage Tutorial

To help users get started with FireSR, we provide a comprehensive tutorial with scripts for data extraction and processing. Below is an example workflow:

### Step 1: Extract FireSR.tar.gz

```bash
tar -xvf FireSR.tar.gz
```
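
Given the size of the archive (73.4 GB), it can be worth checking the download against the MD5 checksum published with the file before extracting it. The following is a minimal sketch using only the standard library; `file_md5` and `verify_archive` are hypothetical helper names, and the default path assumes the archive sits in the current directory.

```python
import hashlib

# Checksum published with FireSR.tar.gz on the dataset record.
EXPECTED_MD5 = "8f03c00e661cbf1d47e0a496fdca2558"


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path="FireSR.tar.gz"):
    """Return True if the archive matches the published checksum."""
    return file_md5(path) == EXPECTED_MD5
```

Reading in chunks keeps memory use constant regardless of the archive size.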

### Step 2: Tiling the GeoTIFF Files

The dataset contains high-resolution GeoTIFF files. For machine learning models, it may be useful to tile these images into smaller patches. Here's a Python script to tile the images:

```python
import os

import rasterio
from rasterio.windows import Window


def tile_image(image_path, output_dir, tile_size=128):
    """Split a GeoTIFF into tile_size x tile_size patches, one file per patch."""
    os.makedirs(output_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(image_path))[0]
    with rasterio.open(image_path) as src:
        for i in range(0, src.height, tile_size):      # row offset
            for j in range(0, src.width, tile_size):   # column offset
                window = Window(j, i, tile_size, tile_size)
                # Edge windows extend past the raster; boundless=True pads them
                # with zeros so that every tile has the same shape.
                data = src.read(window=window, boundless=True, fill_value=0)
                outpath = os.path.join(output_dir, f"{stem}_{i}_{j}.tif")
                with rasterio.open(
                    outpath, 'w', driver='GTiff',
                    height=tile_size, width=tile_size,
                    count=src.count, dtype=src.dtypes[0],
                    crs=src.crs, transform=src.window_transform(window),
                ) as dst:
                    dst.write(data)


# Example usage
tile_image('FireSR/dataset/S2/post/CA_2017_AB_204.tif', 'tiled_images/')
```
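
The tile file names encode the row and column offsets produced by the two nested loops above. To estimate how many tiles an image yields, or to enumerate the offsets without opening any file, the same iteration can be written as plain arithmetic. This is a sketch only; `tile_origins` and `tile_count` are hypothetical names, and the 300 x 500 raster is made up for illustration.

```python
import math


def tile_origins(height, width, tile_size=128):
    """Yield the (row, col) offset of every tile, in the order used by the loop above."""
    for row in range(0, height, tile_size):
        for col in range(0, width, tile_size):
            yield row, col


def tile_count(height, width, tile_size=128):
    """Number of tiles, counting partial tiles at the bottom and right edges."""
    return math.ceil(height / tile_size) * math.ceil(width / tile_size)


# Hypothetical 300 x 500 pixel raster, chosen only for illustration.
origins = list(tile_origins(300, 500))
print(len(origins), tile_count(300, 500))
print(origins[-1])
```

Unless the raster dimensions are exact multiples of `tile_size`, the tiles in the last row and column are only partly covered by source pixels.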

### Step 3: Loading Data into a Machine Learning Model

After tiling, the images can be loaded into a machine learning model using libraries like PyTorch or TensorFlow. Here's an example using PyTorch:

```python
import os

import rasterio
import torch
from torch.utils.data import Dataset, DataLoader


class FireSRDataset(Dataset):
    """Loads tiled GeoTIFFs as float tensors of shape (bands, height, width)."""

    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        # Sort so that indices are stable across runs.
        self.image_paths = sorted(
            os.path.join(image_dir, f)
            for f in os.listdir(image_dir)
            if f.endswith('.tif')
        )

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        with rasterio.open(self.image_paths[idx]) as src:
            # rasterio already returns (bands, height, width), so no axis
            # reordering is needed; transforms.ToTensor() would expect (H, W, C).
            image = torch.from_numpy(src.read().astype('float32'))
        if self.transform:
            image = self.transform(image)
        return image


# Example usage
dataset = FireSRDataset('tiled_images/')
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)
```

## License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt the material as long as appropriate credit is given.

## Contact

For any questions or further information, please contact:
- Name: Eric Brune
- Email: ebrune@kth.se

Files

FireSR.tar.gz (73.4 GB), md5:8f03c00e661cbf1d47e0a496fdca2558

Additional details

Submitted: 2024-06-05

Submitted to NeurIPS 2024 Datasets and Benchmarks Track
