Published August 28, 2024 | Version v2
Dataset Open

FireSR: A Dataset for Super-Resolution and Segmentation of Burned Areas

  • 1. KTH Royal Institute of Technology

Description


# FireSR Dataset

## Overview

**FireSR** is a dataset designed for the super-resolution and segmentation of wildfire-burned areas. It includes data for all wildfire events in Canada from 2017 to 2023 that exceed 2000 hectares in size, as reported by the National Burned Area Composite (NBAC). The dataset aims to support high-resolution daily monitoring and improve wildfire management using machine learning techniques.

## Dataset Structure

The dataset is organized into several directories, each containing data relevant to different aspects of wildfire monitoring:

- **S2**: Contains Sentinel-2 images.
  - **pre**: Pre-fire Sentinel-2 images (high resolution).
  - **post**: Post-fire Sentinel-2 images (high resolution).

- **mask**: Contains NBAC polygons, which serve as ground truth masks for the burned areas.
  - **pre**: Burned area labels from the year before the fire, using the same spatial bounds as the fire events of the current year.
  - **post**: Burned area labels corresponding to post-fire conditions.

- **MODIS**: Contains post-fire MODIS images (lower resolution).

- **LULC**: Contains land use/land cover data from ESRI Sentinel-2 10-Meter Land Use/Land Cover (2017-2023).

- **Daymet**: Contains weather data from Daymet V4: Daily Surface Weather and Climatological Summaries.

### File Naming Convention

Each GeoTIFF (.tif) file is named according to the format: `CA_<year>_<province>_<id>.tif`, where:
- `CA` stands for Canada.
- `<year>` is the year of the wildfire event.
- `<province>` is the province code (e.g., AB for Alberta, BC for British Columbia).
- `<id>` is a unique identifier for the wildfire event.
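
The naming convention can be split back into its parts with a small helper. This is a minimal sketch, not part of the released scripts: `parse_event_name` and `FireEvent` are hypothetical names, and the pattern assumes a four-digit year, a two-letter uppercase province code, and a numeric id, as in the examples shown in this README.

```python
import re
from typing import NamedTuple

# Pattern for the documented format CA_<year>_<province>_<id>.tif.
# Assumption: 4-digit year, 2-letter uppercase province code, numeric id.
_NAME_RE = re.compile(r"^CA_(\d{4})_([A-Z]{2})_(\d+)\.tif$")


class FireEvent(NamedTuple):
    year: int
    province: str
    event_id: str


def parse_event_name(filename: str) -> FireEvent:
    """Split a FireSR file name into its year, province, and event id."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    year, province, event_id = match.groups()
    return FireEvent(int(year), province, event_id)


print(parse_event_name("CA_2017_AB_204.tif"))
# FireEvent(year=2017, province='AB', event_id='204')
```

Parsing the year out of the name makes it straightforward to split events by year, for example when holding out the most recent fire season for testing.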

### Directory Structure

The dataset is organized as follows:

```
FireSR/
└── dataset/
    ├── S2/
    │   ├── post/
    │   │   ├── CA_2017_AB_204.tif
    │   │   ├── CA_2017_AB_2418.tif
    │   │   └── ...
    │   └── pre/
    │       ├── CA_2017_AB_204.tif
    │       ├── CA_2017_AB_2418.tif
    │       └── ...
    ├── mask/
    │   ├── post/
    │   │   ├── CA_2017_AB_204.tif
    │   │   ├── CA_2017_AB_2418.tif
    │   │   └── ...
    │   └── pre/
    │       ├── CA_2017_AB_204.tif
    │       ├── CA_2017_AB_2418.tif
    │       └── ...
    ├── MODIS/
    │   ├── CA_2017_AB_204.tif
    │   ├── CA_2017_AB_2418.tif
    │   └── ...
    ├── LULC/
    │   ├── CA_2017_AB_204.tif
    │   ├── CA_2017_AB_2418.tif
    │   └── ...
    └── Daymet/
        ├── CA_2017_AB_204.tif
        ├── CA_2017_AB_2418.tif
        └── ...
```
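
Because every directory uses the same file name for a given event, the files belonging to one fire can be collected by name. The sketch below only builds paths from the layout shown above; `event_paths` and the dictionary keys are hypothetical, and it assumes (without checking) that each event has a file in every directory.

```python
from pathlib import Path

# Sub-directories under FireSR/dataset/, as shown in the tree above.
# The dictionary keys are illustrative labels, not names used by the dataset.
LAYERS = {
    "s2_pre": "S2/pre",
    "s2_post": "S2/post",
    "mask_pre": "mask/pre",
    "mask_post": "mask/post",
    "modis": "MODIS",
    "lulc": "LULC",
    "daymet": "Daymet",
}


def event_paths(root, event_name):
    """Return the expected path of every layer for one event, e.g. 'CA_2017_AB_204'."""
    dataset = Path(root) / "dataset"
    return {key: dataset / sub / f"{event_name}.tif" for key, sub in LAYERS.items()}


paths = event_paths("FireSR", "CA_2017_AB_204")
print(paths["modis"].as_posix())
# FireSR/dataset/MODIS/CA_2017_AB_204.tif
```

Calling `path.exists()` on each entry before loading is a cheap way to confirm the assumption that no layer is missing for a given event.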

### Spatial Resolution and Channels

- **Sentinel-2 (S2) Images**: 20 meters (Bands: B12, B8, B4)
- **MODIS Images**: 250 meters (Bands: B7, B2, B1)
- **NBAC Burned Area Labels**: 20 meters (1 channel, binary classification: burned/unburned)
- **Daymet Weather Data**: 1000 meters (7 channels: dayl, prcp, srad, swe, tmax, tmin, vp)
- **ESRI Land Use/Land Cover Data**: 10 meters (1 channel with 9 classes: water, trees, flooded vegetation, crops, built area, bare ground, snow/ice, clouds, rangeland)
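
For super-resolution work it helps to know how the pixel grids relate. The arithmetic below uses only the nominal resolutions listed above; `scale_factor` is a hypothetical helper, and the sketch does not assume anything about how the files are actually gridded, so the pixel size stored in each GeoTIFF should be checked before relying on these ratios.

```python
# Nominal ground sampling distance in meters, taken from the list above.
RESOLUTION_M = {"S2": 20, "MODIS": 250, "NBAC": 20, "Daymet": 1000, "LULC": 10}


def scale_factor(source, target="S2"):
    """Number of target pixels spanned by one source pixel along one axis."""
    return RESOLUTION_M[source] / RESOLUTION_M[target]


for layer in RESOLUTION_M:
    print(f"{layer}: one pixel spans {scale_factor(layer):g} S2 pixel(s) per axis")
```

By these nominal figures, going from MODIS to Sentinel-2 is a 12.5x upscaling per axis, and a single Daymet cell covers a 50 x 50 block of Sentinel-2 pixels.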

**Daymet Weather Data**: The Daymet dataset includes seven channels that provide various weather-related parameters, which are crucial for understanding and modeling wildfire conditions:

| Name | Units | Min | Max | Description |
|------|-------|-----|-----|-------------|
| dayl | seconds | 0 | 86400 | Duration of the daylight period, based on the period of the day during which the sun is above a hypothetical flat horizon. |
| prcp | mm | 0 | 544 | Daily total precipitation, sum of all forms converted to water-equivalent. |
| srad | W/m^2 | 0 | 1051 | Incident shortwave radiation flux density, averaged over the daylight period of the day. |
| swe | kg/m^2 | 0 | 13931 | Snow water equivalent, representing the amount of water contained within the snowpack. |
| tmax | °C | -60 | 60 | Daily maximum 2-meter air temperature. |
| tmin | °C | -60 | 42 | Daily minimum 2-meter air temperature. |
| vp | Pa | 0 | 8230 | Daily average partial pressure of water vapor. |
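
The seven channels have very different value ranges, so some form of rescaling is usually needed before feeding them to a model. One simple option is min-max scaling with the Min and Max columns of the table. This is a sketch under two assumptions that are not confirmed here: the bands in each file follow the order dayl, prcp, srad, swe, tmax, tmin, vp, and the array contains no nodata or NaN values. `scale_daymet` is a hypothetical helper.

```python
import numpy as np

# (min, max) per channel, copied from the table above.
# Assumption: the bands in each file are stored in this order.
DAYMET_RANGES = {
    "dayl": (0, 86400),
    "prcp": (0, 544),
    "srad": (0, 1051),
    "swe": (0, 13931),
    "tmax": (-60, 60),
    "tmin": (-60, 42),
    "vp": (0, 8230),
}


def scale_daymet(array):
    """Min-max scale a (7, H, W) Daymet array to [0, 1], channel by channel."""
    array = np.asarray(array, dtype=np.float32)
    if array.ndim != 3 or array.shape[0] != len(DAYMET_RANGES):
        raise ValueError(f"expected shape (7, H, W), got {array.shape}")
    lo = np.array([r[0] for r in DAYMET_RANGES.values()], dtype=np.float32)
    hi = np.array([r[1] for r in DAYMET_RANGES.values()], dtype=np.float32)
    # Reshape to (7, 1, 1) so each channel is scaled by its own range.
    scaled = (array - lo[:, None, None]) / (hi - lo)[:, None, None]
    return np.clip(scaled, 0.0, 1.0)


demo = np.zeros((7, 2, 2))          # every channel set to 0
print(scale_daymet(demo)[4, 0, 0])  # tmax of 0 °C maps to 0.5
```

Heavy-tailed channels such as `prcp` and `swe` may be better served by a log transform or by statistics computed on the training split; the table ranges are only a convenient starting point.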

**ESRI Land Use/Land Cover Data**: The ESRI 10m Annual Land Cover dataset provides a time series of global maps of land use and land cover (LULC) from 2017 to 2023 at a 10-meter resolution. These maps are derived from ESA Sentinel-2 imagery and are generated by Impact Observatory using a deep learning model trained on billions of human-labeled pixels. Each map is a composite of LULC predictions for 9 classes throughout the year, offering a representative snapshot of each year.

| Class Value | Land Cover Class |
|-------------|------------------|
| 1 | Water |
| 2 | Trees |
| 4 | Flooded Vegetation |
| 5 | Crops |
| 7 | Built Area |
| 8 | Bare Ground |
| 9 | Snow/Ice |
| 10 | Clouds |
| 11 | Rangeland |
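
The class values are not contiguous (3 and 6 are unused), which is awkward for one-hot encoding or embedding layers. The sketch below remaps them to indices 0 to 8 with a lookup table. `to_contiguous` is a hypothetical helper, and it assumes the raster values fit in an unsigned 8-bit integer; any value not in the table, such as a possible nodata value of 0, is mapped to -1.

```python
import numpy as np

# Class values and names, copied from the table above.
LULC_CLASSES = {
    1: "Water",
    2: "Trees",
    4: "Flooded Vegetation",
    5: "Crops",
    7: "Built Area",
    8: "Bare Ground",
    9: "Snow/Ice",
    10: "Clouds",
    11: "Rangeland",
}

# Lookup table: raw class value -> contiguous index 0..8, anything else -> -1.
_LUT = np.full(256, -1, dtype=np.int64)
for index, value in enumerate(LULC_CLASSES):
    _LUT[value] = index


def to_contiguous(lulc):
    """Map raw LULC class values to contiguous indices (assumes uint8 range)."""
    return _LUT[np.asarray(lulc, dtype=np.uint8)]


print(to_contiguous([[1, 2], [11, 0]]).tolist())
# [[0, 1], [8, -1]]
```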


## Usage Tutorial

The following example workflow shows how to get started with FireSR: extracting the archive, tiling the GeoTIFF files, and loading the tiles with PyTorch.

### Step 1: Extract FireSR.tar.gz

```bash
tar -xvf FireSR.tar.gz
```
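
Given the size of the download, it can be worth verifying the archive before (or after a failed) extraction. The sketch below computes an MD5 digest in chunks so the file is never loaded into memory at once. `md5sum` is a hypothetical helper, and it is an assumption that the checksum listed under Files on this page belongs to `FireSR.tar.gz`.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksum listed under Files (assumed to refer to FireSR.tar.gz).
EXPECTED_MD5 = "8f03c00e661cbf1d47e0a496fdca2558"

# Example usage, once the archive is in the working directory:
# if md5sum("FireSR.tar.gz") != EXPECTED_MD5:
#     raise RuntimeError("Checksum mismatch: the download may be incomplete.")
```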

### Step 2: Tiling the GeoTIFF Files

The dataset contains high-resolution GeoTIFF files. For machine learning models, it may be useful to tile these images into smaller patches. Here's a Python script to tile the images:

```python
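import os

import rasterio
from rasterio.windows import Window

def tile_image(image_path, output_dir, tile_size=128):
    """Split a GeoTIFF into tile_size x tile_size GeoTIFF patches."""
    os.makedirs(output_dir, exist_ok=True)  # create the output folder if missing
    stem = os.path.splitext(os.path.basename(image_path))[0]
    with rasterio.open(image_path) as src:
        for i in range(0, src.height, tile_size):      # i: row offset of the tile
            for j in range(0, src.width, tile_size):   # j: column offset of the tile
                window = Window(j, i, tile_size, tile_size)
                transform = src.window_transform(window)
                # boundless=True zero-pads tiles that overhang the right or bottom
                # edge, so every tile has the same shape as the output file.
                data = src.read(window=window, boundless=True, fill_value=0)
                outpath = os.path.join(output_dir, f"{stem}_{i}_{j}.tif")
                with rasterio.open(outpath, 'w', driver='GTiff', height=tile_size,
                                   width=tile_size, count=src.count, dtype=src.dtypes[0],
                                   crs=src.crs, transform=transform) as dst:
                    dst.write(data)

# Example usage
tile_image('FireSR/dataset/S2/post/CA_2017_AB_204.tif', 'tiled_images/')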
```

### Step 3: Loading Data into a Machine Learning Model

After tiling, the images can be loaded into a machine learning model using libraries like PyTorch or TensorFlow. Here's an example using PyTorch:

```python
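import os

import numpy as np
import rasterio
import torch
from torch.utils.data import Dataset, DataLoader

class FireSRDataset(Dataset):
    """Loads tiled GeoTIFFs as float32 tensors of shape (bands, height, width)."""

    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
        self.transform = transform  # optional callable applied to each tensor
        # Sort so that an index maps to the same file on every run.
        self.image_paths = sorted(
            os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith('.tif')
        )

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        with rasterio.open(image_path) as src:
            # rasterio already returns (bands, height, width); cast to float32,
            # the dtype most models expect.
            image = torch.from_numpy(src.read().astype(np.float32))
        if self.transform:
            image = self.transform(image)
        return image

# Example usage. transforms.ToTensor() is not used here: it expects a
# (height, width, channels) array and would permute the axes of a rasterio array.
dataset = FireSRDataset('tiled_images/')
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)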
```

## License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt the material as long as appropriate credit is given.

## Contact

For any questions or further information, please contact:
- Name: Eric Brune
- Email: ebrune@kth.se

Files (73.4 GB)

md5:8f03c00e661cbf1d47e0a496fdca2558

Additional details

Dates

Submitted 2024-06-05: Submitted to NeurIPS 2024 Datasets and Benchmarks Track