FireSR: A Dataset for Super-Resolution and Segmentation of Burned Areas
# FireSR Dataset
## Overview
**FireSR** is a dataset designed for the super-resolution and segmentation of wildfire-burned areas. It includes data for all wildfire events in Canada from 2017 to 2023 that exceed 2000 hectares in size, as reported by the National Burned Area Composite (NBAC). The dataset aims to support high-resolution daily monitoring and improve wildfire management using machine learning techniques.
## Dataset Structure
The dataset is organized into several directories, each containing data relevant to different aspects of wildfire monitoring:
- **S2**: Contains Sentinel-2 images.
  - **pre**: Pre-fire Sentinel-2 images (high resolution).
  - **post**: Post-fire Sentinel-2 images (high resolution).
- **mask**: Contains NBAC polygons, which serve as ground truth masks for the burned areas.
  - **pre**: Burned area labels from the year before the fire, using the same spatial bounds as the fire events of the current year.
  - **post**: Burned area labels corresponding to post-fire conditions.
- **MODIS**: Contains post-fire MODIS images (lower resolution).
- **LULC**: Contains land use/land cover data from ESRI Sentinel-2 10-Meter Land Use/Land Cover (2017-2023).
- **Daymet**: Contains weather data from Daymet V4: Daily Surface Weather and Climatological Summaries.
### File Naming Convention
Each GeoTIFF (.tif) file is named according to the format: `CA_<year>_<province>_<id>.tif`, where:
- `CA` stands for Canada.
- `<year>` is the year of the wildfire event.
- `<province>` is the province code (e.g., AB for Alberta, BC for British Columbia).
- `<id>` is a unique identifier for the wildfire event.
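The naming convention can be parsed programmatically, for example to group events by year or province. The following is a minimal sketch; the helper name `parse_firesr_name` is hypothetical, and the regular expression assumes a four-digit year, an uppercase province code, and a numeric identifier, as in the example file names shown in this document.

```python
import re
from pathlib import Path

# Assumed pattern: CA_<year>_<province>_<id>.tif, with a 4-digit year,
# an uppercase province code, and a numeric id (as in the listed examples).
FILENAME_RE = re.compile(
    r"^CA_(?P<year>\d{4})_(?P<province>[A-Z]+)_(?P<id>\d+)\.tif$"
)


def parse_firesr_name(path):
    """Return (year, province, event_id) parsed from a FireSR file name."""
    match = FILENAME_RE.match(Path(path).name)
    if match is None:
        raise ValueError(f"Unexpected FireSR file name: {path}")
    return int(match["year"]), match["province"], int(match["id"])


print(parse_firesr_name("FireSR/dataset/S2/post/CA_2017_AB_2418.tif"))
# (2017, 'AB', 2418)
```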
### Directory Structure
The dataset is organized as follows:
```
FireSR/
│
├── dataset/
│ ├── S2/
│ │ ├── post/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ │ ├── pre/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ ├── mask/
│ │ ├── post/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ │ ├── pre/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ ├── MODIS/
│ │ ├── CA_2017_AB_204.tif
│ │ ├── CA_2017_AB_2418.tif
│ │ └── ...
│ ├── LULC/
│ │ ├── CA_2017_AB_204.tif
│ │ ├── CA_2017_AB_2418.tif
│ │ └── ...
│ ├── Daymet/
│ │ ├── CA_2017_AB_204.tif
│ │ ├── CA_2017_AB_2418.tif
│ │ └── ...
```
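Because every directory uses the same file name for a given fire event, the files belonging to one event can be collected by joining the event name with each sub-directory. The sketch below illustrates this; `MODALITY_DIRS` and `event_paths` are hypothetical names, and the code assumes (without checking) that an event has a file in every directory.

```python
from pathlib import Path

# Sub-directories taken from the tree above, relative to FireSR/dataset.
MODALITY_DIRS = {
    "s2_pre": "S2/pre",
    "s2_post": "S2/post",
    "mask_pre": "mask/pre",
    "mask_post": "mask/post",
    "modis": "MODIS",
    "lulc": "LULC",
    "daymet": "Daymet",
}


def event_paths(dataset_root, event_name):
    """Map each modality to the expected file path for one fire event."""
    root = Path(dataset_root)
    return {
        key: root / sub / f"{event_name}.tif"
        for key, sub in MODALITY_DIRS.items()
    }


paths = event_paths("FireSR/dataset", "CA_2017_AB_204")
print(paths["modis"])
```

Calling `path.exists()` on each returned value is a simple way to confirm that an event is complete before using it.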
### Spatial Resolution and Channels
- **Sentinel-2 (S2) Images**: 20 meters (Bands: B12, B8, B4)
- **MODIS Images**: 250 meters (Bands: B7, B2, B1)
- **NBAC Burned Area Labels**: 20 meters (1 channel, binary classification: burned/unburned)
- **Daymet Weather Data**: 1000 meters (7 channels: dayl, prcp, srad, swe, tmax, tmin, vp)
- **ESRI Land Use/Land Cover Data**: 10 meters (1 channel with 9 classes: water, trees, flooded vegetation, crops, built area, bare ground, snow/ice, clouds, rangeland)
**Daymet Weather Data**: The Daymet dataset includes seven channels that provide various weather-related parameters, which are crucial for understanding and modeling wildfire conditions:
| Name | Units | Min | Max | Description |
|------|-------|-----|-----|-------------|
| dayl | seconds | 0 | 86400 | Duration of the daylight period, based on the period of the day during which the sun is above a hypothetical flat horizon. |
| prcp | mm | 0 | 544 | Daily total precipitation, sum of all forms converted to water-equivalent. |
| srad | W/m^2 | 0 | 1051 | Incident shortwave radiation flux density, averaged over the daylight period of the day. |
| swe | kg/m^2 | 0 | 13931 | Snow water equivalent, representing the amount of water contained within the snowpack. |
| tmax | °C | -60 | 60 | Daily maximum 2-meter air temperature. |
| tmin | °C | -60 | 42 | Daily minimum 2-meter air temperature. |
| vp | Pa | 0 | 8230 | Daily average partial pressure of water vapor. |
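Since the seven channels span very different numeric ranges, some form of rescaling is usually helpful before feeding them to a model. One possible approach is min-max scaling with the bounds from the table, sketched below. The function name `normalize_daymet` is hypothetical, and the code assumes that the channels in each file are stored in the order listed in the table (dayl, prcp, srad, swe, tmax, tmin, vp); this should be verified against the actual files.

```python
import numpy as np

# (min, max) per channel, copied from the table above.
# ASSUMPTION: the file stores channels in this same order.
DAYMET_RANGES = {
    "dayl": (0, 86400),
    "prcp": (0, 544),
    "srad": (0, 1051),
    "swe": (0, 13931),
    "tmax": (-60, 60),
    "tmin": (-60, 42),
    "vp": (0, 8230),
}


def normalize_daymet(array):
    """Scale a (7, H, W) Daymet array to [0, 1] per channel."""
    if array.shape[0] != len(DAYMET_RANGES):
        raise ValueError("Expected one band per Daymet channel")
    lows = np.array([lo for lo, _ in DAYMET_RANGES.values()], dtype=np.float32)
    highs = np.array([hi for _, hi in DAYMET_RANGES.values()], dtype=np.float32)
    # Reshape the bounds to (7, 1, 1) so they broadcast over H and W.
    scaled = (array.astype(np.float32) - lows[:, None, None]) / (
        highs - lows
    )[:, None, None]
    # Clip in case a value falls outside the documented range.
    return np.clip(scaled, 0.0, 1.0)
```

The table's bounds describe the full value range of each variable rather than the range observed in FireSR, so statistics computed from the training split may be a better choice in practice.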
**ESRI Land Use/Land Cover Data**: The ESRI 10m Annual Land Cover dataset provides a time series of global maps of land use and land cover (LULC) from 2017 to 2023 at a 10-meter resolution. These maps are derived from ESA Sentinel-2 imagery and are generated by Impact Observatory using a deep learning model trained on billions of human-labeled pixels. Each map is a composite of LULC predictions for 9 classes throughout the year, offering a representative snapshot of each year.
| Class Value | Land Cover Class |
|-------------|------------------|
| 1 | Water |
| 2 | Trees |
| 4 | Flooded Vegetation |
| 5 | Crops |
| 7 | Built Area |
| 8 | Bare Ground |
| 9 | Snow/Ice |
| 10 | Clouds |
| 11 | Rangeland |
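The class values are not contiguous (3 and 6 are unused), which matters when the LULC layer is one-hot encoded or used as a classification target. A lookup table is one way to remap them to indices 0 to 8, as sketched below. The function name `remap_lulc` and the `ignore_index` value of 255 are illustrative assumptions, and the code assumes the raster holds integer values between 0 and 255.

```python
import numpy as np

# Class values from the table above, in ascending order.
LULC_VALUES = [1, 2, 4, 5, 7, 8, 9, 10, 11]


def remap_lulc(lulc, ignore_index=255):
    """Map raw LULC class values to contiguous indices 0..8.

    Values that are not listed in the table map to ignore_index.
    """
    lookup = np.full(256, ignore_index, dtype=np.uint8)
    for index, value in enumerate(LULC_VALUES):
        lookup[value] = index
    return lookup[lulc]


raw = np.array([[1, 2, 4], [11, 0, 7]], dtype=np.uint8)
# 1 -> 0, 2 -> 1, 4 -> 2, 11 -> 8, 7 -> 4; 0 is not a listed class -> 255
print(remap_lulc(raw))
```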
## Usage Tutorial
To help users get started with FireSR, we provide a comprehensive tutorial with scripts for data extraction and processing. Below is an example workflow:
### Step 1: Extract FireSR.tar.gz
```bash
tar -xvf FireSR.tar.gz
```
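Given the size of the archive (73.4 GB), it may be worth checking the download against the MD5 checksum published with the dataset. The following is a minimal sketch using only the standard library; the helper name `md5sum` is hypothetical.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (checksum as listed for the archive on the dataset page):
# expected = "8f03c00e661cbf1d47e0a496fdca2558"
# assert md5sum("FireSR.tar.gz") == expected
```

Reading in chunks keeps memory use constant, which matters for a file of this size.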
### Step 2: Tiling the GeoTIFF Files
The dataset contains high-resolution GeoTIFF files. For machine learning models, it may be useful to tile these images into smaller patches. Here's a Python script to tile the images:
```python
import os

import rasterio
from rasterio.windows import Window


def tile_image(image_path, output_dir, tile_size=128):
    """Split a GeoTIFF into tile_size x tile_size patches, keeping georeferencing."""
    os.makedirs(output_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(image_path))[0]
    with rasterio.open(image_path) as src:
        for i in range(0, src.height, tile_size):
            for j in range(0, src.width, tile_size):
                window = Window(j, i, tile_size, tile_size)
                transform = src.window_transform(window)
                outpath = os.path.join(output_dir, f"{stem}_{i}_{j}.tif")
                with rasterio.open(
                    outpath, 'w', driver='GTiff', height=tile_size, width=tile_size,
                    count=src.count, dtype=src.dtypes[0], crs=src.crs, transform=transform,
                ) as dst:
                    # Edge windows can extend past the raster; a boundless read
                    # pads them with zeros so every tile has the same size.
                    dst.write(src.read(window=window, boundless=True, fill_value=0))


# Example usage
tile_image('FireSR/dataset/S2/post/CA_2017_AB_204.tif', 'tiled_images/')
```
### Step 3: Loading Data into a Machine Learning Model
After tiling, the images can be loaded into a machine learning model using libraries like PyTorch or TensorFlow. Here's an example using PyTorch:
```python
import os

import rasterio
import torch
from torch.utils.data import Dataset


class FireSRDataset(Dataset):
    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        # Sort so that the sample order is reproducible across runs.
        self.image_paths = sorted(
            os.path.join(image_dir, f)
            for f in os.listdir(image_dir) if f.endswith('.tif')
        )

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        with rasterio.open(image_path) as src:
            # rasterio returns (bands, height, width), which already matches
            # PyTorch's channel-first layout, so ToTensor() is not needed
            # (it expects height x width x channels and would permute the axes).
            image = torch.from_numpy(src.read().astype('float32'))
        if self.transform:
            image = self.transform(image)
        return image


# Example usage
dataset = FireSRDataset('tiled_images/')
dataloader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)
```
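For segmentation, each image tile needs its matching mask tile. If the Sentinel-2 images and the NBAC masks are tiled with the same settings into two separate directories, the tiles can be matched by file name. The sketch below shows one way to do this; `pair_tiles` and the directory name `tiled_masks/` are hypothetical, and the approach assumes that an image and its mask share the same dimensions so that the tiling produces identical file names.

```python
import os


def pair_tiles(image_dir, mask_dir):
    """Return sorted (image_path, mask_path) pairs for tiles found in both directories."""
    image_names = {f for f in os.listdir(image_dir) if f.endswith(".tif")}
    mask_names = {f for f in os.listdir(mask_dir) if f.endswith(".tif")}
    # Keep only tiles that exist on both sides, in a reproducible order.
    shared = sorted(image_names & mask_names)
    return [
        (os.path.join(image_dir, f), os.path.join(mask_dir, f)) for f in shared
    ]


# Usage (directory names are placeholders):
# pairs = pair_tiles("tiled_images/", "tiled_masks/")
```

The resulting list can replace `image_paths` in a dataset class that reads both files in `__getitem__` and returns an (image, mask) tuple.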
## License
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt the material as long as appropriate credit is given.
## Contact
For any questions or further information, please contact:
- Name: Eric Brune
- Email: ebrune@kth.se
Files
| Size | MD5 checksum |
|------|--------------|
| 73.4 GB | 8f03c00e661cbf1d47e0a496fdca2558 |
Additional details
Dates
- Submitted: 2024-06-05. Submitted to NeurIPS 2024 Datasets and Benchmarks Track.