Published March 23, 2026 | Version v1
Dataset | Open access

Clean-Noisy Aerial Building Segmentation (CN-ABS)

  • 1. University of Liverpool
  • 2. INRIA, Montpellier
  • 3. Technological University Dublin
  • 4. Micron Agritech
  • 5. University of Exeter
  • 6. Technische Universität Berlin

Description

This dataset, introduced in the MVEO 2024 Challenge (https://www.codabench.org/competitions/10453/), is built upon the high-resolution SpaceNet8 dataset (https://ieeexplore.ieee.org/document/9857340). SpaceNet8 consists of RGB satellite imagery acquired by the WorldView-3 sensor over regions in East Louisiana (USA) and Germany, with spatial resolutions ranging from 0.3 m to 0.8 m per pixel.

Rather than using the full set of images and classes, this dataset is restricted to pre-flood imagery and includes only building-related classes, framing the problem as a binary semantic segmentation task (building vs. background). The selected images are divided into 256x256-pixel patches. From these, 5,000 patches are randomly selected for training, and the remaining 1,298 patches are allocated for validation and testing.
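The patching and random split described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names `tile_image` and `split_indices` are hypothetical, and it assumes non-overlapping tiles with incomplete border tiles discarded and a seeded uniform random split.

```python
import numpy as np

PATCH = 256     # patch side length in pixels (from the description)
N_TRAIN = 5000  # number of training patches (from the description)


def tile_image(img: np.ndarray, patch: int = PATCH) -> list:
    """Cut an H x W (x C) array into non-overlapping patch x patch tiles.

    Assumption: border pixels that do not fill a whole tile are dropped.
    """
    h, w = img.shape[:2]
    tiles = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            tiles.append(img[r:r + patch, c:c + patch])
    return tiles


def split_indices(n_patches: int, n_train: int = N_TRAIN, seed: int = 0):
    """Randomly assign n_train patch indices to training and the rest to val/test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_patches)
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])
```

For example, `split_indices(6298)` yields 5,000 training indices and 1,298 validation/testing indices, matching the counts above; the seed is an arbitrary assumption, so the resulting partition will not reproduce the official one.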

To simulate realistic annotation imperfections, synthetic label noise is introduced into the training segmentation masks after patch extraction. Several types of noise are considered, including:

  • Global shrink/expansion
  • One-sided shrink/expansion
  • Moderate rotation
  • Small translation
  • Deletion
  • Vertex addition
  • False positive addition
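A few of the listed perturbations can be approximated directly on binary masks, as in the sketch below. This is an assumption-laden illustration, not the procedure used to build CN-ABS: the original noise is likely applied per building footprint (see the arXiv paper for the actual method), whereas these hypothetical helpers (`shift`, `expand`, `shrink`, `add_false_positive`) act on the whole patch. Moderate rotation, deletion, and vertex addition require per-building geometry and are omitted.

```python
import numpy as np

# Hypothetical mask-level approximations of some of the listed noise types.
# Masks are 2-D boolean arrays (True = building, False = background).

_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 4-connected structuring element


def shift(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Small translation: move the mask by (dy, dx); vacated pixels become background."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # shifted entirely out of the patch
    src_y = slice(max(0, -dy), h - max(0, dy))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    out[dst_y, dst_x] = mask[src_y, src_x]
    return out


def expand(mask: np.ndarray, pixels: int = 1) -> np.ndarray:
    """Global expansion: binary dilation with a 4-connected cross, `pixels` times."""
    out = mask.copy()
    for _ in range(pixels):
        grown = out.copy()
        for dy, dx in _NEIGHBOURS:
            grown |= shift(out, dy, dx)
        out = grown
    return out


def shrink(mask: np.ndarray, pixels: int = 1) -> np.ndarray:
    """Global shrink: binary erosion; pixels outside the patch count as background."""
    out = mask.copy()
    for _ in range(pixels):
        kept = out.copy()
        for dy, dx in _NEIGHBOURS:
            kept &= shift(out, dy, dx)
        out = kept
    return out


def add_false_positive(mask: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """False positive addition: paint a spurious size x size square at a random spot."""
    h, w = mask.shape
    r = int(rng.integers(0, h - size + 1))
    c = int(rng.integers(0, w - size + 1))
    out = mask.copy()
    out[r:r + size, c:c + size] = True
    return out
```

Working on whole masks keeps the sketch dependency-free (NumPy only), at the cost of treating all buildings in a patch identically; a faithful reproduction would perturb each building polygon independently.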

These perturbations aim to reflect common sources of annotation errors in real-world remote sensing datasets.

Note that (i) no noise is introduced into the validation/testing masks, and (ii) both the clean and the noisy versions of the training masks are provided.
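Because both clean and noisy training masks are available, the amount of corruption per patch can be measured. The sketch below uses intersection-over-union (IoU) for this; the choice of IoU and the helper names `mask_iou` and `mean_noise_iou` are assumptions for illustration, and loading the masks from `data.zip` is not shown because the archive layout is not described here.

```python
import numpy as np


def mask_iou(a, b) -> float:
    """IoU between two binary masks of the same shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return float(np.logical_and(a, b).sum() / union)


def mean_noise_iou(pairs) -> float:
    """Average IoU over an iterable of (clean_mask, noisy_mask) pairs."""
    scores = [mask_iou(clean, noisy) for clean, noisy in pairs]
    return float(np.mean(scores)) if scores else float("nan")
```

A value of 1.0 means a noisy mask is identical to its clean counterpart; lower values indicate stronger label noise.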

Further details about the dataset and its construction can be found in: https://arxiv.org/abs/2603.00604

Files

data.zip (650.6 MB)
md5:809f0239ad4cad6dfd1bfd97feec808f
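After downloading, the archive can be checked against the listed MD5 checksum. A minimal sketch using Python's standard `hashlib`; the helper name `md5sum` is hypothetical, and the file is read in chunks so the 650.6 MB archive is never loaded into memory at once.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "809f0239ad4cad6dfd1bfd97feec808f"  # checksum listed for data.zip


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Usage: `md5sum(Path("data.zip")) == EXPECTED_MD5` should be `True` for an intact download.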