Published March 17, 2025 | Version v1

Songdo Vision: Vehicle Annotations from High-Altitude BeV Drone Imagery in a Smart City

  • 1. École Polytechnique Fédérale de Lausanne
  • 2. Korea Advanced Institute of Science and Technology

Description

Overview

The Songdo Vision dataset provides high-resolution (4K, 3840×2160 pixels) RGB images annotated with axis-aligned bounding boxes for vehicle detection from a high-altitude bird's-eye view (BEV) perspective.

It comprises 5,419 annotated video frames (4,335 training / 1,084 test; 80/20 split) containing 272,435 vehicle instances across four categories:

  • Car (including vans and light-duty vehicles)
  • Bus
  • Truck
  • Motorcycle

Frames were sampled from footage collected during a multi-drone experiment in the Songdo International Business District, South Korea (October 4–7, 2022). A fleet of 10 DJI Mavic 3 drones flew at 140–150 m altitude and recorded video at 29.97 FPS, primarily covering 20 busy intersections and some of the roads between them.

Annotations are provided in three widely supported formats: COCO JSON, YOLO TXT, and Pascal VOC XML. A full dataset card with usage instructions and download guidance is available on Hugging Face.

📌 Citation: If you use this dataset in your work, kindly acknowledge it by citing the following article:

Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205.

🔗 Related dataset: For precisely georeferenced vehicle trajectories extracted from the same large-scale multi-drone experiment, see Songdo Traffic: 10.5281/zenodo.13828384.

Motivation

Publicly available datasets for aerial vehicle detection often exhibit limitations such as:

  • Non-BEV perspectives with varying angles and distortions
  • Inconsistent annotation quality, with loose or missing bounding boxes
  • Lower-resolution imagery, reducing detection accuracy, particularly for smaller vehicles
  • Lack of annotation detail, especially for motorcycles in dense urban scenes with complex backgrounds

To address these challenges, Songdo Vision provides high-quality human-annotated bounding boxes, with machine learning assistance used to enhance efficiency and consistency. This ensures accurate and reliable ground truth for training and evaluating detection models.

Dataset Composition

The dataset is randomly split into training (80%) and test (20%) subsets:

Subset   Images   Car       Bus     Truck    Motorcycle   Total Vehicles
Train    4,335    195,539   7,030   11,779   2,963        217,311
Test     1,084    49,508    1,759   3,052    805          55,124
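
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below copies the counts from the table and derives the per-subset totals, the train/test image ratio, and the share of each class. The names `COUNTS`, `subset_total`, `class_share`, and `train_fraction` are illustrative and not part of the dataset.

```python
# Instance counts copied from the composition table above.
COUNTS = {
    "train": {"images": 4335, "car": 195539, "bus": 7030, "truck": 11779, "motorcycle": 2963},
    "test": {"images": 1084, "car": 49508, "bus": 1759, "truck": 3052, "motorcycle": 805},
}
CLASSES = ("car", "bus", "truck", "motorcycle")


def subset_total(subset: str) -> int:
    """Total number of vehicle instances in one subset."""
    return sum(COUNTS[subset][cls] for cls in CLASSES)


def class_share(cls: str) -> float:
    """Fraction of all annotated instances that belong to `cls`."""
    grand_total = sum(subset_total(s) for s in COUNTS)
    return sum(COUNTS[s][cls] for s in COUNTS) / grand_total


def train_fraction() -> float:
    """Fraction of all images that fall in the training subset."""
    n_train = COUNTS["train"]["images"]
    return n_train / (n_train + COUNTS["test"]["images"])


print(subset_total("train"), subset_total("test"))
print(f"train fraction: {train_fraction():.3f}")
for cls in CLASSES:
    print(f"{cls:<11}{class_share(cls):7.2%}")
```

The class shares make the imbalance explicit: cars account for roughly 90% of all instances and motorcycles for under 1.5%, which is worth keeping in mind when choosing loss weights or reporting per-class metrics.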

A subset of 5,274 frames was randomly sampled from drone video sequences, while an additional 145 frames were carefully selected to represent challenging cases, such as motorcycles at pedestrian crossings, in bicycle lanes, near traffic light poles, and around other distinctive road markers where they may blend into the urban environment.

Data Collection

The dataset was collected as part of a collaborative multi-drone experiment conducted by KAIST and EPFL in Songdo, South Korea, on October 4–7, 2022.

  • A fleet of 10 drones monitored 20 busy intersections, executing advanced flight plans to optimize coverage.
  • 4K (3840×2160) RGB video footage was recorded at 29.97 FPS from altitudes of 140–150 meters.
  • Each drone flew 10 sessions per day, covering peak morning and afternoon periods.
  • The experiment produced 12 TB of raw 4K video data.

More details on the experimental setup and data processing pipeline are available in [1].

Bounding Box Annotations & Formats

Annotations were generated with a semi-automated annotation workflow in Azure ML Studio that combines machine learning-assisted bounding box detection with human verification to ensure precision.

Each annotated frame includes categorized, axis-aligned bounding boxes, stored in three widely used formats:

1. COCO JSON format

  • Single annotation file per dataset subset (i.e., one for training, one for testing).
  • Contains metadata such as image dimensions, bounding box coordinates, and class labels.
  • Example snippet:
{
  "images": [{"id": 1, "file_name": "0001.jpg", "width": 3840, "height": 2160}],
  "annotations": [{"id": 1, "image_id": 1, "category_id": 2, "bbox": [500, 600, 200, 50], "area": 10000, "iscrowd": 0}],
  "categories": [
    {"id": 1, "name": "car"}, {"id": 2, "name": "bus"},
    {"id": 3, "name": "truck"}, {"id": 4, "name": "motorcycle"}
  ]
}
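
A minimal sketch of reading this structure with the Python standard library, assuming the real files follow the layout of the snippet above. `SAMPLE` mirrors that snippet so the code runs without the download; `count_instances` and `coco_bbox_to_corners` are illustrative helper names. COCO stores boxes as `[x_min, y_min, width, height]` in pixels.

```python
import json
from collections import Counter

# Stand-in for json.load(open("train/coco_annotations.json")); mirrors the snippet above.
SAMPLE = json.loads("""
{
  "images": [{"id": 1, "file_name": "0001.jpg", "width": 3840, "height": 2160}],
  "annotations": [{"id": 1, "image_id": 1, "category_id": 2,
                   "bbox": [500, 600, 200, 50], "area": 10000, "iscrowd": 0}],
  "categories": [
    {"id": 1, "name": "car"}, {"id": 2, "name": "bus"},
    {"id": 3, "name": "truck"}, {"id": 4, "name": "motorcycle"}
  ]
}
""")


def count_instances(coco: dict) -> Counter:
    """Count annotations per category name."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])


def coco_bbox_to_corners(bbox):
    """Convert COCO [x_min, y_min, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return x, y, x + w, y + h


print(count_instances(SAMPLE))
print(coco_bbox_to_corners(SAMPLE["annotations"][0]["bbox"]))
```

Running `count_instances` on the full training file should reproduce the per-class totals in the composition table, which is a quick integrity check after download.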

2. YOLO TXT format

  • One annotation file per image, following the format:
<class_id> <x_center> <y_center> <width> <height>
  • Bounding box values are normalized to [0,1], with the origin at the top-left corner.
  • Example snippet:
0 0.52 0.63 0.10 0.05  # Car bounding box
2 0.25 0.40 0.15 0.08  # Truck bounding box
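
The normalized values can be mapped back to pixel coordinates by scaling with the image size. The sketch below assumes 3840×2160 frames and assumes that class IDs 0 to 3 follow the order car, bus, truck, motorcycle (consistent with the example comments above, but worth confirming against names.txt). `yolo_to_pixels` and `CLASS_NAMES` are illustrative names.

```python
# Assumed class order; check names.txt in the dataset before relying on it.
CLASS_NAMES = ("car", "bus", "truck", "motorcycle")


def yolo_to_pixels(line: str, img_w: int = 3840, img_h: int = 2160):
    """Parse one YOLO label line into (class_id, (x_min, y_min, x_max, y_max)) in pixels.

    Anything after a '#' is ignored so the annotated example lines above parse too.
    """
    fields = line.split("#", 1)[0].split()
    class_id = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:5])
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return class_id, (x_min, y_min, x_max, y_max)


class_id, box = yolo_to_pixels("0 0.52 0.63 0.10 0.05  # Car bounding box")
print(CLASS_NAMES[class_id], [round(v, 1) for v in box])
```

The origin is the top-left corner of the image, so `y_min` is the upper edge of the box.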

3. Pascal VOC XML format

  • One annotation file per image, structured in XML.
  • Contains image properties and absolute pixel coordinates for each bounding box.
  • Example snippet:
<annotation>
  <filename>0001.jpg</filename>
  <size><width>3840</width><height>2160</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>500</xmin><ymin>600</ymin><xmax>600</xmax><ymax>650</ymax></bndbox>
  </object>
</annotation>
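
A short sketch of parsing such a file with Python's built-in `xml.etree.ElementTree`, assuming the real files contain the elements shown in the snippet above. `parse_voc` is an illustrative helper, and `EXAMPLE_XML` repeats the snippet so the code runs on its own.

```python
import xml.etree.ElementTree as ET

EXAMPLE_XML = """
<annotation>
  <filename>0001.jpg</filename>
  <size><width>3840</width><height>2160</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>500</xmin><ymin>600</ymin><xmax>600</xmax><ymax>650</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text: str):
    """Return ((width, height), [(name, xmin, ymin, xmax, ymax), ...]) from a VOC XML string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    dims = (int(size.findtext("width")), int(size.findtext("height")))
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # float() first, so coordinates written as "500.0" are also accepted.
        coords = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return dims, boxes


print(parse_voc(EXAMPLE_XML))
```

For a file on disk, `parse_voc(open(path, encoding="utf-8").read())` would work the same way.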

File Structure

The dataset is provided as two compressed archives:

1. Training Data (train.zip, 12.91 GB)

train/
│── coco_annotations.json  # COCO format
│── images/
│   ├── 0001.jpg
│   ├── ...
│── labels/
│   ├── 0001.txt  # YOLO format
│   ├── 0001.xml  # Pascal VOC format
│   ├── ...

2. Testing Data (test.zip, 3.22 GB)

test/
│── coco_annotations.json
│── images/
│   ├── 00027.jpg
│   ├── ...
│── labels/
│   ├── 00027.txt
│   ├── 00027.xml
│   ├── ...
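
Given this layout, each image can be matched to its label files by file stem. The sketch below assumes the archives were extracted so that `train/` and `test/` contain `images/` and `labels/` as shown above; `pair_images_with_labels` is an illustrative helper, not something shipped with the dataset.

```python
from pathlib import Path


def pair_images_with_labels(subset_dir):
    """Return a sorted list of (image_path, yolo_path_or_None, voc_path_or_None).

    A label entry is None when the corresponding file is missing, which makes
    incomplete extractions easy to spot.
    """
    subset_dir = Path(subset_dir)
    labels_dir = subset_dir / "labels"
    pairs = []
    for image_path in sorted((subset_dir / "images").glob("*.jpg")):
        yolo_path = labels_dir / f"{image_path.stem}.txt"
        voc_path = labels_dir / f"{image_path.stem}.xml"
        pairs.append((
            image_path,
            yolo_path if yolo_path.exists() else None,
            voc_path if voc_path.exists() else None,
        ))
    return pairs
```

For example, `pair_images_with_labels("train")` should return 4,335 entries with no `None` values if the training archive was extracted completely.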

Additional Files

  • README.md – Dataset documentation (this description)
  • LICENSE.txt – Creative Commons Attribution 4.0 License
  • names.txt – Class names (one per line)
  • data.yaml – Example YOLO configuration file for training/testing

Acknowledgments

In addition to the funding sources listed in the metadata, we thank Artem Vasilev for his dedicated efforts in data annotation. We also thank the research teams of Prof. Simon Oh (Korea University) and Prof. Minju Park (Hannam University) for their assistance during the data collection campaign, including the provision of drone equipment and student support.

Citation & Attribution

Preferred Citation: If you use Songdo Vision for any purpose, whether academic research, commercial applications, open-source projects, or benchmarking efforts, please cite our accompanying article [1]:

Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205

BibTeX entry:

@article{fonod2025advanced,
  title = {Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery},
  author = {Fonod, Robert and Cho, Haechan and Yeo, Hwasoo and Geroliminis, Nikolas},
  journal = {Transportation Research Part C: Emerging Technologies},
  volume = {178},
  pages = {105205},
  year = {2025},
  publisher = {Elsevier},
  doi = {10.1016/j.trc.2025.105205},
  url = {https://doi.org/10.1016/j.trc.2025.105205}
}

Dataset Citation (for archival purposes): Although Zenodo automatically provides a formal citation for this dataset (see below), including citation export in various formats such as BibTeX, we kindly request that you reference the above article as the primary source of this work.

Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Songdo Vision: Vehicle Annotations from High-Altitude BeV Drone Imagery in a Smart City (v1). Zenodo. DOI: 10.5281/zenodo.13828408.

Files

Files (16.1 GB)

Checksum                               Size
md5:7f7a0bafde49260308ee25f6898f78fe   1.7 kB
md5:527dc6cad772ccb187d5bfe5af738204   18.7 kB
md5:5c0695704b8c476390bc730f969c8644   24 Bytes
md5:c45a4e28094d8b2f9fc9002df41d5550   8.0 kB
md5:815776743546439785d3aba36a0c74f6   3.2 GB
md5:ae3a8762e728a988593ee51055f7c49d   12.9 GB
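
The MD5 checksums listed above can be used to verify a download. The following is a minimal sketch using Python's `hashlib`; `md5sum` is an illustrative helper. The file names are not shown next to the checksums in this listing, so matching a checksum to a file by its size (for example, the 12.9 GB entry to train.zip) is an assumption to confirm on the record page.

```python
import hashlib


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (assumes train.zip is the 12.9 GB entry in the listing above):
# assert md5sum("train.zip") == "ae3a8762e728a988593ee51055f7c49d"
```

Reading in chunks matters here because the archives are several gigabytes and should not be loaded into memory at once.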

Additional details

Funding

  • Board of the Swiss Federal Institutes of Technology: Open Research Data (ORD) Program of the ETH Board
  • Swiss National Science Foundation: NCCR Automation (phase I), 180545
  • Innosuisse – Swiss Innovation Agency: CityDronics, 101.645 IP-ENG
  • National Research Foundation of Korea: Grant funded by the Korean government (MSIT), 2022R1A2C1012380

Dates

Collected
2022-10-04/2022-10-07
Collected by a fleet of 10 DJI Mavic 3 drones