Songdo Vision: Vehicle Annotations from High-Altitude BeV Drone Imagery in a Smart City
Description
Overview
The Songdo Vision dataset provides high-resolution (4K, 3840×2160 pixels) RGB images annotated with axis-aligned bounding boxes for vehicle detection from a high-altitude bird's-eye view (BEV) perspective.
It comprises 5,419 annotated video frames (4,335 training / 1,084 test; 80/20 split) containing 272,435 vehicle instances across four categories:
- Car (including vans and light-duty vehicles)
- Bus
- Truck
- Motorcycle
Frames were sampled from footage collected during a multi-drone experiment in Songdo International Business District, South Korea (October 4–7, 2022). A fleet of 10 DJI Mavic 3 drones flew at 140–150 m altitude and recorded at 29.97 FPS, primarily covering 20 busy intersections and some of the roads between them.
Annotations are provided in three widely supported formats: COCO JSON, YOLO TXT, and Pascal VOC XML. A full dataset card with usage instructions and download guidance is available on Hugging Face.
📌 Citation: If you use this dataset in your work, kindly acknowledge it by citing the following article:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205.
🔗 Related dataset: For precisely georeferenced vehicle trajectories extracted from the same large-scale multi-drone experiment, see Songdo Traffic: 10.5281/zenodo.13828384.
Motivation
Publicly available datasets for aerial vehicle detection often exhibit limitations such as:
- Non-BEV perspectives with varying angles and distortions
- Inconsistent annotation quality, with loose or missing bounding boxes
- Lower-resolution imagery, reducing detection accuracy, particularly for smaller vehicles
- Lack of annotation detail, especially for motorcycles in dense urban scenes with complex backgrounds
To address these challenges, Songdo Vision provides human-annotated bounding boxes, with machine learning assistance used to speed up annotation and keep it consistent. The aim is accurate, reliable ground truth for training and evaluating detection models.
Dataset Composition
The dataset is randomly split into training (80%) and test (20%) subsets:
| Subset | Images | Car | Bus | Truck | Motorcycle | Total Vehicles |
|---|---|---|---|---|---|---|
| Train | 4,335 | 195,539 | 7,030 | 11,779 | 2,963 | 217,311 |
| Test | 1,084 | 49,508 | 1,759 | 3,052 | 805 | 55,124 |
A subset of 5,274 frames was randomly sampled from drone video sequences, while an additional 145 frames were carefully selected to represent challenging cases, such as motorcycles at pedestrian crossings, in bicycle lanes, near traffic light poles, and around other distinctive road markers where they may blend into the urban environment.
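The composition figures above can be cross-checked with a few lines of arithmetic. The sketch below is not part of the dataset tooling; it only re-derives the totals, the split ratio, and the per-class shares from the numbers quoted in the table, and all variable names are illustrative.

```python
# Per-class instance counts copied from the composition table above.
counts = {
    "train": {"car": 195_539, "bus": 7_030, "truck": 11_779, "motorcycle": 2_963},
    "test": {"car": 49_508, "bus": 1_759, "truck": 3_052, "motorcycle": 805},
}
# Number of annotated frames per subset, also from the table.
images = {"train": 4_335, "test": 1_084}

# Vehicle totals per subset and over the whole dataset.
totals = {subset: sum(per_class.values()) for subset, per_class in counts.items()}
total_vehicles = sum(totals.values())
total_images = sum(images.values())

# Fraction of frames assigned to the training subset.
train_fraction = images["train"] / total_images

# Share of each class over both subsets, in percent.
class_share = {
    name: 100 * (counts["train"][name] + counts["test"][name]) / total_vehicles
    for name in counts["train"]
}

print("vehicles per subset:", totals)
print("vehicles overall:", total_vehicles, "| frames overall:", total_images)
print(f"train fraction: {train_fraction:.4f}")
for name, share in class_share.items():
    print(f"{name}: {share:.2f}%")
```

The per-subset sums match the Total Vehicles column, and the overall figures match the 272,435 instances and 5,419 frames stated in the overview. The class shares also show that motorcycles make up well under 2% of all instances.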
Data Collection
The dataset was collected as part of a collaborative multi-drone experiment conducted by KAIST and EPFL in Songdo, South Korea, from October 4–7, 2022.
- A fleet of 10 drones monitored 20 busy intersections, executing advanced flight plans to optimize coverage.
- 4K (3840×2160) RGB video footage was recorded at 29.97 FPS from altitudes of 140–150 meters.
- Each drone flew 10 sessions per day, covering peak morning and afternoon periods.
- The experiment resulted in 12 TB of 4K raw video data.
More details on the experimental setup and data processing pipeline are available in [1].
Bounding Box Annotations & Formats
Annotations were generated using a semi-automated object detection annotation process in Azure ML Studio, leveraging machine learning-assisted bounding box detection with human verification to ensure precision.
Each annotated frame includes categorized, axis-aligned bounding boxes, stored in three widely used formats:
1. COCO JSON format
- Single annotation file per dataset subset (i.e., one for training, one for testing).
- Contains metadata such as image dimensions, bounding box coordinates, and class labels.
- Example snippet:
{
  "images": [{"id": 1, "file_name": "0001.jpg", "width": 3840, "height": 2160}],
  "annotations": [{"id": 1, "image_id": 1, "category_id": 2, "bbox": [500, 600, 200, 50], "area": 10000, "iscrowd": 0}],
  "categories": [
    {"id": 1, "name": "car"}, {"id": 2, "name": "bus"},
    {"id": 3, "name": "truck"}, {"id": 4, "name": "motorcycle"}
  ]
}
2. YOLO TXT format
- One annotation file per image, following the format:
<class_id> <x_center> <y_center> <width> <height>
- Bounding box values are normalized to [0, 1] by the image width and height, with the origin at the top-left corner of the image.
- Example snippet:
0 0.52 0.63 0.10 0.05 # Car bounding box
2 0.25 0.40 0.15 0.08 # Truck bounding box
3. Pascal VOC XML format
- One annotation file per image, structured in XML.
- Contains image properties and absolute pixel coordinates for each bounding box.
- Example snippet:
<annotation>
  <filename>0001.jpg</filename>
  <size><width>3840</width><height>2160</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>500</xmin><ymin>600</ymin><xmax>600</xmax><ymax>650</ymax></bndbox>
  </object>
</annotation>
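The three formats describe the same axis-aligned box with different coordinate conventions. The sketch below shows one way to convert between them. The function names are illustrative and not part of the dataset. It assumes the standard conventions: COCO `[x_min, y_min, width, height]` in pixels, YOLO center and size normalized by the image dimensions, and VOC corner coordinates in pixels. Whether the VOC files use a 0-based or 1-based pixel origin is not stated above, so no offset is applied here.

```python
IMG_W, IMG_H = 3840, 2160  # frame size stated in the dataset description


def coco_to_voc(bbox):
    """COCO [x_min, y_min, width, height] -> VOC (xmin, ymin, xmax, ymax)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def coco_to_yolo(bbox, img_w=IMG_W, img_h=IMG_H):
    """COCO pixel box -> YOLO (x_center, y_center, width, height) in [0, 1]."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_to_voc(box, img_w=IMG_W, img_h=IMG_H):
    """YOLO normalized box -> VOC pixel corners (assumes no 1-based offset)."""
    xc, yc, w, h = box
    return (
        (xc - w / 2) * img_w,
        (yc - h / 2) * img_h,
        (xc + w / 2) * img_w,
        (yc + h / 2) * img_h,
    )


if __name__ == "__main__":
    coco_box = [500, 600, 200, 50]  # bbox from the COCO example snippet
    print("VOC corners:", coco_to_voc(coco_box))
    print("YOLO box:", coco_to_yolo(coco_box))
    print("Round trip:", yolo_to_voc(coco_to_yolo(coco_box)))
```

Note that the example snippets suggest the YOLO class IDs are zero-based (0 for car, 2 for truck) while the COCO category IDs start at 1, so a class ID may need shifting by one when moving labels between those two formats.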
File Structure
The dataset is provided as two compressed archives:
1. Training Data (train.zip, 12.91 GB)
train/
│── coco_annotations.json # COCO format
│── images/
│ ├── 0001.jpg
│ ├── ...
│── labels/
│ ├── 0001.txt # YOLO format
│ ├── 0001.xml # Pascal VOC format
│ ├── ...
2. Testing Data (test.zip, 3.22 GB)
test/
│── coco_annotations.json
│── images/
│ ├── 00027.jpg
│ ├── ...
│── labels/
│ ├── 00027.txt
│ ├── 00027.xml
│ ├── ...
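Given the layout above, each image can be matched to its label files by file stem. The following is a minimal sketch under that assumption. `list_samples` and `read_yolo_labels` are hypothetical helper names, not functions shipped with the dataset, and the path `train` assumes the archive was extracted into the current directory.

```python
from pathlib import Path


def list_samples(subset_dir):
    """Return (image, yolo_label, voc_label) path triples for one subset.

    A label entry is None when the corresponding file is missing.
    """
    subset_dir = Path(subset_dir)
    samples = []
    for image_path in sorted((subset_dir / "images").glob("*.jpg")):
        yolo_path = subset_dir / "labels" / f"{image_path.stem}.txt"
        voc_path = subset_dir / "labels" / f"{image_path.stem}.xml"
        samples.append((
            image_path,
            yolo_path if yolo_path.exists() else None,
            voc_path if voc_path.exists() else None,
        ))
    return samples


def read_yolo_labels(label_path):
    """Parse '<class_id> <x_center> <y_center> <width> <height>' lines."""
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # tolerate trailing comments
        if not line:
            continue
        class_id, xc, yc, w, h = line.split()
        boxes.append((int(class_id), float(xc), float(yc), float(w), float(h)))
    return boxes


if __name__ == "__main__":
    root = Path("train")  # assumed extraction location of train.zip
    if root.is_dir():
        samples = list_samples(root)
        print(f"{len(samples)} images found in {root}")
```

Returning `None` for a missing label file, rather than raising, keeps the listing usable for a quick completeness check before training.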
Additional Files
- README.md – Dataset documentation (this description)
- LICENSE.txt – Creative Commons Attribution 4.0 License
- names.txt – Class names (one per line)
- data.yaml – Example YOLO configuration file for training/testing
Acknowledgments
In addition to the funding sources listed in the metadata, the creators express their gratitude to Artem Vasilev for his dedicated efforts in data annotation. We also thank the research teams of Prof. Simon Oh (Korea University) and Prof. Minju Park (Hannam University) for their assistance during the data collection campaign, including the provision of drone equipment and student support.
Citation & Attribution
Preferred Citation: If you use Songdo Vision for any purpose, whether academic research, commercial applications, open-source projects, or benchmarking efforts, please cite our accompanying article [1]:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205
BibTeX entry:
@article{fonod2025advanced,
  title = {Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery},
  author = {Fonod, Robert and Cho, Haechan and Yeo, Hwasoo and Geroliminis, Nikolas},
  journal = {Transportation Research Part C: Emerging Technologies},
  volume = {178},
  pages = {105205},
  year = {2025},
  publisher = {Elsevier},
  doi = {10.1016/j.trc.2025.105205},
  url = {https://doi.org/10.1016/j.trc.2025.105205}
}
Dataset Citation (for archival purposes): Although Zenodo automatically provides a formal citation for this dataset (see below), including citation export in various formats such as BibTeX, we kindly request that you reference the above article as the primary source of this work.
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Songdo Vision: Vehicle Annotations from High-Altitude BeV Drone Imagery in a Smart City (v1). Zenodo. DOI: 10.5281/zenodo.13828408.
Files
Total size: 16.1 GB
| MD5 checksum | Size |
|---|---|
| 7f7a0bafde49260308ee25f6898f78fe | 1.7 kB |
| 527dc6cad772ccb187d5bfe5af738204 | 18.7 kB |
| 5c0695704b8c476390bc730f969c8644 | 24 Bytes |
| c45a4e28094d8b2f9fc9002df41d5550 | 8.0 kB |
| 815776743546439785d3aba36a0c74f6 | 3.2 GB |
| ae3a8762e728a988593ee51055f7c49d | 12.9 GB |
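The MD5 checksums listed above can be used to check a download for corruption. The sketch below streams a file through Python's `hashlib` and compares the digest with an expected value. The helper names are illustrative, the path in the demo assumes `train.zip` sits in the current directory, and the checksum used there is an assumption based on matching the 12.9 GB entry to the archive size; the file names were lost from this listing, so the mapping should be confirmed on the Zenodo record.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 equals the expected value.

    Accepts the value with or without a leading 'md5:' prefix.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    archive = Path("train.zip")  # assumed download location
    # Assumed to be the train.zip checksum (the 12.9 GB entry above).
    expected = "md5:ae3a8762e728a988593ee51055f7c49d"
    if archive.is_file():
        print("checksum OK" if matches_checksum(archive, expected) else "checksum MISMATCH")
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte archives like these.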
Additional details
Funding
- Board of the Swiss Federal Institutes of Technology: Open Research Data (ORD) Program of the ETH Board
- Swiss National Science Foundation: NCCR Automation (phase I) 180545
- Innosuisse – Swiss Innovation Agency: CityDronics 101.645 IP-ENG
- National Research Foundation of Korea: Grant funded by the Korean government (MSIT) 2022R1A2C1012380
Dates
- Collected: 2022-10-04/2022-10-07 (collected by a fleet of 10 DJI Mavic 3 drones)