Dataset Open Access

3DHD CityScenes: High-Definition Maps in High-Density Point Clouds

Plachetka, Christopher; Sertolli, Benjamin; Fricke, Jenny; Klingner, Marvin; Fingscheidt, Tim


3DHD CityScenes is the most comprehensive, large-scale high-definition (HD) map dataset to date, annotated in the three spatial dimensions of globally referenced, high-density LiDAR point clouds collected in urban domains. Our HD map covers 127 km of road sections of the inner city of Hamburg, Germany, including 467 km of individual lanes. In total, our map comprises 266,762 individual items.

The corresponding paper with an example task in the field of 3D object detection will be made available on arXiv on October 7th, 2022 (accepted for ITSC 2022).

Moreover, we release code to facilitate the application of our dataset and the reproducibility of our research. Specifically, our 3DHD_DevKit comprises:

  • Python tools to read, generate, and visualize the dataset,
  • 3DHDNet deep learning pipeline (training, inference, evaluation) for
    map deviation detection and 3D object detection.

The DevKit is expected to be released on GitHub at the end of October 2022.

The dataset and DevKit have been created by Christopher Plachetka as project lead during his PhD period at Volkswagen AG, Germany.


We thank the following interns for their exceptional contributions to our work.

  • Benjamin Sertolli: Major contributions to our DevKit during his master's thesis
  • Niels Maier: Measurement campaign for data collection and data preparation

The European large-scale project Hi-Drive supports the publication of 3DHD CityScenes and encourages the general publication of information and databases facilitating the development of automated driving technologies.

The Dataset

After downloading, the 3DHD_CityScenes folder provides five subdirectories, which are explained briefly in the following.

1. Dataset

This directory contains the training, validation, and test set definitions (train.json, val.json, test.json) used in our publications. Each file contains samples, and each sample defines a geolocation and the orientation of the ego vehicle in global coordinates on the map.

During dataset generation (done by our DevKit), samples are used to take crops from the larger point cloud. Also, map elements in reach of a sample are collected. Both modalities can then be used, e.g., as input to a neural network such as our 3DHDNet.

To read any JSON-encoded data provided by 3DHD CityScenes in Python, you can use the following code snippet as an example.

import json

json_path = r"E:\3DHD_CityScenes\Dataset\train.json"
with open(json_path) as jf:
    data = json.load(jf)
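
The cropping step described in this section can be illustrated with a minimal sketch. It is not the DevKit implementation: the function name, the rectangular crop shape, the crop size, and the yaw convention (radians, counter-clockwise from the UTM easting axis) are all assumptions made for illustration, and the actual field names of a sample in train.json are not assumed here.

```python
import numpy as np

def crop_around_sample(x, y, z, center_x, center_y, yaw, half_length, half_width):
    """Return the points inside an ego-aligned rectangle, in ego coordinates.

    Hypothetical helper: yaw is assumed to be in radians, measured
    counter-clockwise from the +x (easting) axis.
    """
    # Translate so that the sample position becomes the origin.
    dx = x - center_x
    dy = y - center_y
    # Rotate by -yaw so that the ego heading points along +x.
    cos_yaw, sin_yaw = np.cos(yaw), np.sin(yaw)
    x_ego = cos_yaw * dx + sin_yaw * dy
    y_ego = -sin_yaw * dx + cos_yaw * dy
    # Keep only the points inside the rectangle around the ego vehicle.
    mask = (np.abs(x_ego) <= half_length) & (np.abs(y_ego) <= half_width)
    return np.column_stack((x_ego[mask], y_ego[mask], z[mask]))
```

Expressing the crop in ego coordinates keeps the numbers small, which avoids precision issues that large UTM values can cause in single-precision pipelines.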

2. HD_Map

Map items are stored as lists of items in JSON format. In particular, we provide:

  • traffic signs,
  • traffic lights,
  • pole-like objects,
  • construction site locations,
  • construction site obstacles (point-like such as cones, and line-like such as fences),
  • line-shaped markings (solid, dashed, etc.),
  • polygon-shaped markings (arrows, stop lines, symbols, etc.),
  • lanes (ordinary and temporary),
  • relations between elements (only for construction sites, e.g., sign to lane association).
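
As a quick sanity check after downloading, the number of items per map file can be counted. The sketch below assumes only what is stated above, namely that each file holds a JSON list of items. The helper name is hypothetical, and the file names are discovered with a glob rather than assumed.

```python
import json
from pathlib import Path

def count_map_items(map_dir):
    """Map each JSON file stem in map_dir to its number of list items.

    Hypothetical helper: files whose top-level JSON value is not a list
    are reported with None instead of a count.
    """
    counts = {}
    for json_file in sorted(Path(map_dir).glob("*.json")):
        with open(json_file) as jf:
            data = json.load(jf)
        counts[json_file.stem] = len(data) if isinstance(data, list) else None
    return counts

# Example call (adjust the path to your download location):
# print(count_map_items(r"E:\3DHD_CityScenes\HD_Map"))
```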

3. HD_Map_MetaData

Our high-density point cloud used as the basis for annotating the HD map is split into 648 tiles. This directory contains the geolocation of each tile as a polygon on the map. You can view the tile definitions using QGIS. Alternatively, we also provide the polygons as lists of UTM coordinates in JSON.

Files with the extensions .dbf, .prj, .qpj, .shp, and .shx together form the tile definition as a “shape file” (a format commonly used in geodesy) that can be viewed using QGIS. The JSON file contains the same information in a different format, which is used in our Python API.
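
To find the tile that contains a given UTM position, a standard ray-casting point-in-polygon test is sufficient. The following is a minimal sketch, assuming the tile polygons have already been loaded into a dictionary that maps a tile name to a list of (x, y) vertices. That dictionary layout and both function names are assumptions for illustration, not the structure of the provided JSON file or of our Python API.

```python
def point_in_polygon(px, py, polygon):
    """Ray casting: count the polygon edges crossed by a ray towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def find_tile(px, py, tiles):
    """Return the name of the first tile containing (px, py), else None."""
    for name, polygon in tiles.items():
        if point_in_polygon(px, py, polygon):
            return name
    return None
```

With 648 tiles, a linear scan like this is fast enough for occasional lookups. For many queries, a spatial index would be the more suitable choice.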

4. HD_PointCloud_Tiles

The high-density point cloud tiles are provided in global UTM32N coordinates and encoded in a proprietary binary format. The first 4 bytes (integer) encode the number of points contained in that file. Subsequently, all point cloud values are provided as arrays: first all x-values, then all y-values, and so on. Specifically, the arrays are encoded as follows.

  • x-coordinates: 4 byte integer
  • y-coordinates: 4 byte integer
  • z-coordinates: 4 byte integer
  • intensity of reflected beams: 2 byte unsigned integer
  • ground classification flag: 1 byte unsigned integer
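
Because the values are stored array by array rather than point by point, the byte offset of each array follows directly from the number of points. The sketch below computes these offsets and the expected total file size (4 + 15 · N bytes for N points). The function and constant names are hypothetical, and the little-endian dtypes are taken from the reading example in this section.

```python
import numpy as np

# Field order and dtypes as listed above (little-endian where applicable).
FIELDS = [
    ('x', '<i4'),
    ('y', '<i4'),
    ('z', '<i4'),
    ('intensity', '<u2'),
    ('is_ground', 'u1'),
]
HEADER_BYTES = 4  # the leading point count

def array_offsets(num_points):
    """Return ({field: byte offset of its array}, expected file size)."""
    offsets = {}
    pos = HEADER_BYTES
    for name, dtype in FIELDS:
        offsets[name] = pos
        pos += num_points * np.dtype(dtype).itemsize
    return offsets, pos
```

Comparing the returned size with the actual file size on disk (for example via os.path.getsize) is a cheap way to detect a truncated download.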

After reading, the values have to be unnormalized. As an example, you can use the following code snippet to read the point cloud data. For visualization, you can use the pptk package, for instance.

import numpy as np
import pptk

file_path = r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin"
pc_dict = {}
key_list = ['x', 'y', 'z', 'intensity', 'is_ground']
type_list = ['<i4', '<i4', '<i4', '<u2', 'u1']

with open(file_path, "rb") as fid:
    num_points = np.fromfile(fid, count=1, dtype='<u4')[0]
    # print(num_points)

    # Init
    for k, dtype in zip(key_list, type_list):
        pc_dict[k] = np.zeros([num_points], dtype=dtype)

    # Read all arrays
    for k, t in zip(key_list, type_list):
        pc_dict[k] = np.fromfile(fid, count=num_points, dtype=t)

    # Unnorm
    pc_dict['x'] = (pc_dict['x'] / 1000) + 500000
    pc_dict['y'] = (pc_dict['y'] / 1000) + 5000000
    pc_dict['z'] = (pc_dict['z'] / 1000)
    pc_dict['intensity'] = pc_dict['intensity'] / 2**16
    pc_dict['is_ground'] = pc_dict['is_ground'].astype(np.bool_)



# Visualization
# Normalize (due to large UTM values)
x_utm = pc_dict['x'] - np.mean(pc_dict['x'])
y_utm = pc_dict['y'] - np.mean(pc_dict['y'])
z_utm = pc_dict['z']
xyz = np.column_stack((x_utm, y_utm, z_utm))
viewer = pptk.viewer(xyz)

5. Trajectories

We provide 15 real-world trajectories recorded during a measurement campaign covering the whole HD map. Trajectory samples are provided at approx. 30 Hz and are encoded in JSON.

These trajectories were used to provide the samples in train.json, val.json, and test.json with realistic geolocations and orientations of the ego vehicle.

  • OP1 – OP5 cover the majority of the map with 5 trajectories.
  • RH1 – RH10 cover the majority of the map with 10 trajectories.

Note that OP5 is split into three separate parts, a-c. RH9 is split into two parts, a-b. Moreover, OP4 mostly equals OP1 (thus, we speak of 14 trajectories in our paper). For completeness, however, we provide all recorded trajectories here.  
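
At approx. 30 Hz, consecutive trajectory samples lie very close together, especially when the vehicle is slow or stationary. If you want to draw your own spatially spread samples from a trajectory, one option is to thin it by a minimum travelled distance. The sketch below is an illustration only and not necessarily how the samples in our dataset splits were derived. It assumes the positions have already been extracted from the JSON into an (N, 2) array of UTM coordinates, and the function name and spacing parameter are hypothetical.

```python
import numpy as np

def subsample_by_distance(xy, min_spacing):
    """Return indices of positions that are at least min_spacing apart.

    xy: (N, 2) array-like of positions in meters.
    The first position is always kept; each further position is kept only
    if it is at least min_spacing away from the last kept one.
    """
    xy = np.asarray(xy, dtype=float)
    if len(xy) == 0:
        return np.array([], dtype=int)
    kept = [0]
    for i in range(1, len(xy)):
        if np.linalg.norm(xy[i] - xy[kept[-1]]) >= min_spacing:
            kept.append(i)
    return np.array(kept, dtype=int)
```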


If you use our dataset, you are welcome to cite:

 author = {Plachetka, C. and Sertolli, B. and Fricke, J. and Klingner, M. and Fingscheidt, T.},
 title = {{3DHD CityScenes:~High-Definition Maps in High-Density Point Clouds}},
 pages = {accepted for publication},
 booktitle = {{Proc. of ITSC}},
 year = {Oct. 2022},
 address = {Macau, China}
