Open Vocabulary Attribute Detection (OVAD) Dataset
Introduction
Current detection datasets usually contain rich object annotations, but few contain attribute annotations, which are also important for the detection task. To address this gap, we propose a novel attribute dataset, OVAD, to support comprehensive training and evaluation of attribute detection. OVAD is built on the nuScenes dataset (license: CC BY-NC-SA 4.0), supplementing it with detailed attribute annotations that capture spatial relationships, motion states, and interactions between objects. It is useful for developing and evaluating systems that need to understand complex scene dynamics.
To encourage more follow-up work on Open Vocabulary Attribute Detection, we are publicly releasing the dataset splits used in our paper ("Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes" by Xinhao Xiang, Kuan-Chuan Peng, Suhas Lohit, Michael J. Jones, Jiawei Zhang, BMVC 2025).
Files in the unzipped folder:
OVAD
|---./README.md: This Markdown file
|---OVAD_full: The OVAD dataset
|---|---OVAD_infos_test.pkl
|---|---OVAD_infos_train.pkl
|---|---OVAD_infos_val.pkl
|---OVAD_mini: The mini OVAD dataset
|---|---OVAD_mini_infos_val.pkl
|---|---OVAD_mini_infos_train.pkl
|---|---example_val_0.json
At a Glance
The size of the unzipped dataset is ~2.5 GB.
OVAD is built on the nuScenes dataset (license: CC BY-NC-SA 4.0). Please download nuScenes from its original repository.
Each .pkl file contains the meta information and the data list, organized as follows (a loading sketch follows the table below):
1. Metadata
- Type: dict
- Content:
  - version: <class 'str'>
2. Infos
- Type: list of dict
- Each entry stores the information of one sample and contains:
| Key | Type | Description |
|---|---|---|
| lidar_path | `<class 'str'>` | Path to the LiDAR data. |
| num_features | `<class 'int'>` | Number of features in the data. |
| token | `<class 'str'>` | Unique identifier for the sample. |
| sweeps | `<class 'list'>` | List of previous LiDAR frames. |
| cams | `<class 'dict'>` | Camera-related information. |
| lidar2ego_translation | `<class 'list'>` | Translation from the LiDAR frame to the ego frame. |
| lidar2ego_rotation | `<class 'list'>` | Rotation from the LiDAR frame to the ego frame. |
| ego2global_translation | `<class 'list'>` | Translation from the ego frame to the global frame. |
| ego2global_rotation | `<class 'list'>` | Rotation from the ego frame to the global frame. |
| timestamp | `<class 'int'>` | Timestamp of the sample. |
| gt_spatial_boxes | `np.ndarray` (num_spat, 7) | Spatial box information. |
| gt_spatial_names | `np.ndarray` (num_spat,) | Spatial relationship names. |
| gt_boxes | `np.ndarray` (num_obj, 7) | Ground-truth 3D boxes. |
| gt_names | `np.ndarray` (num_obj,) | Object category names. |
| gt_attribute_names | `<class 'list'>` | Attribute names for each object. |
| gt_velocity | `np.ndarray` (num_obj, 2) | Object velocities along the x and y axes. |
| num_lidar_pts | `np.ndarray` (num_obj,) | Number of LiDAR points per object. |
| num_radar_pts | `np.ndarray` (num_obj,) | Number of radar points per object. |
| valid_flag | `np.ndarray` (num_obj,) | Validity flag for objects. |
More information about the keys not related to open vocabulary attribute detection can be found in MMDetection3D (license: Apache 2.0).
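For reference, below is a minimal loading sketch. It only assumes Python 3 with NumPy installed (needed to unpickle the arrays) and the directory layout from the file tree above; `pickle` is part of the standard library.

```python
# A minimal sketch for inspecting one of the OVAD .pkl files.
# The path below assumes the unzipped layout from the file tree above.
import pickle

with open("OVAD/OVAD_full/OVAD_infos_val.pkl", "rb") as f:
    data = pickle.load(f)

print(data["metadata"]["version"])  # dataset version string, e.g. "v1.0"
print(len(data["infos"]))           # number of samples in this split

sample = data["infos"][0]
for key, value in sample.items():   # list every key with its Python type
    print(f"{key}: {type(value)}")
```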
Example Representation
{
"metadata": {
"version": "v1.0"
},
"infos": [
{
"lidar_path": "path/to/lidar/file.bin",
"num_features": 5,
"token": "37091c75b9704e0daa829ba56dfa0906",
"sweeps": [...],
"cams": {...},
"lidar2ego_translation": [...],
"lidar2ego_rotation": [...],
"ego2global_translation": [...],
"ego2global_rotation": [...],
"timestamp": 1533201470427893,
"gt_spatial_boxes": [[...], [...], ...],
"gt_spatial_names": ["From the perspective of pedestrian, car is behind pedestrian", "...", "..."],
"gt_boxes": [[...], [...], ...],
"gt_names": ["car", "pedestrian", ...],
"gt_attribute_names": [["cycle.with_rider"], ["pedestrian.standing"], ...],
"gt_velocity": [[0.0, 1.2], [1.1, -0.5], ...],
"num_lidar_pts": [12, 8, ...],
"num_radar_pts": [5, 3, ...],
"valid_flag": [True, False, ...]
},
{...}
]
}
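As a hedged sketch (not part of the dataset tooling), the per-sample calibration entries above can be composed to map points from the LiDAR frame to the global frame. This assumes the rotations are unit quaternions in [w, x, y, z] order, as in nuScenes, and uses the third-party pyquaternion package; `sample` is one "infos" entry from the loading sketch above.

```python
import numpy as np
from pyquaternion import Quaternion  # third-party: pip install pyquaternion

def lidar_to_global(point, info):
    """Map a 3D point from the LiDAR frame to the global frame.

    point: (3,) array-like point in the LiDAR frame.
    info:  one entry of the "infos" list.
    """
    point = np.asarray(point, dtype=float)
    # LiDAR -> ego vehicle frame.
    point = Quaternion(info["lidar2ego_rotation"]).rotate(point)
    point = point + np.asarray(info["lidar2ego_translation"])
    # Ego -> global frame.
    point = Quaternion(info["ego2global_rotation"]).rotate(point)
    point = point + np.asarray(info["ego2global_translation"])
    return point

# Usage, assuming the first three entries of a gt_boxes row are the
# (x, y, z) box center:
# global_center = lidar_to_global(sample["gt_boxes"][0][:3], sample)
```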
The 'example_val_0.json' file (under OVAD_mini) provides a comprehensive example of a single entry under the "infos" key; it is the first entry of the validation set.
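A minimal sketch for reading that JSON file and pairing per-object annotations (the path follows the file tree above; field semantics follow the table):

```python
import json
import math

# Path assumes the unzipped layout shown in the file tree above.
with open("OVAD/OVAD_mini/example_val_0.json") as f:
    entry = json.load(f)

# Pair each object's category, attributes, and planar speed (norm of vx, vy),
# keeping only objects whose valid_flag is set.
for name, attrs, (vx, vy), valid in zip(
        entry["gt_names"], entry["gt_attribute_names"],
        entry["gt_velocity"], entry["valid_flag"]):
    if valid:
        print(f"{name}: {attrs}, speed {math.hypot(vx, vy):.1f} m/s")
```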
Citation
If you use the OVAD dataset in your research, please cite our paper:
@inproceedings{xiang2025ovad,
author = {Xiang, Xinhao and Peng, Kuan-Chuan and Lohit, Suhas and Jones, Michael J. and Zhang, Jiawei},
title = {Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes},
booktitle = {The British Machine Vision Conference (BMVC)},
year = {2025}
}
License
The OVAD dataset is released under the CC BY-NC-SA 4.0 license. For the images in the nuScenes dataset, please refer to their website for copyright and license terms.
Created by Mitsubishi Electric Research Laboratories (MERL), 2024-2025
SPDX-License-Identifier: CC-BY-NC-SA-4.0
Files
| Name | Size | MD5 |
|---|---|---|
| OVAD.zip | 438.0 MB | 0a71b42528d7b8044aa3996688c26a02 |