Published August 19, 2025 | Version v1
Dataset Open

Open Vocabulary Attribute Detection (OVAD) Dataset

  • 1. University of California, Davis
  • 2. Mitsubishi Electric Research Laboratories
  • 3. Mitsubishi Electric Research Laboratories (United States)

Description

Introduction

Current detection datasets usually provide rich object annotations, but few of them include attribute annotations, which are also important for the detection task. To address this gap, we propose a novel attribute dataset, OVAD, to comprehensively support training and evaluating attribute detection. OVAD is built on the nuScenes dataset (license: CC BY-NC-SA 4.0), supplementing it with detailed attribute annotations that capture spatial relationships, motion states, and interactions between objects. It is useful for developing and evaluating systems that need to understand complex scene dynamics.

To encourage more follow-up work on Open Vocabulary Attribute Detection, we are publicly releasing the dataset splits used in our paper ("Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes" by Xinhao Xiang, Kuan-Chuan Peng, Suhas Lohit, Michael J. Jones, Jiawei Zhang, BMVC 2025).

Files in the unzipped folder:

OVAD
|---./README.md: This Markdown file
|---OVAD_full: The OVAD dataset
|---|---OVAD_infos_test.pkl
|---|---OVAD_infos_train.pkl
|---|---OVAD_infos_val.pkl
|---OVAD_mini: The mini OVAD dataset
|---|---OVAD_mini_infos_val.pkl
|---|---OVAD_mini_infos_train.pkl
|---|---example_val_0.json 

At a Glance

The size of the unzipped dataset is ~2.5GB.

OVAD is built on the nuScenes dataset (license: CC BY-NC-SA 4.0). Please download the nuScenes data from its original repository.

Each .pkl file contains the meta information and the data list. It is organized as follows:

1. Metadata

  • Type: dict
  • Content:
    • version <class 'str'>: the dataset version (e.g., "v1.0")

2. Infos

  • Type: list of dict
  • Each entry stores the information of one sample and contains the following keys:

| Key | Type | Description |
| --- | --- | --- |
| lidar_path | <class 'str'> | Path to the LiDAR data. |
| num_features | <class 'int'> | Number of features in the data. |
| token | <class 'str'> | Unique identifier for the sample. |
| sweeps | <class 'list'> | List of previous LiDAR frames. |
| cams | <class 'dict'> | Camera-related information. |
| lidar2ego_translation | <class 'list'> | Translation from LiDAR to ego-frame. |
| lidar2ego_rotation | <class 'list'> | Rotation from LiDAR to ego-frame. |
| ego2global_translation | <class 'list'> | Translation from ego-frame to global. |
| ego2global_rotation | <class 'list'> | Rotation from ego-frame to global. |
| timestamp | <class 'int'> | Timestamp of the sample. |
| gt_spatial_boxes | np.ndarray (num_spat, 7) | Spatial box information. |
| gt_spatial_names | np.ndarray (num_spat,) | Spatial relationship names. |
| gt_boxes | np.ndarray (num_obj, 7) | Ground truth 3D boxes. |
| gt_names | np.ndarray (num_obj,) | Object category names. |
| gt_attribute_names | <class 'list'> | Attribute names for each object. |
| gt_velocity | np.ndarray (num_obj, 2) | Object velocities on the x and y axes. |
| num_lidar_pts | np.ndarray (num_obj,) | Number of LiDAR points per object. |
| num_radar_pts | np.ndarray (num_obj,) | Number of radar points per object. |
| valid_flag | np.ndarray (num_obj,) | Validity flag for objects. |

More information about the keys that are not specific to open-vocabulary attribute detection can be found in MMDetection3D (license: Apache 2.0).
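
For a quick look at the contents, the info files can be loaded with Python's pickle module. Below is a minimal sketch; the file path is only an example, and NumPy must be installed because several fields are stored as numpy arrays.

import pickle

# Load one of the OVAD info files (the path is an example; adjust it to your setup).
with open("OVAD/OVAD_full/OVAD_infos_val.pkl", "rb") as f:
    data = pickle.load(f)

print(data["metadata"]["version"])   # e.g., "v1.0"
print(len(data["infos"]))            # number of samples in this split

# Inspect the keys of the first sample.
print(sorted(data["infos"][0].keys()))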

Example Representation

 
{
    "metadata": {
        "version": "v1.0"
    },
    "infos": [
        {
            "lidar_path": "path/to/lidar/file.bin",
            "num_features": 5,
            "token": "37091c75b9704e0daa829ba56dfa0906",
            "sweeps": [...],
            "cams": {...},
            "lidar2ego_translation": [...],
            "lidar2ego_rotation": [...],
            "ego2global_translation": [...],
            "ego2global_rotation": [...],
            "timestamp": 1533201470427893,
            "gt_spatial_boxes": [[...], [...], ...],
            "gt_spatial_names": ["From the perspective of pedestrian, car is behind pedestrian", "...", "..."],
            "gt_boxes": [[...], [...], ...],
            "gt_names": ["car", "pedestrian", ...],
            "gt_attribute_names": [["cycle.with_rider"], ["pedestrian.standing"], ...],
            "gt_velocity": [[0.0, 1.2], [1.1, -0.5], ...],
            "num_lidar_pts": [12, 8, ...],
            "num_radar_pts": [5, 3, ...],
            "valid_flag": [True, False, ...]
        },
        {...}
    ]
}

The 'example_val_0.json' file (included in the OVAD_mini folder) provides a comprehensive example of a single entry under the "infos" key; it is the first entry of the validation set.
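
As a small sketch of reading the annotation fields listed in the table above (the field names follow that table; the mini validation split and the printed slice are just examples):

import pickle

with open("OVAD/OVAD_mini/OVAD_mini_infos_val.pkl", "rb") as f:
    data = pickle.load(f)

for info in data["infos"][:3]:            # first few samples
    names = info["gt_names"]              # (num_obj,) object category names
    attrs = info["gt_attribute_names"]    # per-object attribute name lists
    for name, attr in zip(names, attrs):
        print(name, attr)                 # e.g., pedestrian ['pedestrian.standing']
    for relation in info["gt_spatial_names"]:
        print(relation)                   # spatial relationship sentences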

Citation

If you use the OVAD dataset in your research, please cite our paper:

 
@inproceedings{xiang2025ovad,
    author = {Xiang, Xinhao and Peng, Kuan-Chuan and Lohit, Suhas and Jones, Michael J. and Zhang, Jiawei},
    title = {Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes},
    booktitle = {The British Machine Vision Conference (BMVC)},
    year = {2025}
}

License

The OVAD dataset is released under the CC BY-NC-SA 4.0 license. For the images in the nuScenes dataset, please refer to their website for the copyright and license terms.

Created by Mitsubishi Electric Research Laboratories (MERL), 2024-2025

SPDX-License-Identifier: CC-BY-NC-SA-4.0

Files

OVAD.zip (438.0 MB)
md5:0a71b42528d7b8044aa3996688c26a02
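
To check the archive before unzipping, here is a minimal sketch using only the Python standard library (file name and checksum as listed above):

import hashlib
import zipfile

EXPECTED_MD5 = "0a71b42528d7b8044aa3996688c26a02"

# Compute the MD5 checksum of the downloaded archive in chunks.
md5 = hashlib.md5()
with open("OVAD.zip", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        md5.update(chunk)
assert md5.hexdigest() == EXPECTED_MD5, "Checksum mismatch; please re-download."

# Extract the dataset next to the archive.
with zipfile.ZipFile("OVAD.zip") as zf:
    zf.extractall(".")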