Published August 19, 2025 | Version v1
Dataset Open

Open Vocabulary Attribute Detection (OVAD) Dataset

  • 1. University of California, Davis
  • 2. Mitsubishi Electric Research Laboratories
  • 3. Mitsubishi Electric Research Laboratories (United States)

Description

Introduction

Current detection datasets usually provide rich object annotations, but few of them include attribute annotations, which are also important for the detection task. To address this gap, we propose a novel attribute dataset, OVAD, to comprehensively support training and evaluating attribute detection. OVAD is built on the nuScenes dataset (license: CC BY-NC-SA 4.0), supplementing it with detailed attribute annotations that capture spatial relationships, motion states, and interactions between objects. It is useful for developing and evaluating systems that need to understand complex scene dynamics.

To encourage more follow-up work on Open Vocabulary Attribute Detection, we are publicly releasing the dataset splits used in our paper ("Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes" by Xinhao Xiang, Kuan-Chuan Peng, Suhas Lohit, Michael J. Jones, Jiawei Zhang, BMVC 2025).

Files in the unzipped folder:

OVAD
|---./README.md: This Markdown file
|---OVAD_full: The OVAD dataset
|---|---OVAD_infos_test.pkl
|---|---OVAD_infos_train.pkl
|---|---OVAD_infos_val.pkl
|---OVAD_mini: The mini OVAD dataset
|---|---OVAD_mini_infos_val.pkl
|---|---OVAD_mini_infos_train.pkl
|---|---example_val_0.json 

At a Glance

The size of the unzipped dataset is ~2.5GB.

OVAD is built on the nuScenes dataset (license: CC BY-NC-SA 4.0). Please download the nuScenes data from its original repository.

Each .pkl file contains the meta information and the data list. It is organized as follows:

1. Metadata

  • Type: dict
  • Content:
    • version <class 'str'>: the dataset version (e.g., "v1.0")

2. Infos

  • Type: list of dict
  • Each entry stores the information of one sample and contains the following keys:

| Key | Type | Description |
| --- | --- | --- |
| lidar_path | <class 'str'> | Path to the LiDAR data. |
| num_features | <class 'int'> | Number of features in the data. |
| token | <class 'str'> | Unique identifier for the sample. |
| sweeps | <class 'list'> | List of previous LiDAR frames. |
| cams | <class 'dict'> | Camera-related information. |
| lidar2ego_translation | <class 'list'> | Translation from LiDAR to ego-frame. |
| lidar2ego_rotation | <class 'list'> | Rotation from LiDAR to ego-frame. |
| ego2global_translation | <class 'list'> | Translation from ego-frame to global. |
| ego2global_rotation | <class 'list'> | Rotation from ego-frame to global. |
| timestamp | <class 'int'> | Timestamp of the sample. |
| gt_spatial_boxes | np.ndarray (num_spat, 7) | Spatial box information. |
| gt_spatial_names | np.ndarray (num_spat,) | Spatial relationship names. |
| gt_boxes | np.ndarray (num_obj, 7) | Ground truth 3D boxes. |
| gt_names | np.ndarray (num_obj,) | Object category names. |
| gt_attribute_names | <class 'list'> | Attribute names for each object. |
| gt_velocity | np.ndarray (num_obj, 2) | Object velocities on the x and y axes. |
| num_lidar_pts | np.ndarray (num_obj,) | Number of LiDAR points per object. |
| num_radar_pts | np.ndarray (num_obj,) | Number of radar points per object. |
| valid_flag | np.ndarray (num_obj,) | Validity flag for objects. |

More information about the keys that are not specific to open-vocabulary attribute detection can be found in MMDetection3D (license: Apache 2.0).
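
For a quick look at the contents, the info files can be loaded with Python's pickle module. Below is a minimal sketch; the file path is only an example, and NumPy must be installed because several fields are stored as numpy arrays.

import pickle

# Load one of the OVAD info files (the path is an example; adjust it to your setup).
with open("OVAD/OVAD_full/OVAD_infos_val.pkl", "rb") as f:
    data = pickle.load(f)

print(data["metadata"]["version"])   # e.g., "v1.0"
print(len(data["infos"]))            # number of samples in this split

# Inspect the keys of the first sample.
print(sorted(data["infos"][0].keys()))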

Example Representation

 
{
    "metadata": {
        "version": "v1.0"
    },
    "infos": [
        {
            "lidar_path": "path/to/lidar/file.bin",
            "num_features": 5,
            "token": "37091c75b9704e0daa829ba56dfa0906",
            "sweeps": [...],
            "cams": {...},
            "lidar2ego_translation": [...],
            "lidar2ego_rotation": [...],
            "ego2global_translation": [...],
            "ego2global_rotation": [...],
            "timestamp": 1533201470427893,
            "gt_spatial_boxes": [[...], [...], ...],
            "gt_spatial_names": ["From the perspective of pedestrian, car is behind pedestrian", "...", "..."],
            "gt_boxes": [[...], [...], ...],
            "gt_names": ["car", "pedestrian", ...],
            "gt_attribute_names": [["cycle.with_rider"], ["pedestrian.standing"], ...],
            "gt_velocity": [[0.0, 1.2], [1.1, -0.5], ...],
            "num_lidar_pts": [12, 8, ...],
            "num_radar_pts": [5, 3, ...],
            "valid_flag": [True, False, ...]
        },
        {...}
    ]
}

The 'example_val_0.json' file (included in the OVAD_mini folder) provides a comprehensive example of a single entry under the "infos" key; it is the first entry of the validation set.
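
As a small sketch of reading the annotation fields listed in the table above (the field names follow that table; the mini validation split and the printed slice are just examples):

import pickle

with open("OVAD/OVAD_mini/OVAD_mini_infos_val.pkl", "rb") as f:
    data = pickle.load(f)

for info in data["infos"][:3]:            # first few samples
    names = info["gt_names"]              # (num_obj,) object category names
    attrs = info["gt_attribute_names"]    # per-object attribute name lists
    for name, attr in zip(names, attrs):
        print(name, attr)                 # e.g., pedestrian ['pedestrian.standing']
    for relation in info["gt_spatial_names"]:
        print(relation)                   # spatial relationship sentences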

Citation

If you use the OVAD dataset in your research, please cite our paper:

 
@inproceedings{xiang2025ovad,
    author = {Xiang, Xinhao and Peng, Kuan-Chuan and Lohit, Suhas and Jones, Michael J. and Zhang, Jiawei},
    title = {Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes},
    booktitle = {The British Machine Vision Conference (BMVC)},
    year = {2025}
}

License

The OVAD dataset is released under the CC BY-NC-SA 4.0 license. For the images in the nuScenes dataset, please refer to their website for the copyright and license terms.

Created by Mitsubishi Electric Research Laboratories (MERL), 2024-2025

SPDX-License-Identifier: CC-BY-NC-SA-4.0

Files

OVAD.zip (438.0 MB)
md5:0a71b42528d7b8044aa3996688c26a02
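
To check the archive before unzipping, here is a minimal sketch using only the Python standard library (file name and checksum as listed above):

import hashlib
import zipfile

EXPECTED_MD5 = "0a71b42528d7b8044aa3996688c26a02"

# Compute the MD5 checksum of the downloaded archive in chunks.
md5 = hashlib.md5()
with open("OVAD.zip", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        md5.update(chunk)
assert md5.hexdigest() == EXPECTED_MD5, "Checksum mismatch; please re-download."

# Extract the dataset next to the archive.
with zipfile.ZipFile("OVAD.zip") as zf:
    zf.extractall(".")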