COCO, LVIS, Open Images V4 classes mapping


This repository contains a mapping from the classes of the COCO, LVIS, and Open Images V4 datasets to a single unified set of 1460 classes.

COCO [Lin et al. 2014] contains 80 classes, LVIS [Gupta et al. 2019] contains 1460 classes, and Open Images V4 [Kuznetsova et al. 2020] contains 601 classes.

We built the mapping with a semi-automatic procedure so that the three datasets share a single final list of 1460 classes. We also generated a hierarchy for each class using WordNet.

This repository contains the following files:

  • coco_classes_map.txt, contains the mapping for the 80 COCO classes
  • lvis_classes_map.txt, contains the mapping for the 1460 LVIS classes
  • openimages_classes_map.txt, contains the mapping for the 601 Open Images V4 classes
  • classname_hyperset_definition.csv, contains the final set of 1460 classes, their definition and hierarchy
  • all-classnames.xlsx, contains a side-by-side view of all classes considered
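
As a rough illustration of how one of the mapping files could be consumed, here is a minimal Python sketch. It ASSUMES that each non-empty line holds a source class name and its unified class name separated by a tab; the actual layout of the *_classes_map.txt files may differ, so check a file before relying on this. The function name `parse_class_map` and the sample class names are hypothetical.

```python
import csv
from typing import Dict, Iterable


def parse_class_map(lines: Iterable[str], delimiter: str = "\t") -> Dict[str, str]:
    """Build a {source_class: unified_class} dict from mapping lines.

    ASSUMPTION: each non-empty line is "<source class><delimiter><unified class>".
    The real layout of the *_classes_map.txt files may differ.
    """
    mapping: Dict[str, str] = {}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed lines
        mapping[row[0].strip()] = row[1].strip()
    return mapping


# Hypothetical sample lines, not taken from the repository.
sample = ["tv\ttelevision_set", "", "couch\tsofa"]
coco_map = parse_class_map(sample)
print(coco_map)
```

To read a real file, pass an open file object instead of the sample list, for example `parse_class_map(open("coco_classes_map.txt", encoding="utf-8", newline=""))`, and adjust `delimiter` if the file uses a different separator.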

This mapping was used in VISIONE [Amato et al. 2021, Amato et al. 2022], a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). For object detection, VISIONE uses three pre-trained models: VfNet [Zhang et al. 2021] (trained on COCO), Mask R-CNN [He et al. 2017] (trained on LVIS), and a Faster R-CNN+Inception ResNet (trained on Open Images V4).
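
The sketch below shows one way labels from several detectors could be translated into the unified class set once the per-dataset mappings are loaded as dictionaries. It is not VISIONE's actual pipeline: the function `unify_labels`, the dataset keys, and all class names and scores are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List, Tuple

Detection = Tuple[str, float]  # (dataset-specific class label, confidence score)


def unify_labels(
    detections: Dict[str, List[Detection]],
    class_maps: Dict[str, Dict[str, str]],
) -> List[Tuple[str, float, str]]:
    """Translate dataset-specific labels into unified class names.

    Returns (unified_label, score, source_dataset) triples. Labels that
    are missing from the corresponding mapping are dropped.
    """
    unified: List[Tuple[str, float, str]] = []
    for dataset, dets in detections.items():
        mapping = class_maps[dataset]
        for label, score in dets:
            target = mapping.get(label)
            if target is not None:
                unified.append((target, score, dataset))
    return unified


# Hypothetical mappings and detector outputs, for illustration only.
class_maps = {
    "coco": {"tv": "television_set"},
    "lvis": {"television_set": "television_set"},
    "openimages": {"Television": "television_set"},
}
detections = {
    "coco": [("tv", 0.91)],
    "lvis": [("television_set", 0.75)],
    "openimages": [("Television", 0.66), ("Unmapped thing", 0.50)],
}
merged = unify_labels(detections, class_maps)
print(merged)
```

Keeping the source dataset next to each unified label lets a downstream step decide how to handle the same object being reported by more than one detector.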

This repository is released under a Creative Commons Attribution license. Please cite the following paper if you use it in your work in any form:

  title={The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval},
  author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Debole, Franca and Falchi, Fabrizio and Gennaro, Claudio and Vadicamo, Lucia and Vairo, Claudio},
  journal={Journal of Imaging},
  publisher={Multidisciplinary Digital Publishing Institute}




