
Published May 12, 2023 | Version 1
Dataset | Open Access

NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection

  • City, University of London

Description

NeSy4VRD

NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the computer vision field of AI and, within that field, to the application tasks of visual relationship detection (VRD) and scene graph generation.

Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either: 1) pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or 2) pursue computer vision research without NeSy (i.e. pursue research that focuses purely on deep learning alone, without involving symbolic components of any kind).   This is the sense in which we describe NeSy4VRD as being multipurpose: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.

The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.

 

NeSy4VRD on Zenodo: the NeSy4VRD dataset package

This entry on Zenodo hosts the NeSy4VRD dataset package, which includes the NeSy4VRD dataset and its companion NeSy4VRD ontology, an OWL ontology called VRD-World.

The NeSy4VRD dataset consists of an image dataset with associated visual relationship annotations. The images of the NeSy4VRD dataset are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations.  The NeSy4VRD dataset is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects.  A visual relationship for an image of the NeSy4VRD dataset has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them.  Both the 'subject' and 'object' objects are specified in terms of bounding boxes and object classes.  For example, representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.
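For illustration, a single annotated visual relationship of this form might be represented in Python as follows. The field names and bounding box convention used here are illustrative assumptions, not the dataset's confirmed on-disk format:

```python
# Hypothetical sketch of one NeSy4VRD-style visual relationship annotation.
# Field names and the bbox ordering are illustrative assumptions only.

annotation = {
    "subject": {"class": "person", "bbox": [120, 340, 55, 210]},  # assumed [ymin, ymax, xmin, xmax]
    "predicate": "ride",
    "object": {"class": "horse", "bbox": [100, 400, 40, 300]},
}

def as_triple(rel):
    """Render a relationship dict as a <'subject', 'predicate', 'object'> triple."""
    return (rel["subject"]["class"], rel["predicate"], rel["object"]["class"])

print(as_triple(annotation))  # ('person', 'ride', 'horse')
```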

Visual relationship detection is pursued as a computer vision application task in its own right, and as a building block capability for the broader application task of scene graph generation.  Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.

The NeSy4VRD ontology, VRD-World, is a rich, well-aligned companion OWL ontology engineered specifically for use with the NeSy4VRD dataset.  It directly describes the domain of the NeSy4VRD dataset, as reflected in the NeSy4VRD visual relationship annotations.  More specifically, every object class that features in the NeSy4VRD visual relationship annotations has a corresponding class within the VRD-World OWL class hierarchy, and every predicate that features in the annotations has a corresponding property within the VRD-World OWL object property hierarchy. The structure of the VRD-World class hierarchy, together with the characteristics and relationships of its object properties, gives the VRD-World OWL ontology rich inference semantics. These provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components.  There is also ample scope for NeSy researchers to explore supplementing the OWL reasoning afforded by the VRD-World ontology with Datalog rules and reasoning.
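To give a flavour of the kind of subsumption reasoning an OWL reasoner performs over a class hierarchy like VRD-World's, here is a minimal pure-Python sketch. The hierarchy below is an invented illustration; the actual VRD-World class names and structure may differ:

```python
# Invented toy class hierarchy, standing in for a fragment of VRD-World.
# Maps each class to its asserted direct superclass.
SUBCLASS_OF = {
    "Horse": "Animal",
    "Cat": "Animal",
    "Animal": "Thing",
    "Person": "Thing",
}

def ancestors(cls):
    """All superclasses of cls, following subclass links transitively,
    as an OWL reasoner would infer via class subsumption."""
    out = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        out.append(cls)
    return out

# Given an asserted fact <'person', 'ride', 'horse'>, a reasoner can infer
# that the ridden object is also an Animal and a Thing:
print(ancestors("Horse"))  # ['Animal', 'Thing']
```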

Use of the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset is, however, entirely optional.  Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the NeSy4VRD ontology and use the NeSy4VRD dataset by itself.

All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.

 

NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code

The NeSy4VRD research resource incorporates additional components that complement the NeSy4VRD dataset package here on Zenodo.  These companion components are available at NeSy4VRD on GitHub and consist of:

  • comprehensive open source Python-based infrastructure supporting the extensibility of the NeSy4VRD visual relationship annotations (and, thereby, the extensibility of the NeSy4VRD ontology, VRD-World, as well)
  • open source Python sample code showing how one can work with the NeSy4VRD visual relationship annotations in conjunction with the NeSy4VRD ontology, VRD-World, and RDF knowledge graphs.

The NeSy4VRD infrastructure supporting extensibility consists of:

  • open source Python code for conducting deep and comprehensive analyses of the NeSy4VRD dataset (the VRD images and their associated NeSy4VRD visual relationship annotations)
  • an open source, custom-designed NeSy4VRD protocol for specifying visual relationship annotation customisation instructions declaratively, in text files
  • an open source, custom-designed NeSy4VRD workflow, implemented using Python scripts and modules, for applying small or large volumes of customisations or extensions to the NeSy4VRD visual relationship annotations in a configurable, managed, automated and repeatable process.
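To sketch how a declarative annotation-customisation protocol of this kind can work, the snippet below parses invented, illustrative instructions and applies them to a per-image annotation dictionary. The instruction syntax, image name, and function names here are all assumptions for illustration; the real NeSy4VRD protocol is documented on GitHub and may differ:

```python
import re

# Invented instruction syntax: "ADD <image>: subject_class predicate object_class"
INSTRUCTION = re.compile(r"ADD <(?P<img>[^>]+)>:\s*(?P<s>\w+)\s+(?P<p>\w+)\s+(?P<o>\w+)")

def apply_instructions(text, annotations):
    """Apply declarative ADD instructions, one per line, to a dict that maps
    image names to lists of (subject, predicate, object) triples."""
    for line in text.splitlines():
        m = INSTRUCTION.match(line.strip())
        if m:
            annotations.setdefault(m["img"], []).append((m["s"], m["p"], m["o"]))
    return annotations

anns = apply_instructions("ADD <example.jpg>: boat on water", {})
print(anns)  # {'example.jpg': [('boat', 'on', 'water')]}
```

Keeping the instructions in text files, as the NeSy4VRD protocol does, makes customisation runs reviewable and repeatable.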

The purpose of providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the NeSy4VRD dataset in new directions, by further enriching the annotations or by tailoring them to introduce new or additional data conditions that better suit their particular research needs and interests.  The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.

The NeSy4VRD extensibility infrastructure, however, may be of particular interest to NeSy researchers interested in using the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset. These researchers can of course tailor the VRD-World ontology without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so are limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer.  If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, so that alignment is maintained.

To illustrate our point, and to illustrate our vision of how the NeSy4VRD extensibility infrastructure can be used, let us consider a simple example. It is common in computer vision to distinguish between thing objects (that have well-defined shapes) and stuff objects (that are amorphous). Suppose a researcher wishes to have a greater number of stuff object classes with which to work.  Water is such a stuff object.  Many VRD images contain water but it is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a Water class to the class hierarchy of the VRD-World ontology would be pointless because it would never acquire any instances (because an object detector would never detect any). However, our hypothetical researcher could choose to do the following:

  • use the analysis functionality of the NeSy4VRD extensibility infrastructure to find images containing water (by, say, searching for images whose visual relationships refer to object classes such as 'boat', 'surfboard', 'sand', 'umbrella', etc.);
  • use free image analysis software (such as GIMP, at gimp.org) to get bounding boxes for instances of water in these images;
  • use the NeSy4VRD protocol to specify new visual relationships for these images that refer to the new 'water' objects (e.g. <'boat', 'on', 'water'>);
  • use the NeSy4VRD workflow to introduce the new object class 'water' and to apply the specified new visual relationships to the sets of annotations for the affected images;
  • introduce a Water class into the class hierarchy of the VRD-World ontology (using, say, the free Protégé ontology editor);
  • continue experimenting, now with the added benefit of the additional stuff object class 'water';
  • contribute the enriched set of NeSy4VRD visual relationship annotations, and the enriched companion VRD-World ontology, to research communities.
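The data-side steps of this example can be sketched in a few lines of Python. Everything here (the class list, image name, and helper function) is invented for illustration; in practice the NeSy4VRD workflow drives such changes via its configured scripts rather than an API like this:

```python
# Invented starting state: a list of annotated object classes and a
# per-image annotation dict (the image name is hypothetical).
object_classes = ["person", "boat", "surfboard", "sand", "umbrella"]
annotations = {"beach.jpg": [("person", "on", "sand")]}

def add_object_class(classes, new_class):
    """Register a new object class if absent; return its index."""
    if new_class not in classes:
        classes.append(new_class)
    return classes.index(new_class)

# Introduce the new 'water' object class, then annotate a new visual
# relationship that refers to it.
add_object_class(object_classes, "water")
annotations["beach.jpg"].append(("boat", "on", "water"))
```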

 

Information pertaining to the VRD dataset

Information about the original VRD dataset is available here.

Public availability of the VRD images (via information accessible from that location) ceased sometime in the latter part of 2021.  We thank Dr. Ranjay Krishna, one of the principals associated with the VRD dataset, for granting us permission to re-establish the public availability of the VRD images as part of NeSy4VRD.

The original VRD visual relationship annotations are still publicly available from that location.  But our deep analysis of those annotations, driven by our desire to design a robust companion ontology, revealed them to be replete with errors and problematic in ways that made credible ontology modelling infeasible.  The NeSy4VRD visual relationship annotations are far superior, and we recommend them over the original VRD annotations to anyone contemplating conducting research using the VRD images.  The NeSy4VRD annotations also have the added benefit of the rich, well-aligned companion NeSy4VRD ontology, VRD-World, for those whose research requires such a companion ontology.

Researchers wishing to use the original VRD dataset may still do so. They can access the VRD images here, from within the NeSy4VRD dataset on Zenodo, and access the VRD visual relationship annotations from the location linked above.

A note of caution: the NeSy4VRD ontology, VRD-World, is not compatible with the original VRD visual relationship annotations and cannot be used in conjunction with them.  The VRD-World ontology has been engineered in relation to the highly customised and quality-improved NeSy4VRD visual relationship annotations. The customisations that were applied include ones that introduced many new object classes, merged some of the existing object classes, introduced one new predicate, and changed several predicate names.

However, researchers can, if they wish, use the NeSy4VRD extensibility infrastructure (described above) to undertake their own customisation and quality-improvement exercise with respect to the original VRD visual relationship annotations. This is precisely how the NeSy4VRD visual relationship annotations were created in the first place. The primary intended use case of NeSy4VRD's extensibility infrastructure, however, is for researchers to use the NeSy4VRD visual relationship annotations as their starting point, and to take these annotations forward with onward customisations and extensions, as illustrated in the example use case given above.

 

 

Files

NeSy4VRD.zip (2.0 GB)
md5:0a315c78dc494f3b81028e6362d37091