Published May 27, 2022 | Version v2
Dataset Open

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

  • 1. Northeastern University
  • 2. UCLA
  • 3. NVIDIA
  • 4. UT Austin & NVIDIA
  • 5. Caltech & NVIDIA

Description

A significant gap remains between today's visual pattern recognition models and human-level visual cognition, especially in few-shot learning and compositional reasoning about novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics of the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images disagree only on action labels, so that mere recognition of object categories is insufficient to solve our benchmark. We also design multiple test sets to systematically study the generalization of visual learning models, varying the overlap of HOI concepts between the training and test sets of few-shot instances from partial to no overlap. Bongard-HOI presents a substantial challenge to today's visual recognition models: the state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction, while even amateur human testers on MTurk reach 91%. With the Bongard-HOI benchmark, we hope to further advance research on visual reasoning, especially holistic perception-reasoning systems and better representation learning.
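The headline metric above is accuracy on few-shot binary prediction: given positive and negative support images for an instance, a model must classify each query image as depicting the HOI concept or not. The sketch below shows how such a score could be computed; the instance layout and the `predict` callable are illustrative assumptions, not the benchmark's official data format or API.

```python
# Minimal sketch of scoring few-shot binary prediction on Bongard-style
# instances. The FewShotInstance layout and `predict` signature are
# hypothetical placeholders, not Bongard-HOI's actual interface.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FewShotInstance:
    pos_support: List[str]  # images depicting the target HOI concept
    neg_support: List[str]  # hard negatives: same objects, different action
    queries: List[str]      # images the model must classify
    labels: List[int]       # 1 = concept present, 0 = absent

def binary_accuracy(
    instances: List[FewShotInstance],
    predict: Callable[[List[str], List[str], str], int],
) -> float:
    """Fraction of query images classified correctly across all instances."""
    correct = total = 0
    for inst in instances:
        for query, label in zip(inst.queries, inst.labels):
            correct += int(predict(inst.pos_support, inst.neg_support, query) == label)
            total += 1
    return correct / total
```

A chance-level predictor that ignores the supports scores 50% on balanced queries, which puts the reported 62% (model) and 91% (human) figures in context.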

Files

Files (50.9 GB)

Checksum (md5)                            Size
md5:f74a0e25cacf315101e86e3af009c123      207.7 MB
md5:84fd10f8201785b4b11f91d4a140ea62      50.4 GB
md5:6e451b206ab9299a6aa18179330e9589      54.6 MB
md5:0eac414a7c848962bd7473dc679d9c04      188.7 MB