
Published March 23, 2023 | Version v1

Simple Multimodal Algorithmic Reasoning Task Dataset (SMART-101)

  • 1. Mitsubishi Electric Research Laboratories (MERL)
  • 2. Massachusetts Institute of Technology (MIT)

Description

Introduction

Recent times have witnessed an increasing number of applications of deep neural networks to tasks that require superior cognitive abilities, e.g., playing Go, generating art, etc. Such dramatic progress raises the question: how well do neural networks generalize when solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task (and the associated SMART-101 dataset) for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children of younger age (6--8). Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and solving it requires a mix of several elementary skills, including pattern recognition, algebra, and spatial reasoning, among others. To train deep neural networks, we programmatically augment each puzzle to 2,000 new instances, with each instance varying in appearance, the associated natural-language question, and its solution. To foster research and progress in our quest for artificial general intelligence, we are publicly releasing our SMART-101 dataset, consisting of the full set of programmatically generated instances of the 101 puzzles and their solutions.

The dataset was introduced in our paper Are Deep Neural Networks SMARTer than Second Graders?.


At a Glance

  • The size of the unzipped dataset is ~12GB.  
  • The dataset consists of 101 folders (numbered 1-101); each folder corresponds to one distinct puzzle (the root puzzle). 
  • There are 2,000 puzzle instances programmatically created for each root puzzle, numbered 1-2000. 
  • Every root puzzle folder (index in [1,101]) contains: (i) `img/` and (ii) `puzzle_<index>.csv`. 
  • The folder `img/` contains the puzzle instance images, and `puzzle_<index>.csv` contains the non-image part of a puzzle. Specifically, a row of `puzzle_<index>.csv` is the following tuple: `<id, Question, image, A, B, C, D, E, Answer>`, where `id` is the puzzle instance id (in [1,2000]), `Question` is the puzzle question associated with the instance, `image` is the name of the image (in the `img/` folder) corresponding to this instance `id`, `A, B, C, D, E` are the five answer candidates, and `Answer` is the correct answer to the question. 
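The row layout above can be parsed with the standard `csv` module. The sketch below uses a made-up example row (the question, image name, and values are hypothetical, not taken from the real dataset); the real files live at `<root>/<puzzle_index>/puzzle_<puzzle_index>.csv`.

```python
import csv
import io

# Hypothetical row in the format of puzzle_<index>.csv described above.
sample = (
    "id,Question,image,A,B,C,D,E,Answer\n"
    "1,How many apples are in the picture?,puzzle_1_1.png,3,4,5,6,7,B\n"
)

def load_puzzle_rows(fp):
    """Parse one puzzle CSV into a list of dicts keyed by the header row."""
    return list(csv.DictReader(fp))

rows = load_puzzle_rows(io.StringIO(sample))
first = rows[0]
# The letter in `Answer` names one of the candidate columns A-E,
# so indexing the row with it recovers the correct answer's value.
correct_value = first[first["Answer"]]
```

For the real dataset, replace `io.StringIO(sample)` with `open(f"{index}/puzzle_{index}.csv")`.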

Other Details
In our paper Are Deep Neural Networks SMARTer than Second Graders?, we provide four different dataset splits for evaluation: (i) Instance Split (IS), (ii) Answer Split (AS), (iii) Puzzle Split (PS), and (iv) Few-shot Split (FS). Below, we provide the details of each split to make fair comparisons to the results reported in our paper. 

Puzzle Split (PS)
We use the following root puzzle ids as the `Train` and `Test` sets. 

Split Root Puzzle Id Sets
`Test` {94, 95, 96, 97, 98, 99, 101, 61, 62, 65, 66, 67, 69, 70, 71, 72, 73, 74, 75, 76, 77}
`Train` {1, 2, ..., 101} \ Test

Evaluation is done on all the `Test` puzzles, and their accuracies are averaged. For the `Test` puzzles, we use instance indices 1701-2000 in the evaluation.
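The PS split above can be written down directly as sets of root puzzle ids; a minimal sketch:

```python
# Root puzzle ids forming the PS `Test` set, as listed in the table above.
PS_TEST = {94, 95, 96, 97, 98, 99, 101,
           61, 62, 65, 66, 67, 69, 70,
           71, 72, 73, 74, 75, 76, 77}

# Train = {1, ..., 101} \ Test
PS_TRAIN = set(range(1, 102)) - PS_TEST

# Evaluation uses instance ids 1701-2000 of each `Test` puzzle.
PS_EVAL_INSTANCES = range(1701, 2001)
```

This yields 21 held-out root puzzles and 80 training root puzzles.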

Few-shot Split (FS)

We randomly select `k` instances (e.g., `k=100`) from the `Test` sets used in the PS split above for training in the FS split. These `k` few-shot samples are taken from instance indices 1-1600 of the respective puzzles, and evaluation is conducted on all instance ids from 1701-2000.

Instance Split (IS)

We split the instances under every root puzzle as: Train = 1-1600, Val = 1601-1700, Test = 1701-2000. We train the neural network models using the `Train` split puzzle instances from all the root puzzles together and evaluate on the `Test` split of all puzzles.
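The IS partition is a fixed function of the instance id; a minimal sketch:

```python
def instance_split(instance_id):
    """Map an instance id (1-2000) to its IS partition:
    Train = 1-1600, Val = 1601-1700, Test = 1701-2000."""
    if 1 <= instance_id <= 1600:
        return "train"
    if 1601 <= instance_id <= 1700:
        return "val"
    if 1701 <= instance_id <= 2000:
        return "test"
    raise ValueError(f"instance id out of range: {instance_id}")
```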

Answer Split (AS)

For every root puzzle, we find the median answer value among all 2,000 instances and use the set of instances whose answer equals this median value as the `Test` set for evaluation; this set is excluded from the training of the neural networks.
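The AS selection can be sketched as below. The toy answer values are hypothetical, and the choice of `median_low` (which always returns an actual answer value rather than an average of two) is our assumption about how ties over an even number of instances are handled:

```python
import statistics

def answer_split(answers):
    """Given {instance_id: numeric answer} for one root puzzle, return
    (test_ids, train_ids): instances whose answer equals the median
    answer value form the held-out Test set."""
    # median_low guarantees the median is one of the observed answers.
    median = statistics.median_low(answers.values())
    test_ids = {i for i, a in answers.items() if a == median}
    return test_ids, set(answers) - test_ids

# Toy example (not real SMART-101 answers): median answer is 5.
test_ids, train_ids = answer_split({1: 3, 2: 5, 3: 5, 4: 7, 5: 9})
```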

Other Resources

PyTorch code for using the dataset to train deep neural networks is available here.

Contact
Anoop Cherian (cherian@merl.com), Kuan-Chuan Peng (kpeng@merl.com), or Suhas Lohit (slohit@merl.com)


Citation
If you use the SMART-101 dataset in your research, please cite our paper:

@article{cherian2022deep,
  title={Are Deep Neural Networks SMARTer than Second Graders?},
  author={Cherian, Anoop and Peng, Kuan-Chuan and Lohit, Suhas and Smith, Kevin and Tenenbaum, Joshua B},
  journal={arXiv preprint arXiv:2212.09993},
  year={2022}
}


Files

LICENSE.txt

Files (8.4 GB)

  • md5:1c687a2f5cecb22124f43975bade3983 (15.1 kB)
  • md5:f41e5779643682589bc3be8e75342628 (4.4 kB)
  • md5:274981e45f840006668baea6dc576e84 (8.4 GB)
  • md5:57b0df68198fc3f4d065c86aefe787b2 (238.8 kB)