Toloka Visual Question Answering Dataset
Contributors
- Toloka
Description
Our dataset consists of images paired with textual questions. Each entry (instance) is a question-image pair labeled with the ground truth coordinates of a bounding box that contains the visual answer to the question. The images were obtained from a CC BY-licensed subset of the Microsoft Common Objects in Context dataset, MS COCO. All data labeling was performed on the Toloka crowdsourcing platform, https://toloka.ai/.
Our dataset has 45,199 instances split among three subsets: train (38,990 instances), public test (1,705 instances), and private test (4,504 instances). The entire train dataset will be available to everyone from the start of the challenge. The public test dataset will be available from the evaluation phase of the competition, but without any ground truth labels. The private test dataset will not be available until the challenge ends.
The datasets will be provided as files in the comma-separated values (CSV) format containing the following columns.
Column | Type | Description |
image | string | URL of an image on a public content delivery network |
width | integer | image width |
height | integer | image height |
left | integer | bounding box coordinate: left |
top | integer | bounding box coordinate: top |
right | integer | bounding box coordinate: right |
bottom | integer | bounding box coordinate: bottom |
question | string | question in English |
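As a minimal sketch of how a file with these columns might be read, the Python snippet below parses the CSV, converts the integer columns, and runs a basic sanity check on each bounding box. The check assumes that the coordinates are expressed in the image's own coordinate frame, with left < right and top < bottom; this description does not state that explicitly. The sample row, including its URL and values, is a made-up placeholder, and the function names are hypothetical.

```python
import csv
import io

# Integer columns listed in the table above.
INT_COLUMNS = ["width", "height", "left", "top", "right", "bottom"]


def read_instances(handle):
    """Parse CSV rows into dicts, converting the integer columns to int."""
    reader = csv.DictReader(handle)
    instances = []
    for row in reader:
        instance = dict(row)
        for name in INT_COLUMNS:
            instance[name] = int(row[name])
        instances.append(instance)
    return instances


def box_is_inside_image(instance):
    """Assumed sanity check: the box is non-empty and lies within the image."""
    return (
        0 <= instance["left"] < instance["right"] <= instance["width"]
        and 0 <= instance["top"] < instance["bottom"] <= instance["height"]
    )


# Made-up sample row; the URL and values are placeholders, not real data.
SAMPLE = (
    "image,width,height,left,top,right,bottom,question\n"
    "https://example.com/image.jpg,640,480,100,50,300,200,"
    "What can be used to cut paper?\n"
)

instances = read_instances(io.StringIO(SAMPLE))
valid = [box_is_inside_image(instance) for instance in instances]
```

To read the real file, pass `open("train.csv", newline="", encoding="utf-8")` instead of the in-memory sample. The sketch expects all four bounding box columns to be present, so it would need adjusting for the public test file, which is distributed without ground truth labels.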
This upload also contains a ZIP file with the images from MS COCO.
Files
train.csv
Additional details
Related works
- Is compiled by
- Other: https://toloka.ai/research/ (URL)
- Is supplement to
- Project deliverable: https://toloka.ai/challenges/wsdm2023/ (URL)
- Software: https://github.com/Toloka/WSDMCup2023 (URL)
- Dataset: https://cocodataset.org/ (URL)
- Project deliverable: https://codalab.lisn.upsaclay.fr/competitions/7434 (URL)