Published January 15, 2024 | Version v0.3
Dataset Open

Path Mixing VLN dataset

Creators

Description

The Room-to-Room (R2R) dataset consists of human-annotated instructions corresponding to paths in navigation graphs. Each path is a sequence of viewpoints encountered by the agent during navigation. A derived dataset, Fine-Grained R2R (FGR2R), aligns parts of each instruction with the corresponding graph edges to obtain fine-grained annotations. Existing work in vision-and-language navigation (VLN) has shown that more instruction examples can improve an agent’s performance in previously unseen environments. We generate 162k instruction-trajectory pairs with path lengths between 5 m and 20 m. On average, the final dataset has 7.27 views per path, a trajectory length of 14.4 m, and 82 words per instruction.
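As a rough illustration of how the summary statistics above could be recomputed, the sketch below assumes each entry follows an R2R-style JSON layout: a `path` list of viewpoint IDs, a `distance` in metres, and an `instructions` list of strings. These field names are an assumption about the files in this record, not something the description confirms, and the sample entries are made up.

```python
from statistics import mean

def summarize(entries):
    """Compute mean views per path, trajectory length (m) and words per instruction.

    Assumes R2R-style entries with 'path', 'distance' and 'instructions'
    fields; this schema is an assumption, not confirmed by the record.
    """
    views = [len(e["path"]) for e in entries]
    lengths = [e["distance"] for e in entries]
    # Whitespace tokenization; the authors' word count may differ.
    words = [len(text.split()) for e in entries for text in e["instructions"]]
    return {
        "views_per_path": mean(views),
        "trajectory_length_m": mean(lengths),
        "words_per_instruction": mean(words),
    }

# Hypothetical sample entries, for illustration only.
sample = [
    {"path": ["a", "b", "c"], "distance": 6.0,
     "instructions": ["walk past the sofa and stop"]},
    {"path": ["d", "e", "f", "g", "h"], "distance": 12.0,
     "instructions": ["go up the stairs"]},
]
stats = summarize(sample)
print(stats)
```

For a real file, the entries could be loaded with `json.load`, although a multi-gigabyte JSON file may be better handled with a streaming parser.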

Files

augment_train.json

Files (7.4 GB)

Checksum Size
md5:26b5b4eac16c6dc7c0e72a8d026ad8f7 7.0 GB
md5:74a28718bcfb5bc3fc85515d5d7bc84f 2.8 MB
md5:a4755df592bb2c6d095c2b321618e142 389.3 MB
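A downloaded file can be checked against the MD5 values listed above. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the file is read in chunks so that the 7.0 GB file does not need to fit in memory.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:26b5...'."""
    # Drop the optional 'md5:' prefix used in the file listing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("path/to/downloaded_file", "md5:74a28718bcfb5bc3fc85515d5d7bc84f")` would return `True` only if the local copy matches the listed 2.8 MB file; the path here is a placeholder.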