Path Mixing VLN dataset
Creators
Description
The Room-to-Room (R2R) dataset consists of human-annotated instructions corresponding to paths in environment navigation graphs. Each path is a sequence of viewpoints encountered by the agent during navigation. A derived dataset, Fine-Grained R2R (FGR2R), aligns parts of each instruction with the corresponding graph edges to provide finer-grained supervision. Prior work in vision-and-language navigation (VLN) has shown that more instruction examples can improve an agent's performance in previously unseen environments. We generate 162k instruction-trajectory pairs with path lengths between 5 m and 20 m. The final dataset has on average 7.27 views per path, a mean trajectory length of 14.4 m, and an average of 82 words per instruction.
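The summary statistics quoted above can be recomputed from a list of instruction-trajectory pairs along the following lines. This is a minimal sketch, not the released file format: the record layout (a dict with hypothetical `positions` and `instruction` keys), the function names, and the choice to measure trajectory length as the sum of straight-line distances between consecutive viewpoints are all assumptions made for illustration.

```python
import math


def path_length(positions):
    """Sum of Euclidean distances between consecutive viewpoints (metres).

    Assumption: each viewpoint is an (x, y, z) tuple in metres.
    """
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def summarize(pairs, min_len=5.0, max_len=20.0):
    """Keep pairs whose path length lies in [min_len, max_len] and average them.

    `pairs` is a list of dicts with hypothetical keys:
      "positions":   list of (x, y, z) viewpoint coordinates
      "instruction": the natural-language instruction as a string
    """
    kept = [p for p in pairs if min_len <= path_length(p["positions"]) <= max_len]
    n = len(kept)
    if n == 0:
        return {"count": 0}
    return {
        "count": n,
        "mean_views": sum(len(p["positions"]) for p in kept) / n,
        "mean_length_m": sum(path_length(p["positions"]) for p in kept) / n,
        "mean_words": sum(len(p["instruction"].split()) for p in kept) / n,
    }


# Toy data, invented purely to exercise the functions.
toy_pairs = [
    {"positions": [(0, 0, 0), (3, 0, 0), (3, 4, 0)],
     "instruction": "walk forward then turn left"},
    {"positions": [(0, 0, 0), (1, 0, 0)],  # 1 m long, outside the 5-20 m range
     "instruction": "step forward"},
    {"positions": [(0, 0, 0), (0, 9, 0)],
     "instruction": "go straight"},
]

stats = summarize(toy_pairs)
print(stats)
```

On the toy data, the second pair is dropped by the length filter, so the averages are taken over the remaining two pairs. Word counts here use whitespace splitting; a different tokenizer would give slightly different per-instruction averages.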