Published July 23, 2022 | Version 1.0.0
Dataset Open

A Priority Map for Vision-Language Navigation - Datasets

  • 1. University of Zurich
  • 2. University of Cambridge

Description

This archive contains full versions of the datasets and additional data presented in the following paper:

A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues

A priority map module (PM-VLN) boosts the performance of transformer-based architectures in navigation tasks by combining temporal sequence alignment and feature-level localisation in cross-modal inputs. The module is pretrained on trajectory estimation and a multi-objective task that pairs location estimation with cross-modal sentence prediction. Two datasets are introduced for the auxiliary tasks:

 - TR-NY-PIT-central - a set of path traces for routes in two urban locations.

 - MC-10 - a set of samples with multimodal inputs representing landmarks in 10 US cities.

Full details of this research, with further links, are available at:

https://jasonarmitage-res.github.io/projects/priority_map/

Additional data, comprising path traces for routes in Manhattan and language tokens for the Touchdown task, are provided for training and evaluating the PM-VLN module and the accompanying framework on the Touchdown benchmark. Please refer to the following link for details on the Touchdown dataset and StreetLearn environment:

https://sites.google.com/view/streetlearn/touchdown

Notes

This research is supported by the Digital Visual Studies program at the University of Zurich and funded by the Max Planck Society.

Files

datasets_details.pdf

Files (362.8 MB)

md5:0f5ff99e9cf85de706fff25f0463dfa1 (387.6 kB)
md5:1ffc1bc379ae854b2f0557ce2f89d550 (362.4 MB)
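
The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python, assuming the files have already been saved locally. The helper names `md5_of_file` and `matches_listed_checksum` are illustrative and not part of the dataset or its accompanying code. File paths are passed on the command line, so no file names are assumed.

```python
import hashlib
import sys

# MD5 checksums as listed in the Files section of this record.
LISTED_MD5 = {
    "0f5ff99e9cf85de706fff25f0463dfa1",  # 387.6 kB file
    "1ffc1bc379ae854b2f0557ce2f89d550",  # 362.4 MB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed=LISTED_MD5):
    """Return True if the file's MD5 digest is one of the listed checksums."""
    return md5_of_file(path) in listed


if __name__ == "__main__":
    # Usage (hypothetical script name): python verify_md5.py <downloaded files...>
    for path in sys.argv[1:]:
        status = "OK" if matches_listed_checksum(path) else "MISMATCH"
        print(f"{status}  {path}")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.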