A Priority Map for Vision-Language Navigation - Datasets
- 1. University of Zurich
- 2. University of Cambridge
Description
This archive contains full versions of the datasets and additional data presented in the following paper:
A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues
A priority map module (PM-VLN) boosts the performance of transformer-based architectures in navigation tasks by combining temporal sequence alignment and feature-level localisation in cross-modal inputs. The module is pretrained on trajectory estimation and a multi-objective task that pairs location estimation with cross-modal sentence prediction. Two datasets are introduced for the auxiliary tasks:
- TR-NY-PIT-central - a set of path traces for routes in two urban locations.
- MC-10 - a set of samples with multimodal inputs representing landmarks in 10 US cities.
Full details of this research and related links are available at:
https://jasonarmitage-res.github.io/projects/priority_map/
Additional data, comprising path traces for routes in Manhattan and language tokens for the Touchdown task, are provided for training and evaluating the PM-VLN module and framework on the Touchdown benchmark. Please refer to the following link for details on the Touchdown dataset and StreetLearn environment:
https://sites.google.com/view/streetlearn/touchdown
Files
datasets_details.pdf
(362.8 MB)
Size | MD5 |
---|---|
387.6 kB | 0f5ff99e9cf85de706fff25f0463dfa1 |
362.4 MB | 1ffc1bc379ae854b2f0557ce2f89d550 |
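The MD5 digests listed for the two files can be used to check that a download completed without corruption. The following is a minimal sketch in Python using only the standard library. The helper names `md5_of_file` and `matches_listed_checksum` are illustrative assumptions and not part of the archive. The record does not state where the files are stored locally, so the caller supplies the path.

```python
import hashlib
from pathlib import Path

# MD5 digests as listed in the file table of this record.
LISTED_MD5 = (
    "0f5ff99e9cf85de706fff25f0463dfa1",  # 387.6 kB file
    "1ffc1bc379ae854b2f0557ce2f89d550",  # 362.4 MB file
)


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Chunked reads keep memory use flat for the larger archive file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """Return True if the file's MD5 equals one of the listed digests."""
    return md5_of_file(path) in LISTED_MD5
```

For example, calling `matches_listed_checksum` on the local path of a downloaded file should return `True` when the file is intact. A `False` result suggests an incomplete or altered download.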