Does incorporating multi-turn reinforcement learning during training improve the nDTW score of vision-language

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20439563

Published May 29, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Speaker-follower models have proven effective in vision-and-language navigation, where a speaker model synthesizes new instructions to augment the training data for a follower navigation model. However, in many previous methods, the generated instructions are not directly optimized for the follower's performance. In this paper, we present FOAM, a FOllower-Aware speaker Model that is continually updated using follower feedback, so that the generated instructions are better suited to the follower's current learning state. Specifically, we optimize

Research goal: Does incorporating multi-turn reinforcement learning during training improve the nDTW score of vision-language navigation models on RxR-CE compared to single-turn policy gradient methods?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.8/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.8/10.

Files

Name	Size
paper.pdf (md5:0e2d9666334f59d65b8e3164644c9950)	83.0 kB

	All versions	This version
Views	4	4
Downloads	1	1
Data volume	83.0 kB	83.0 kB
