Translating speech with just images
Description
Visually grounded speech models link speech to images. We extend this connection by linking images to text via an existing image captioning system, and as a result gain the ability to map speech audio directly to text. When the audio is in a different language from the generated captions, this approach can be used for speech translation with just images. We investigate such a system on a real low-resource language, Yorùbá, and propose a Yorùbá-to-English speech translation model that leverages pretrained components in order to learn in the low-resource regime. To limit overfitting, we find it essential to use a decoding scheme that produces diverse image captions for training. Results show that the predicted translations capture the main semantics of the spoken audio, albeit in a simpler and shorter form.
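The abstract stresses that diverse image captions are needed to limit overfitting. As an illustration only, the sketch below shows one way to sample several varied English captions per image with an off-the-shelf captioner; the BLIP model and the sampling settings are assumptions for the sketch, not the authors' setup. Each sampled caption can then be paired with the corresponding Yorùbá utterance as a training target.

```python
# Minimal sketch (not the paper's code): sample several diverse captions per image
# instead of a single beam-search output. Model choice and decoding parameters
# below are illustrative assumptions.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def diverse_captions(image_path: str, num_captions: int = 5) -> list[str]:
    """Return several sampled English captions for one image."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,              # sampling yields varied captions
        top_p=0.9,                   # nucleus sampling threshold (illustrative)
        num_return_sequences=num_captions,
        max_new_tokens=30,
    )
    return [processor.decode(o, skip_special_tokens=True) for o in outputs]
```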
Files

| Name | Size |
|---|---|
| oneata24_interspeech.pdf (md5:47b9327b7b566d6a497d12758679607f) | 662.0 kB |