Published March 24, 2026 | Version v1
Dataset | Open access

MARTTS: Maritime Radio Text-To-Speech Synthetic Corpus

1. Deutsches Zentrum für Luft- und Raumfahrt e. V. (DLR)

Description

A Text-to-Speech Framework Generating Synthetic Maritime Radio Communications for ASR Evaluation.
MARTTS is an open-source synthetic speech corpus designed to evaluate and stress-test Automatic Speech Recognition (ASR) systems operating in maritime VHF radiotelephony environments.
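Evaluating an ASR system on the corpus amounts to comparing its transcripts with reference text for each dialogue. This record does not name a scoring metric; word error rate (WER) is a common choice and is assumed here, as is the availability of reference transcripts (plausible, since the audio is synthesized from text, but not stated). A minimal sketch, assuming case-insensitive comparison after whitespace tokenization with no further text normalization:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j]; row 0 means j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, a hypothesis that drops one "mayday" from the seven-word reference "mayday mayday mayday this is nordic star" scores 1/7. Real evaluations usually also normalize numbers and spelled-out digits, which matters for MMSIs and positions read aloud.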


The dataset contains 240 realistic multi-speaker maritime dialogues covering distress, urgency, search and rescue (SAR), and routine traffic, generated through:


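- Templates compliant with the IMO Standard Marine Communication Phrases (SMCP)
- LLM-based scenario generation
- Ship names, MMSI (Maritime Mobile Service Identity) numbers, and positions derived from AIS (Automatic Identification System) data
- Speech synthesis with the Chatterbox TTS model
- A multi-stage radio post-processing pipeline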
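The template stage of this pipeline can be illustrated by filling a distress-call skeleton with AIS-style fields. The template below is hypothetical and only loosely modeled on the VHF MAYDAY procedure; the corpus's actual SMCP templates, field names, and LLM prompting are not reproduced. The ship name and MMSI in the usage example are made up. The sketch assumes positions arrive in decimal degrees and are read out as degrees and decimal minutes:

```python
import re
from string import Template

# Hypothetical distress-call template (not the corpus's real template).
MAYDAY_TEMPLATE = Template(
    "MAYDAY, MAYDAY, MAYDAY. This is $name, $name, $name, MMSI $mmsi. "
    "MAYDAY $name, MMSI $mmsi. My position is $position. "
    "$distress. I require immediate assistance. "
    "$persons persons on board. OVER."
)


def format_position(lat: float, lon: float) -> str:
    """Render decimal degrees as spoken degrees and decimal minutes."""
    def part(value: float, pos: str, neg: str) -> str:
        hemisphere = pos if value >= 0 else neg
        deg = int(abs(value))
        minutes = round((abs(value) - deg) * 60, 1)
        if minutes >= 60.0:  # rounding carried into the next degree
            deg, minutes = deg + 1, 0.0
        return f"{deg} degrees {minutes:.1f} minutes {hemisphere}"
    return f"{part(lat, 'north', 'south')}, {part(lon, 'east', 'west')}"


def render_mayday(name: str, mmsi: str, lat: float, lon: float,
                  distress: str, persons: int) -> str:
    # An MMSI is a nine-digit identifier; reject anything else.
    if not re.fullmatch(r"\d{9}", mmsi):
        raise ValueError(f"invalid MMSI: {mmsi!r}")
    return MAYDAY_TEMPLATE.substitute(
        name=name, mmsi=mmsi, position=format_position(lat, lon),
        distress=distress, persons=persons,
    )


print(render_mayday("NORDIC STAR", "211123450", 54.5, 7.9, "I am on fire", 12))
```

`string.Template` is used rather than f-strings so templates can be stored as plain data files; in the described pipeline an LLM would vary the surrounding dialogue rather than a fixed skeleton.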


The dataset emulates real-world operational VHF conditions, including channel artifacts, background noise, environmental ship noise, signal dropouts, squelch clicks, and band-limiting.
It is intended for stress-testing and validating ASR systems under realistic maritime conditions where authentic data is scarce or sensitive.
This dataset is, to our knowledge, the first publicly available synthetic dataset tailored to maritime distress communication. 
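The radio degradations listed above can be sketched as a chain of simple signal operations on a mono waveform. The corpus's actual filters, noise recordings, and parameters are not documented in this record, so everything below is an assumption: a 300 to 3000 Hz voice band, Gaussian noise at a fixed SNR standing in for both background and ship noise, one 50 ms dropout, and 15 ms squelch clicks. First-order filters keep the code short but roll off far more gently than a real radio channel filter. Plain Python lists avoid dependencies; a practical version would likely use NumPy/SciPy.

```python
import math
import random


def one_pole_lowpass(x, cutoff_hz, sr):
    # RC-style low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sr)
    y, out = 0.0, []
    for s in x:
        y += a * (s - y)
        out.append(y)
    return out


def one_pole_highpass(x, cutoff_hz, sr):
    # High-pass as the input minus its low-passed version.
    low = one_pole_lowpass(x, cutoff_hz, sr)
    return [s - l for s, l in zip(x, low)]


def add_noise(x, snr_db, rng):
    # Scale white Gaussian noise so that signal/noise power hits snr_db.
    if not x:
        return []
    signal_power = sum(s * s for s in x) / len(x)
    sigma = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, sigma) for s in x]


def apply_dropout(x, start, length):
    # Silence a segment to mimic a brief loss of signal.
    y = list(x)
    for i in range(start, min(start + length, len(y))):
        y[i] = 0.0
    return y


def add_squelch_clicks(x, sr, rng, click_ms=15.0, level=0.3):
    # Short noise bursts before and after, like the squelch opening/closing.
    n = int(sr * click_ms / 1000)

    def burst():
        return [rng.uniform(-level, level) for _ in range(n)]

    return burst() + list(x) + burst()


def vhf_effect(x, sr, snr_db=15.0, seed=0):
    rng = random.Random(seed)             # seeded for reproducibility
    y = one_pole_highpass(x, 300.0, sr)   # attenuate low-frequency rumble
    y = one_pole_lowpass(y, 3000.0, sr)   # attenuate above the voice band
    y = add_noise(y, snr_db, rng)
    drop_len = int(0.05 * sr)
    y = apply_dropout(y, rng.randrange(0, max(1, len(y) - drop_len)), drop_len)
    y = add_squelch_clicks(y, sr, rng)
    peak = max(abs(s) for s in y) or 1.0
    return [s / peak for s in y]          # peak-normalize to [-1, 1]
```

Applied to one second of a 1 kHz tone at 8 kHz, `vhf_effect` returns 8240 samples: the original 8000 plus a 120-sample click on each side.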

Files

dataset_clean.zip

Total size: 1.0 GB. MD5 checksums and sizes:

- md5:8a6db912c7251cfc530ad5e986364742 (502.5 MB)
- md5:d566630b86b7fa86c715a82da58b597a (511.0 MB)
- md5:d1ae43386e8e9f5442a32f5fd21082c5 (73.3 kB)
- md5:49f34154ff25891f8543954834f9e304 (220.3 kB)
- md5:3c046aaffd5dc3a675a616bf521bd4c6 (185.3 kB)
- md5:c3f56b5d4955a52aa5ad1e438a60995e (248.7 kB)
- md5:9811d781a3bba0103e3540c393a75663 (3.7 kB)
- md5:3dffeffb23b2c5b3f336ea287f7fc9ad (273.6 kB)
- md5:95e71acc1eb691848a25da6d8fd709e2 (283.7 kB)
- md5:68db01c1cde1c912ac2b9cfe7078346a (180.3 kB)
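Downloaded files can be checked against the MD5 checksums listed above. A minimal sketch using Python's standard-library `hashlib`; it reads in 1 MiB chunks so the roughly 500 MB archives are not loaded into memory at once, and it accepts checksums with or without the `md5:` prefix used in this listing. The function names are illustrative, and the listing does not pair checksums with file names, so match each file to its checksum yourself.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    # Checksums are listed as "md5:<hex>"; strip the prefix if present.
    expected = expected.removeprefix("md5:").lower()
    return md5_of(path) == expected
```

Usage: `verify("path/to/downloaded_file", "md5:<checksum from the list>")` returns `True` when the file is intact. `str.removeprefix` requires Python 3.9 or later.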