MARTTS: Maritime Radio Text-To-Speech Synthetic Corpus
Description
A Text-to-Speech Framework for Generating Synthetic Maritime Radio Communications in ASR Evaluation.
MARTTS is an open-source synthetic speech corpus designed to evaluate and stress-test Automatic Speech Recognition (ASR) systems operating in maritime VHF radiotelephony environments.
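The dataset contains 240 realistic multi-speaker maritime dialogues covering distress, urgency, search and rescue (SAR), and routine traffic, generated through:
- Templates compliant with the Standard Marine Communication Phrases (SMCP)
- LLM-based scenario generation
- Ship names, MMSI (Maritime Mobile Service Identity) identifiers, and positions derived from AIS (Automatic Identification System) data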
- Synthesis with the Chatterbox TTS model
- A multi-stage radio post-processing pipeline
The dataset emulates operational VHF conditions, including channel artifacts, background noise, environmental ship noise, dropouts, squelch clicks, and band-limiting.
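To make the kind of degradation listed above concrete, the following is a minimal, dependency-free sketch of three of the named effects: band-limiting, additive noise, and dropouts. It is not the MARTTS post-processing pipeline. The function name `degrade_vhf`, the 300 to 3000 Hz band, the SNR, the dropout probability, and the frame length are all illustrative assumptions, and simple one-pole filters stand in for whatever filtering the actual pipeline uses.

```python
import math
import random


def degrade_vhf(samples, sample_rate, low_hz=300.0, high_hz=3000.0,
                snr_db=10.0, dropout_prob=0.02, frame_ms=20.0, seed=0):
    """Illustrative radio-style degradation (hypothetical parameters)."""
    rng = random.Random(seed)
    dt = 1.0 / sample_rate

    # Crude band-limiting: a one-pole high-pass followed by a one-pole low-pass.
    rc_hp = 1.0 / (2.0 * math.pi * low_hz)
    rc_lp = 1.0 / (2.0 * math.pi * high_hz)
    a_hp = rc_hp / (rc_hp + dt)
    a_lp = dt / (rc_lp + dt)
    out = []
    prev_in = prev_hp = prev_lp = 0.0
    for x in samples:
        hp = a_hp * (prev_hp + x - prev_in)
        lp = prev_lp + a_lp * (hp - prev_lp)
        out.append(lp)
        prev_in, prev_hp, prev_lp = x, hp, lp

    # Additive white Gaussian noise scaled to the requested signal-to-noise ratio.
    power = sum(v * v for v in out) / max(1, len(out))
    noise_std = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    out = [v + rng.gauss(0.0, noise_std) for v in out]

    # Silence randomly chosen frames to mimic transmission dropouts.
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    for start in range(0, len(out), frame_len):
        if rng.random() < dropout_prob:
            for i in range(start, min(start + frame_len, len(out))):
                out[i] = 0.0
    return out
```

Fixing `seed` keeps the noise and dropout pattern reproducible, which matters when the same degraded audio has to be compared across several ASR systems.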
It is intended for stress-testing and validating ASR systems under realistic maritime conditions where authentic data is scarce or sensitive.
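A common way to quantify ASR performance in this kind of stress test is word error rate (WER). The sketch below is an assumption about how an evaluation might be scored, not a description of the authors' protocol. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are illustrative choices, and the vessel name in the usage note is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "mayday mayday this is motor vessel aurora" against the reference "mayday mayday mayday this is motor vessel aurora" gives one deletion over eight reference words, a WER of 0.125.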
This dataset is, to our knowledge, the first publicly available synthetic dataset tailored to maritime distress communication.
Files
dataset_clean.zip
Total size: 1.0 GB

| MD5 checksum | Size |
|---|---|
| 8a6db912c7251cfc530ad5e986364742 | 502.5 MB |
| d566630b86b7fa86c715a82da58b597a | 511.0 MB |
| d1ae43386e8e9f5442a32f5fd21082c5 | 73.3 kB |
| 49f34154ff25891f8543954834f9e304 | 220.3 kB |
| 3c046aaffd5dc3a675a616bf521bd4c6 | 185.3 kB |
| c3f56b5d4955a52aa5ad1e438a60995e | 248.7 kB |
| 9811d781a3bba0103e3540c393a75663 | 3.7 kB |
| 3dffeffb23b2c5b3f336ea287f7fc9ad | 273.6 kB |
| 95e71acc1eb691848a25da6d8fd709e2 | 283.7 kB |
| 68db01c1cde1c912ac2b9cfe7078346a | 180.3 kB |
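The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are illustrative, and the local file path passed to them is whatever location the archive was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with an expected value, with or without an 'md5:' prefix."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat for the two archives of roughly 500 MB each. A result of `False` from `matches_checksum` suggests the file should be downloaded again.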