Thorsten-Voice Dataset 2021.02

Müller, Thorsten; Kreutz, Dominik

doi:10.5281/zenodo.5525342

Published February 10, 2021 | Version 3.0

Dataset Open


Thorsten-Voice (Thorsten-21.02-neutral) is a neutrally spoken voice dataset recorded by Thorsten Müller, with audio optimized by Dominik Kreutz. It is licensed under CC0 so that anyone can use it without financial or licensing hurdles.

"I contribute my personal voice as a person believing in a world where all people are equal. No matter of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome on any place on this planet and open and free knowledge and education is available to everyone." (Thorsten Müller)

Dataset details:

LJSpeech file and directory structure
22,668 recorded phrases (wav files)
more than 23 hours of pure audio
sample rate: 22,050 Hz
mono
normalized to -24 dB
phrase length (min/avg/max): 2 / 52 / 180 characters
no silence at beginning or end
average spoken characters per second: 14
sentences with a question mark: 2,780
sentences with an exclamation mark: 1,840
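
The properties listed above can be spot-checked on an extracted copy of the archive. The following is a minimal Python sketch, not the authors' tooling. It assumes the usual LJSpeech layout: a pipe-delimited `metadata.csv` whose first two fields are the clip ID and the transcript, plus a `wavs/` directory. The function names and example paths are hypothetical.

```python
import csv
import wave
from statistics import mean


def read_metadata(path):
    """Yield (clip_id, text) pairs from a pipe-delimited LJSpeech-style file."""
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside transcripts as literal text.
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) >= 2:
                yield row[0], row[1]


def phrase_length_stats(texts):
    """Return (min, rounded average, max) character counts of the phrases."""
    lengths = [len(text) for text in texts]
    return min(lengths), round(mean(lengths)), max(lengths)


def wav_format(path):
    """Return (channels, sample_rate_hz, duration_seconds) of a PCM wav file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return wav.getnchannels(), rate, wav.getnframes() / rate


# Example usage (paths are assumptions about where the archive was extracted):
#   texts = [text for _, text in read_metadata("thorsten-neutral/metadata.csv")]
#   print(phrase_length_stats(texts))
#   print(wav_format("thorsten-neutral/wavs/<clip_id>.wav"))
```

If the layout assumption holds, the length statistics should come out close to the listed 2 / 52 / 180, and each clip should report one channel at 22,050 Hz. As a rough consistency check on the published figures, 22,668 phrases × 52 characters ÷ 14 characters per second is about 84,000 seconds, or roughly 23.4 hours, which agrees with "more than 23 hours" given that the averages are rounded.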

See more details on my GitHub page or the Thorsten-Voice project website.

Notes

Please use it to make the world a better place for all of humankind.

Files

Files (2.7 GB)

Name	Size
thorsten-neutral_v03.tgz (md5:a6300c3dd07cde05e8e154f1d1f19de6)	2.7 GB
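
Because the archive is large, it can be worth comparing a downloaded copy against the MD5 checksum listed for the file. This is a minimal sketch using Python's standard `hashlib` module. The function name and the local file path are assumptions; only the expected checksum comes from the listing.

```python
import hashlib

# Checksum as published in the file listing for thorsten-neutral_v03.tgz.
EXPECTED_MD5 = "a6300c3dd07cde05e8e154f1d1f19de6"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads avoid loading the multi-gigabyte archive into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage (the local path is an assumption):
#   if md5_of_file("thorsten-neutral_v03.tgz") != EXPECTED_MD5:
#       raise SystemExit("checksum mismatch: download again")
```

MD5 is suitable here only for detecting a corrupted or incomplete download, not as protection against deliberate tampering.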
