The DAPS (Device and Produced Speech) dataset is a collection of aligned versions of professionally produced studio speech recordings and recordings of the same speech made on common consumer devices (tablet and smartphone) in real-world environments. It contains 15 versions of the audio: 3 professional versions and 12 consumer device/real-world environment combinations. Each version consists of about 4.5 hours of data (about 14 minutes from each of 20 speakers). For a detailed description of the dataset, see this paper:
Gautham J. Mysore, “Can We Automatically Transform Speech Recorded on Common Consumer Devices in Real-World Environments into Professional Production Quality Speech? - A Dataset, Insights, and Challenges”, IEEE Signal Processing Letters, Vol. 22, No. 8, August 2015.
The primary goal of the dataset is to help develop methods that automatically convert real-world device recordings into professional-sounding recordings. It can also be used for other applications such as voice conversion, traditional speech enhancement, and automatic production of studio recordings.
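As a rough sanity check on the size figures quoted in the description, the short Python sketch below multiplies them out. The constants are the approximate numbers stated above ("about 14 minutes", 20 speakers, 3 + 12 versions); the function name `dataset_size_estimate` is hypothetical and is not part of the dataset or of any tooling distributed with it.

```python
# Approximate figures taken from the dataset description above.
NUM_SPEAKERS = 20
MINUTES_PER_SPEAKER = 14            # "about 14 minutes" per speaker
PROFESSIONAL_VERSIONS = 3
DEVICE_ENVIRONMENT_VERSIONS = 12


def dataset_size_estimate():
    """Return (number of versions, hours per version, total hours).

    Hypothetical helper for illustration only; results are estimates
    because the per-speaker duration is itself approximate.
    """
    versions = PROFESSIONAL_VERSIONS + DEVICE_ENVIRONMENT_VERSIONS
    hours_per_version = NUM_SPEAKERS * MINUTES_PER_SPEAKER / 60
    total_hours = versions * hours_per_version
    return versions, hours_per_version, total_hours


versions, hours_per_version, total_hours = dataset_size_estimate()
print(f"{versions} versions, ~{hours_per_version:.1f} h each, ~{total_hours:.0f} h total")
# prints: 15 versions, ~4.7 h each, ~70 h total
```

The per-version estimate of roughly 4.7 hours agrees with the stated "about 4 1/2 hours" given that both inputs are rounded, and it suggests roughly 70 hours of audio across all 15 versions.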