NeuroVoz: a Castillian Spanish corpus of parkinsonian speech

Mendes-Laureano, Janaína; Gómez-García, Jorge Andrés; Guerrero-López, Alejandro; Luque-Buzo, Elisa; Arias-Londoño, Julián D.; Grandas-Pérez, Francisco J.; Godino Llorente, Juan Ignacio

doi:10.5281/zenodo.13647600

Published September 3, 2024 | Version v3

Dataset Restricted


1. Universidad Politécnica de Madrid
2. Hospital General Universitario Gregorio Marañón

The NeuroVoz dataset is a resource for computational linguistics and biomedical research, designed to support the diagnosis and understanding of Parkinson's Disease (PD) through speech analysis. It is the first dataset of its kind to be made publicly available in Castilian Spanish, addressing a gap in the linguistic and dialectal diversity of PD research.

Compiled from a cohort of 112 participants (54 individuals diagnosed with PD and 58 healthy controls), the NeuroVoz dataset offers a varied set of speech recordings. All PD participants were recorded under medication (ON state), which keeps the recording conditions consistent across patients. The speech tasks range from sustained vowel phonations and diadochokinetic (DDK) tests to 16 structured listen-and-repeat utterances and spontaneous monologues. The listen-and-repeat tasks are manually transcribed, and the monologues are transcribed automatically with Whisper.

Encompassing 2,977 audio files, the NeuroVoz dataset averages 26.88 ± 3.35 recordings per participant. Its structure and composition support a multifaceted analysis of the speech impairments associated with PD, including phonatory, articulatory, and prosodic changes.
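The per-participant recording count quoted above can be recomputed from a list of audio file names. The following is a minimal sketch, not the authors' tooling: the `<group>_<task>_<participant id>.wav` naming layout, the helper names `participant_key` and `recordings_per_participant`, and the toy file list are all assumptions made for illustration, so check the actual layout of the distributed files before relying on it.

```python
from collections import Counter
from pathlib import Path
from statistics import mean, stdev


def participant_key(filename: str) -> str:
    """Return '<group>_<id>' for a name such as 'PD_A1_0001.wav'.

    ASSUMPTION: names follow '<group>_<task>_<participant id>.wav'.
    The first and last underscore-separated fields are used, so a task
    label that itself contains underscores is still handled.
    """
    stem = Path(filename).stem
    group = stem.split("_", 1)[0]
    participant_id = stem.rsplit("_", 1)[1]
    return f"{group}_{participant_id}"


def recordings_per_participant(filenames):
    """Count how many recordings each participant contributed."""
    return Counter(participant_key(name) for name in filenames)


# Toy file list, invented for illustration only (not real corpus content).
example_files = [
    "PD_A1_0001.wav", "PD_A2_0001.wav", "PD_PATAKA_0001.wav",
    "HC_A1_0002.wav", "HC_A2_0002.wav",
    "HC_A1_0003.wav",
]

counts = recordings_per_participant(example_files)
per_participant = list(counts.values())

# Mean and sample standard deviation of recordings per participant.
avg_recordings = mean(per_participant)
std_recordings = stdev(per_participant)
print(f"{avg_recordings:.2f} ± {std_recordings:.2f} recordings per participant")
```

On the real corpus, `example_files` would be replaced by the names of the downloaded audio files, for instance gathered with `Path(root).rglob("*.wav")`, where `root` is wherever the archive was extracted.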

With the NeuroVoz dataset, we invite the scientific community to explore the specific speech characteristics of PD in Castilian Spanish speakers and to advance PD diagnosis through speech analysis techniques.

If you use this dataset, please cite both this Zenodo record and the article describing the corpus:

Mendes-Laureano, J., Gómez-García, J.A., Guerrero-López, A. et al. NeuroVoz: a Castillian Spanish corpus of parkinsonian speech. Sci Data 11, 1367 (2024). https://doi.org/10.1038/s41597-024-04186-z
Zenodo dataset: Mendes-Laureano, J., Gómez-García, J. A., Guerrero-López, A., Luque-Buzo, E., Arias-Londoño, J. D., Grandas-Pérez, F. J., & Godino Llorente, J. I. (2024). NeuroVoz: a Castillian Spanish corpus of parkinsonian speech (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10777657

Files

Restricted

The record is publicly accessible, but files are restricted.

Additional details

URL: https://arxiv.org/abs/2403.02371

Is cited by:
Conference paper: 10.1109/EMBC.2018.8512562 (DOI)
Journal article: 10.1016/j.bspc.2018.10.020 (DOI)
Conference paper: https://link.springer.com/chapter/10.1007/978-3-030-65654-6_5 (URL)
Journal article: 10.3390/bioengineering10111316 (DOI)
Conference paper: https://link.springer.com/chapter/10.1007/978-3-030-65654-6_6 (URL)

Agencia Estatal de Investigación
Ministry of Economy and Competitiveness of Spain PID2021-128469OB-I00
Agencia Estatal de Investigación
Ministry of Economy and Competitiveness of Spain TED2021-131688B-I00
Universidad Politécnica de Madrid
Maria Zambrano 2021
Agencia Estatal de Investigación
Ministry of Economy and Competitiveness of Spain DPI2017-83405-R1

Valid: 2024-09-03

Peer-reviewed on first row

Repository URL: https://github.com/BYO-UPM/Neurovoz_Dababase
Programming language: Python
Development Status: Active

	All versions	This version
Views	6,119	2,383
Downloads	387	179
Data volume	557.3 GB	246.1 GB
