Published March 29, 2020 | Version 0.1.0
Dataset Open

Oral cancer speech corpus for paper "Detecting and analysing spontaneous oral cancer speech in the wild"

  • 1. Netherlands Cancer Institute
  • 2. TU Delft

Description

This is the oral cancer speech corpus used in the paper "Detecting and analysing spontaneous oral cancer speech in the wild".

This dataset contains approximately 3 hours of oral cancer speech data collected from YouTube, including a file with additional metadata. We use this dataset to perform an oral cancer speech detection task in our paper.

Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement No 766287. The Department of Head and Neck Oncology and Surgery of the Netherlands Cancer Institute receives a research grant from Atos Medical (Hörby, Sweden), which contributes to the existing infrastructure for quality-of-life research.

Citation:

If you use this dataset, please cite:

@misc{halpern2020detecting,
    title={Detecting and analysing spontaneous oral cancer speech in the wild},
    author={Bence Mark Halpern and Rob van Son and Michiel van den Brekel and Odette Scharenborg},
    year={2020},
    eprint={2007.14205},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

 

Files (5.8 GB)

oral_cancer.zip, 5.8 GB
md5:31f9d02a9d500adf4e33aba3056044e6
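
Because the archive is large, it may be worth checking that a download completed intact by comparing its MD5 digest with the value listed for this record. The following is a minimal sketch, not part of the dataset itself. It assumes the archive was saved as oral_cancer.zip in the current directory; the helper names md5_of_file and verify_archive are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 digest listed on this record for the archive.
EXPECTED_MD5 = "31f9d02a9d500adf4e33aba3056044e6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("oral_cancer.zip")  # assumed local download location
    if not archive.exists():
        print(f"{archive} not found; download it first")
    elif verify_archive(archive):
        print("checksum OK")
    else:
        print("checksum mismatch; the download may be incomplete or corrupted")
```

A mismatch usually points to an interrupted or corrupted transfer, in which case downloading the archive again is the simplest remedy.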

Additional details

Funding

European Commission
TAPAS - Training Network on Automatic Processing of PAthological Speech (grant agreement No 766287)