Presentation Open Access

SSHOC Workshop: Sharing Datasets of Pathological Speech

Henk van den Heuvel; Nicola Bessell; Paul Trilsbeek; Libby Bishop; Katarzyna Klessa

Corpora and datasets of pathological speech are hard to obtain simply because they are hard to share. In this webinar we will present and explore several alternatives for sharing such sensitive data. The webinar will be of interest to anyone who struggles with sharing or obtaining similar types of data.

 

Topics for discussion include:

  • Progress achieved by the DELAD initiative for sharing corpora of speech disorders (CSD) and the role of the CLARIN Knowledge Centre on Atypical Communication Expertise 
  • GDPR and the ethics of special category data relevant for collecting and sharing CSD
  • How storing and sharing CSD is arranged in a GDPR-compliant way at The Language Archive of the Max Planck Institute for Psycholinguistics, and the collaboration with TalkBank at CMU
  • Infrastructure requirements for secure remote access to sensitive research data with diverse legal (e.g. social media terms of service), ethical (e.g. children as subjects), and technical (e.g. audio and video) challenges, and assessment of several existing platforms
  • The CAVA audio-visual human communication archive project: a digital video repository to support the work of the international human communication research community. This resource enhances the discoverability and re-usability of expensively created, specialist video content
  • The curation and disclosure of pathological speech corpora: how CSD can be found through one organisation and made accessible through another, with a demonstration using the example of the Polish Cued Speech Corpus of Hearing-Impaired Children

Files (29.1 MB)
SSHOC2020-webinar-DELAD.pdf (29.1 MB), md5:609732767e8ede8a3ec94c253fe5b436