Dataset for Automated Medical Transcription

Kazi, Nazmul; Kuntz, Matt; Kanewala, Upulee; Kahanda, Indika

doi:10.5281/zenodo.4279041

Published November 18, 2020 | Version v1.0

Dataset Open

1. Montana State University
2. National Alliance on Mental Illness - Montana
3. University of North Florida

Contributors

Others:

1. Frontier Psychiatry

We generated this dataset to train a machine learning model that automatically generates psychiatric case notes from doctor-patient conversations. Since we did not have access to real doctor-patient conversations, we used transcripts from two different sources to generate audio recordings of enacted conversations between a doctor and a patient. We employed eight students, who worked in pairs to generate these recordings. Six of the transcripts that we used to produce these recordings were hand-written by Cheryl Bristow, and the rest were adapted from Alexander Street transcripts, which were generated from real doctor-patient conversations. Our study requires recording the doctor and the patient(s) in separate channels, which is the primary reason we generated our own audio recordings of the conversations.

We used the Google Cloud Speech-to-Text API to transcribe the enacted recordings. These new transcripts were generated entirely by AI-powered automatic speech recognition, whereas the source transcripts are either hand-written or fine-tuned by human transcribers (the transcripts from Alexander Street).
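For readers who want to reproduce a comparable transcription step, the following is a minimal sketch of how a two-channel recording could be submitted to the Speech-to-Text v1 REST method `speech:recognize` and how the response could be grouped by channel. It is not the authors' pipeline. The audio encoding, sample rate, language code, and the use of per-channel recognition are all assumptions made for illustration, and the helper names `build_recognize_request` and `transcript_by_channel` are hypothetical.

```python
import base64


def build_recognize_request(audio_bytes, sample_rate_hz=16000, language_code="en-US"):
    """Build a JSON-serializable request body for `speech:recognize`.

    Assumptions (not stated in the dataset description): 16-bit PCM audio,
    two channels (one per speaker), 16 kHz sample rate, US English.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
            # Ask the service to transcribe each channel on its own.
            "audioChannelCount": 2,
            "enableSeparateRecognitionPerChannel": True,
        },
        # Inline audio must be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def transcript_by_channel(response):
    """Join the top-ranked transcript of each result, grouped by `channelTag`."""
    by_channel = {}
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        tag = result.get("channelTag", 0)
        text = alternatives[0].get("transcript", "").strip()
        by_channel.setdefault(tag, []).append(text)
    return {tag: " ".join(parts) for tag, parts in by_channel.items()}
```

Which channel tag corresponds to the doctor and which to the patient depends on how a recording was made, so that mapping would need to be checked per file. Recordings longer than about a minute would need the long-running variant of the same method rather than the synchronous call.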

We provided the generated transcripts back to the students and asked them to write case notes. The students worked independently using software that we developed earlier for this purpose. The students had prior experience writing case notes, and we let them write the notes as they usually practice, without any training or instructions from us.

NOTE: Audio recordings are not included on Zenodo because of their large file size, but they are available in the GitHub repository.

Files

Files (6.4 MB)

Name	Size	MD5
nazmulkazi/dataset_automated_medical_transcription-v1.0.zip	6.4 MB	7b9f5699bce1aab528c36c46a8269565
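The MD5 checksum listed for the archive can be used to confirm that a download is intact. The sketch below uses only Python's standard library. The function names and the local file name in the usage note are hypothetical; the expected digest is the one listed in this record.

```python
import hashlib

# MD5 digest published in this record for the v1.0 archive.
EXPECTED_MD5 = "7b9f5699bce1aab528c36c46a8269565"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, assuming the archive was saved locally as `dataset_automated_medical_transcription-v1.0.zip`, calling `verify_archive("dataset_automated_medical_transcription-v1.0.zip")` should return `True` for an uncorrupted download.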

Additional details

Is supplement to: https://github.com/nazmulkazi/dataset_automated_medical_transcription/tree/v1.0

Statistic	All versions	This version
Views	6,995	6,905
Downloads	1,264	1,245
Data volume	8.8 GB	8.7 GB
