Dataset for Automated Medical Transcription
- 1. Montana State University
- 2. National Alliance on Mental Illness - Montana
- 3. University of North Florida
Description
We generated this dataset to train a machine learning model that automatically generates psychiatric case notes from doctor-patient conversations. Since we did not have access to real doctor-patient conversations, we used transcripts from two different sources to generate audio recordings of enacted conversations between a doctor and a patient. We employed eight students who worked in pairs to generate these recordings. Six of the transcripts that we used to produce these recordings were hand-written by Cheryl Bristow, and the rest were adapted from Alexander Street transcripts, which were generated from real doctor-patient conversations. Our study requires recording the doctor and the patient(s) in separate channels, which is the primary reason for generating our own audio recordings of the conversations.
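For readers who want to work with the per-speaker channels, the following is a minimal sketch of splitting a two-channel recording into two mono files using only the Python standard library. It assumes the recording is an uncompressed PCM WAV file with exactly two channels; the actual audio format in the repository may differ. The function name `split_stereo` and its parameters are hypothetical, and the sketch makes no claim about which channel holds the doctor and which the patient.

```python
import wave


def split_stereo(path, first_path, second_path):
    """Write each channel of a 2-channel PCM WAV file to its own mono file.

    Assumption: `path` is an uncompressed PCM WAV with exactly two channels.
    """
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel WAV file")
        width = src.getsampwidth()   # bytes per sample
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Frames are interleaved: [ch0 sample][ch1 sample][ch0 sample]...
    frame_size = 2 * width
    first = bytearray()
    second = bytearray()
    for i in range(0, len(frames), frame_size):
        first += frames[i:i + width]
        second += frames[i + width:i + frame_size]

    # Write each de-interleaved channel as a mono file with the same format.
    for out_path, data in ((first_path, first), (second_path, second)):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(bytes(data))
```

Keeping the speakers on separate channels means each mono file can be processed or transcribed on its own, without a speaker-diarization step.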
We used the Google Cloud Speech-to-Text API to transcribe the enacted recordings. These newly generated transcripts are produced entirely by AI-powered automatic speech recognition, whereas the source transcripts are either hand-written or fine-tuned by human transcribers (the transcripts from Alexander Street).
We provided the generated transcripts back to the students and asked them to write case notes. The students worked independently using software that we had developed earlier for this purpose. The students had prior experience writing case notes, and we let them write the notes as they would in practice, without any training or instructions from us.
NOTE: The audio recordings are not included in Zenodo because of their large file size, but they are available in the GitHub repository.
Files
Name | Size | Checksum |
---|---|---|
nazmulkazi/dataset_automated_medical_transcription-v1.0.zip | 6.4 MB | md5:7b9f5699bce1aab528c36c46a8269565 |
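The MD5 checksum listed above can be used to confirm that a downloaded archive is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `verify_download` are hypothetical, and the local file path is whatever name the archive was saved under.

```python
import hashlib

# Checksum as listed for the archive in this record.
EXPECTED_MD5 = "7b9f5699bce1aab528c36c46a8269565"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` with the path of the downloaded zip file returns `True` when the digest matches. A mismatch usually indicates an incomplete or corrupted download. MD5 is suitable here as an integrity check only, not as a security guarantee.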
Additional details
Related works
- Is supplement to
- https://github.com/nazmulkazi/dataset_automated_medical_transcription/tree/v1.0 (URL)