Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

Labrak, Yanis; Grünert, David; Baroudi, Séverin; Chun, Jiyun; Cyrta, Pawel; Burdisso, Sergio Gastón; Hassoon, Ahmed; Liu, David; Rothschild, Adam; Van Deusen, Reed; Motlicek, Petr; Perrault, Andrew; Marxer, Ricard; Schaaf, Thomas

doi:10.5281/zenodo.19458085

Published April 7, 2026 | Version v1

Preprint Open

1. Idiap Research Institute
2. University of Zurich
3. Laboratoire d'Informatique et Systèmes
4. Ohio State University
5. Metamedia Technologies
6. Johns Hopkins University
7. Colorado School of Mines
8. Solventum
9. University of Pittsburgh
10. Brno University of Technology
11. The Ohio State University
12. Pompeu Fabra University
13. Université de Toulon

Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to serve both as a training resource and as a controlled evaluation environment, and instantiate it for first-visit doctor-patient conversations with SOAP note generation as the task. The pipeline has three stages, all built entirely on open-weight models: (1) persona-driven dialogue generation; (2) multi-speaker audio synthesis with overlap/pause modeling, room acoustics, and sound events; and (3) LLM-based reference SOAP note production. We release 8,800 synthetic conversations with 1.3k hours of corresponding audio and reference notes. Evaluating current open-weight systems, we find that cascaded approaches still substantially outperform end-to-end models.
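
The three-stage data flow described in the abstract can be sketched roughly as follows. This is only an illustrative skeleton, not the authors' implementation: every name here (`Sample`, `run_pipeline`, the `stub_*` functions) is hypothetical, the stages are placeholder callables rather than real models, and producing the reference note from the dialogue text rather than from the audio is an assumption this record does not confirm.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types and names; nothing here comes from the paper's code.
Turn = Tuple[str, str]  # (speaker, utterance)


@dataclass
class Sample:
    persona: str
    turns: List[Turn] = field(default_factory=list)
    audio_path: str = ""
    soap_note: str = ""


def run_pipeline(
    persona: str,
    generate_dialogue: Callable[[str], List[Turn]],
    synthesize_audio: Callable[[List[Turn]], str],
    write_soap_note: Callable[[List[Turn]], str],
) -> Sample:
    """Chain the three stages for one conversation."""
    sample = Sample(persona=persona)
    # Stage 1: persona-driven dialogue generation.
    sample.turns = generate_dialogue(persona)
    # Stage 2: multi-speaker audio synthesis (overlap, pauses, room
    # acoustics, and sound events would be handled inside this callable).
    sample.audio_path = synthesize_audio(sample.turns)
    # Stage 3: reference SOAP note production (assumed here to read the text).
    sample.soap_note = write_soap_note(sample.turns)
    return sample


# Stand-in stages so the skeleton runs without any model.
def stub_dialogue(persona: str) -> List[Turn]:
    return [
        ("doctor", "What brings you in today?"),
        ("patient", f"I am a {persona} with a cough."),
    ]


def stub_audio(turns: List[Turn]) -> str:
    return f"conversation_{len(turns)}_turns.wav"


def stub_note(turns: List[Turn]) -> str:
    patient_text = " ".join(u for speaker, u in turns if speaker == "patient")
    return "S: " + patient_text


if __name__ == "__main__":
    sample = run_pipeline("first-visit patient", stub_dialogue, stub_audio, stub_note)
    print(sample.audio_path)
    print(sample.soap_note)
```

Passing the stages in as callables keeps the skeleton independent of any particular open-weight model; each stub could be swapped for a real generator without changing `run_pipeline`.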

Files

Name	Size
Interspeech_2026__Generating_Synthetic_Doctor_Patient_Conversations_for_Long_form_Audio_Summarization-5.pdf (md5:1074c67e30479795fc88f9e8e08bbe3f)	240.2 kB

Additional details

Submitted: 2026-03-04

Submitted for review at Interspeech 2026
