Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization
Authors/Creators
- Labrak, Yanis (Researcher) 1
- Grünert, David (Researcher) 2
- BAROUDI, Séverin (Researcher) 3
- Chun, Jiyun (Researcher) 4
- Cyrta, Pawel 5
- Burdisso, Sergio Gastón (Researcher) 1
- Hassoon, Ahmed (Researcher) 6
- Liu, David (Researcher) 7
- Rothschild, Adam (Researcher) 8
- Van Deusen, Reed (Researcher) 9
- Motlicek, Petr (Researcher) 1, 10
- Perrault, Andrew (Researcher) 11
- Marxer, Ricard (Researcher) 12, 13
- Schaaf, Thomas (Researcher) 8
- 1. Idiap Research Institute
- 2. University of Zurich
- 3. Laboratoire d'Informatique et Systèmes
- 4. Ohio State University
- 5. Metamedia Technologies
- 6. Johns Hopkins University
- 7. Colorado School of Mines
- 8. Solventum
- 9. University of Pittsburgh
- 10. Brno University of Technology
- 11. The Ohio State University
- 12. Pompeu Fabra University
- 13. Université de Toulon
Description
Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to serve both as a training resource and as a controlled evaluation environment, and instantiate it for first-visit doctor-patient conversations with SOAP note generation as the task. The pipeline has three stages, built entirely on open-weight models: (1) persona-driven dialogue generation; (2) multi-speaker audio synthesis with overlap/pause modeling, room acoustics, and sound events; and (3) LLM-based reference SOAP note production. We release 8,800 synthetic conversations with 1.3k hours of corresponding audio and reference notes. Evaluating current open-weight systems, we find that cascaded approaches still substantially outperform end-to-end models.
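To make the overlap/pause modeling step in the audio synthesis stage more concrete, the following is a minimal sketch of one way turn timings could be laid out on a shared timeline. It is not the authors' implementation: the names `Turn`, `PlacedTurn`, and `place_turns`, the Gaussian gap distribution, and all default parameter values are assumptions made for illustration only. A negative sampled gap means the next speaker starts before the previous one finishes (overlap); a positive gap is a pause.

```python
import random
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    duration: float  # seconds of synthesized speech for this turn


@dataclass
class PlacedTurn:
    speaker: str
    start: float  # seconds from the start of the conversation
    end: float


def place_turns(turns, mean_gap=0.3, gap_std=0.4, max_overlap=0.5, seed=0):
    """Assign start/end times to turns, sampling a pause or overlap between them.

    All distribution choices here are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    placed = []
    for i, turn in enumerate(turns):
        if i == 0:
            start = 0.0
        else:
            prev = placed[-1]
            gap = rng.gauss(mean_gap, gap_std)
            # Cap the overlap so a turn never starts before the previous
            # turn began and never overlaps by more than max_overlap.
            gap = max(gap, -max_overlap, -(prev.end - prev.start))
            start = prev.end + gap
        placed.append(PlacedTurn(turn.speaker, start, start + turn.duration))
    return placed


turns = [Turn("doctor", 2.0), Turn("patient", 3.5), Turn("doctor", 1.0)]
for p in place_turns(turns, seed=1):
    print(f"{p.speaker:8s} {p.start:6.2f} -> {p.end:6.2f}")
```

In a fuller pipeline, the resulting timeline would be used to mix the per-turn waveforms before room acoustics and sound events are applied; those steps are outside the scope of this sketch.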
Files
| Name | Size |
|---|---|
| Interspeech_2026__Generating_Synthetic_Doctor_Patient_Conversations_for_Long_form_Audio_Summarization-5.pdf (md5:1074c67e30479795fc88f9e8e08bbe3f) | 240.2 kB |
Additional details
Dates
- Submitted: 2026-03-04. Submitted for review at Interspeech 2026