Published September 13, 2025 | Version v1
Journal article · Open Access

The impact of fine-tuning LLMs on the quality of automated therapy assessed by digital patients

  • 1. Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
  • 2. Baruch Ivcher School of Psychology, Reichman University, Herzliya, Israel
  • 3. Sammy Ofer School of Communications, Reichman University, Herzliya, Israel

Description

The use of generative large language models (LLMs) in mental health applications is gaining traction, with some proposals even suggesting LLM-based automated therapists. In this study, we assess the impact of fine-tuning therapist LLMs to improve the quality of therapy sessions, addressing a critical question in LLM-based mental health research. Specifically, we demonstrate that fine-tuning with datasets focused on specific therapeutic techniques significantly enhances the performance of LLM therapists. To facilitate this assessment, we introduce a novel evaluation system based on digital patients, powered by LLMs, which engage in text-based therapy sessions and provide session evaluations through questionnaires designed for human patients.

This method addresses the shortcomings of traditional text-similarity metrics, which are ill-suited to assessing the quality of therapeutic interactions. This study centers on motivational interviewing (MI), a structured and goal-oriented therapeutic approach; however, our digital therapists and patients can be adapted to other forms of therapy. We believe that our digital patients offer a standardized method for assessing automated therapists and showcase the potential of LLMs in mental health care.
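The evaluation protocol described above can be sketched as a loop in which two LLM agents exchange messages for a fixed number of turns, after which the patient agent answers a post-session questionnaire. The sketch below is illustrative only: all names (`Agent`, `run_session`, `score_session`) and the stub model are assumptions for the example, not the paper's actual implementation or API; a real setup would substitute an LLM client for `stub_llm`.

```python
# Minimal sketch of an LLM-based "digital patient" evaluation loop.
# All identifiers here are hypothetical; swap stub_llm for a real model client.

from dataclasses import dataclass, field
from typing import Callable, List

# A chat backend maps a message history to the next reply.
Chat = Callable[[List[dict]], str]

@dataclass
class Agent:
    """A role-conditioned conversational agent (therapist or digital patient)."""
    system_prompt: str
    chat: Chat
    history: List[dict] = field(default_factory=list)

    def respond(self, incoming: str) -> str:
        self.history.append({"role": "user", "content": incoming})
        reply = self.chat(
            [{"role": "system", "content": self.system_prompt}] + self.history
        )
        self.history.append({"role": "assistant", "content": reply})
        return reply

def run_session(therapist: Agent, patient: Agent,
                opening: str, turns: int = 3) -> List[tuple]:
    """Alternate therapist/patient turns; return the paired transcript."""
    transcript, patient_msg = [], opening
    for _ in range(turns):
        therapist_msg = therapist.respond(patient_msg)
        patient_msg = patient.respond(therapist_msg)
        transcript.append((therapist_msg, patient_msg))
    return transcript

def score_session(patient: Agent, questionnaire: List[str]) -> List[str]:
    """The digital patient answers a post-session questionnaire."""
    return [patient.respond(q) for q in questionnaire]

def stub_llm(messages: List[dict]) -> str:
    # Canned backend so the sketch runs offline.
    return f"reply #{len(messages)}"

therapist = Agent("You are a therapist practicing motivational interviewing.", stub_llm)
patient = Agent("You are a patient ambivalent about quitting smoking.", stub_llm)
transcript = run_session(therapist, patient, "I want to quit but keep failing.", turns=2)
answers = score_session(patient, ["On a 1-7 scale, how well did the therapist listen?"])
```

Keeping the questionnaire step on the same patient agent, with its full session history, mirrors the idea of questionnaires "designed for human patients": the score reflects the session the agent just experienced rather than an external judge's reading of the transcript.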

Files (2.4 MB)

s44184-025-00159-1.pdf (2.4 MB)
md5:40e499f44290544373633b8c0be84b43

Additional details

Funding

European Commission
GuestXR: A Machine Learning Agent for Social Harmony in eXtended Reality (grant no. 101017884)