Published June 21, 2023 | Version 2
Preprint | Open Access

Can a conversational agent pass theory-of-mind tasks? A case study of ChatGPT with the Hinting, False Beliefs, and Strange Stories paradigms.

  • 1. Centre Hospitalier de Versailles, Service Hospitalo-Universitaire de Psychiatrie d'Adultes et d'Addictologie, Le Chesnay, France; Université Paris-Saclay, Université Versailles Saint-Quentin-En-Yvelines, DisAP-DevPsy-CESP, INSERM UMR1018, 94807 Villejuif, France
  • 2. Université Paris-Saclay, Université Versailles Saint-Quentin-En-Yvelines, DisAP-DevPsy-CESP, INSERM UMR1018, 94807 Villejuif, France

Description

We investigate whether OpenAI's recently released conversational agent, ChatGPT, can be examined with classical theory-of-mind paradigms. We used an indirect speech understanding task (the hinting task), a new text version of a False Belief/False Photographs paradigm, and the Strange Stories paradigm. The hinting task is commonly used to assess individuals with autism or schizophrenia by asking them to infer hidden intentions from short conversations between two characters. In a first experiment, ChatGPT 3.5 exhibited quite limited performance on the hinting task, whether the original scoring or revised rating scales were used. We introduced slightly modified versions of the hinting task in which either cues about the presence of a communicative intention were added or a specific question about the character's intentions was asked. Only the latter improved performance. No dissociation between the conditions was found. ChatGPT answered the Strange Stories correctly, but we could not be sure that the algorithm had no prior knowledge of the test. In a second experiment, the most recent version of ChatGPT (4-0314) performed better on the hinting task, although it did not match the average scores of healthy subjects. In addition, the model could solve first- and second-order False Belief tests but failed on items referring to a physical property, such as object visibility, or requiring more complex inferences. This work illustrates the possible application of psychological constructs and paradigms to a conversational agent of a radically new nature.

Notes

Updated version of the study (first version: Feb 13, 2023, DOI: 10.5281/zenodo.7637476) with a second experiment.

Peer-reviewed and published in: Brunet-Gouet, E., Vidal, N., Roux, P. (2024). Can a Conversational Agent Pass Theory-of-Mind Tasks? A Case Study of ChatGPT with the Hinting, False Beliefs, and Strange Stories Paradigms. In: Baratgin, J., Jacquet, B., Yama, H. (eds) Human and Artificial Rationalities. HAR 2023. Lecture Notes in Computer Science, vol 14522. Springer, Cham. https://doi.org/10.1007/978-3-031-55245-8_7

Files

ChatGPT and ToM Zenodo 21june2023.pdf

