Lost in Translation? Not for Large Language Models: Automated Divergent Thinking Scoring Performance Translates to Non-English Contexts

Zielińska, Aleksandra; Organisciak, Peter; Dumas, Denis; Karwowski, Maciej

doi:10.1016/j.tsc.2023.101414

Published October 31, 2023 | Version v1

Journal article Open

1. University of Wrocław
2. University of Denver
3. University of Georgia

Divergent thinking (DT) has been at the heart of creativity measurement for over seven decades. At the same time, large-scale usage of DT tests is limited due to the tedious procedure of scoring the responses, which often requires several judges to assess thousands of participants’ ideas. Across two studies (N = 195 and N = 404), we examined the quality of artificial intelligence-based scoring models (Ocsai, Organisciak et al., 2023) to score Alternate Uses Tasks (Study 1: brick, Study 2: brick, can, rope). Based on more than 6000 ideas provided by participants in Polish and automatically translated to English, we fit a series of idea (response)- and prompt (object)-level structural equation models. When artificial intelligence-based and semantic distance scores were modeled together, latent correlations with human ratings ranged from r = .56 to r = .95 at the response (idea) level and from r = .61 to r = .99 at the object (prompt) level. A hierarchical (i.e., person-level) model with three DT tasks modeled together (Study 2) demonstrated a latent correlation between automatized and human ratings of r = .96 (Babbage) and r = .98 (DaVinci). Notably, the same results were obtained based on untranslated responses provided in Polish. Automated and human scores provided the same serial-order effect pattern and the same profile of differences under “be fluent” vs. “be creative” instructions. This investigation offers an initial yet compelling argument that the new algorithms provide a close-to-perfect score of DT tasks when benchmarked against human ratings, even when the responses are created in a different language and automatically translated to English or used in an untranslated form.
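The response-level versus person-level comparison described above can be illustrated with a minimal sketch. It uses plain Pearson correlations rather than the latent correlations from the structural equation models reported in the abstract, and the ratings, field names, and helper functions (`pearson`, `mean_by`) are made up for illustration; they are not the study's data or code.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_by(rows, key):
    """Average human and automated scores within each group defined by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append((row["human"], row["auto"]))
    human = [sum(h for h, _ in g) / len(g) for g in groups.values()]
    auto = [sum(a for _, a in g) / len(g) for g in groups.values()]
    return human, auto


# Hypothetical ratings for illustration only (NOT data from the study).
rows = [
    {"person": "p1", "prompt": "brick", "human": 1.0, "auto": 1.5},
    {"person": "p1", "prompt": "rope", "human": 2.0, "auto": 1.8},
    {"person": "p2", "prompt": "brick", "human": 3.0, "auto": 3.4},
    {"person": "p2", "prompt": "rope", "human": 4.0, "auto": 3.1},
    {"person": "p3", "prompt": "brick", "human": 4.5, "auto": 4.0},
    {"person": "p3", "prompt": "rope", "human": 5.0, "auto": 4.9},
]

# Response (idea) level: one pair of scores per idea.
response_r = pearson([r["human"] for r in rows], [r["auto"] for r in rows])

# Person level: scores averaged across each participant's ideas first.
person_r = pearson(*mean_by(rows, "person"))

print(f"response-level r = {response_r:.2f}, person-level r = {person_r:.2f}")
```

Averaging within participants removes idea-level noise, which is one reason agreement between automated and human scores tends to look stronger at higher levels of aggregation.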

Files

Files (1.5 MB)

Name	Size
Lost_in_Translation_MANUSCRIPT.pdf (md5:283aa840a678fcc9d951fa45d3593034)	1.5 MB

Additional details

Has metadata: Dataset: https://zenodo.org/records/12574141 (URL); Computational notebook: https://zenodo.org/records/12635428 (URL)

National Science Centre
Grant 2022/45/B/HS6/00372
National Science Centre
Grant 2022/45/N/HS6/00625

Available: 2023-10-31

Repository URL: https://osf.io/ftnvg
Programming language: R

