Published October 2024 | Version v1
Conference paper Open

Using Combinatorial Testing for Prompt Engineering of LLMs in Medicine

  • 1. Graz University of Technology
  • 2. Technische Universität Graz

Description

Large Language Models (LLMs) such as GPT-4o are attracting growing interest. Interfaces such as ChatGPT invite an ever-growing number of people to ask questions, including requests for health advice, which introduces additional risk of harm. It is well known that LLM-based tools tend to hallucinate and can give different answers to the same or similar questions. In both cases, the answer may be wrong or incomplete, possibly leading to safety issues. In this paper, we investigate ChatGPT's responses to similar questions in the medical domain. In particular, we propose using combinatorial testing to generate question variants aimed at identifying wrong or misleading answers. We discuss the general framework and its components and present a proof of concept using a medical query and ChatGPT.
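To illustrate the idea of generating question variants with combinatorial testing, the following is a minimal sketch of 2-way (pairwise) generation in Python. It is not the paper's framework: the parameter names, their values, the prompt template, and the greedy selection strategy are all assumptions made for illustration. The sketch builds a small set of question variants in which every pair of parameter values appears together at least once.

```python
from itertools import combinations, product

# Hypothetical input model: these parameters and values are illustrative
# assumptions, not the ones used in the paper.
PARAMETERS = {
    "opening": ["", "Hello. ", "Quick question: "],
    "symptom": ["a headache", "chest pain"],
    "duration": ["since yesterday", "for two weeks"],
    "question": ["What should I do?", "Is this dangerous?"],
}


def pairs_of(row, names):
    """All 2-way (parameter, value) interactions contained in one combination."""
    return {((a, row[a]), (b, row[b])) for a, b in combinations(names, 2)}


def all_pairs(params):
    """Every 2-way interaction that a pairwise test suite must cover."""
    names = list(params)
    pairs = set()
    for a, b in combinations(names, 2):
        for va, vb in product(params[a], params[b]):
            pairs.add(((a, va), (b, vb)))
    return pairs


def pairwise_suite(params):
    """Greedy selection: repeatedly take the full combination that covers
    the most still-uncovered pairs. Simple, but not guaranteed minimal."""
    names = list(params)
    candidates = [dict(zip(names, values)) for values in product(*params.values())]
    uncovered = all_pairs(params)
    suite = []
    while uncovered:
        best = max(candidates, key=lambda row: len(pairs_of(row, names) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best, names)
    return suite


def render(row):
    """Hypothetical prompt template turning one combination into a question."""
    return f"{row['opening']}I have had {row['symptom']} {row['duration']}. {row['question']}"


if __name__ == "__main__":
    for row in pairwise_suite(PARAMETERS):
        print(render(row))
```

With this input model, exhaustive enumeration would yield 24 questions, while the pairwise suite needs fewer and still exercises every pair of values together. Each generated question could then be sent to the model under test and the answers compared for inconsistencies; that step is omitted here because it depends on the chosen interface and evaluation criteria.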

Files

perko2024b.pdf (670.0 kB)
md5:4ba189ae5dfd59506042443064ebd663

Additional details

Funding

European Commission
ChatMED - Bridging Research Institutions to Catalyze Generative AI Adoption by the Health Sector in the Widening Countries (101159214)