Malicious LLM-Based Conversational AI Makes Users Reveal Personal Information

Zhan, Xiao; Carrillo, Juan Carlos; Seymour, William; Such, Jose

doi:10.5281/zenodo.15610905

Published June 6, 2025 | Version v1

Software Open


1. King's College London
2. Universitat Politècnica de València

Abstract:

LLM-based Conversational AIs (CAIs) are increasingly used across various domains, but they pose privacy risks, as research has shown that users often disclose personal information during their conversations with the CAIs. Recent attention has focused on malicious applications of LLMs, which have demonstrated capabilities suggesting that LLMs could be repurposed to perform harmful tasks. However, a novel and particularly concerning type of malicious LLM application, i.e., an LLM-based CAI that is deliberately designed to extract personal information from users, remains largely unexplored.

To address this gap, we created LLM-based CAIs whose system prompts used different strategies to encourage users to disclose personal information, and we systematically investigated their ability to extract such information during conversations. In a randomized controlled trial with 502 participants, we analyzed the personal information disclosed in dialogues with different malicious and benign CAIs to assess how effectively each extracted personal information, together with participants’ perceptions collected after their interactions with the CAIs. Our findings reveal that malicious CAIs extract significantly and substantially more personal information than benign CAIs, with strategies based on the social nature of privacy being the most effective. This study underscores the privacy threats posed by this novel type of malicious LLM-based CAI and provides actionable recommendations to guide future research and practice.

Useful scripts:

To run the CAIs, first download the LLMs to your local device, then replace the cache directory in the scripts with your own so that the models are saved to and loaded from it. For example, Llama3-8B-Instruct can be downloaded from Hugging Face.

Example: Prepare the Llama3-8B D-CAI
In the script ./Llama3-8B/direct_8b.py, locate and update the following lines. Replace "My cache dir ..." with your own local path. This ensures the model is saved to that directory and loaded directly from it in future sessions:

orig_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    cache_dir="My cache dir ...",
)
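If you prefer not to hard-code the path, one option is to read it from an environment variable. The following is a minimal sketch and is not part of the released scripts: the helper name resolve_cache_dir, the variable name CAI_CACHE_DIR, and the default location are all assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_cache_dir(env_var: str = "CAI_CACHE_DIR",
                      default: str = "~/.cache/cai_models") -> str:
    """Return an absolute cache directory path, creating it if needed.

    Both the environment variable name and the default location are
    hypothetical; adjust them to your own setup.
    """
    raw = os.environ.get(env_var) or default
    path = Path(raw).expanduser().resolve()
    # Create the directory (and any parents) so the model can be saved there.
    path.mkdir(parents=True, exist_ok=True)
    return str(path)
```

The returned string could then be passed as cache_dir=resolve_cache_dir() in place of the "My cache dir ..." placeholder.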

Running ./run.sh will open an interface where you can select the LLM architecture and the desired prompt strategy. The generated dialogues will be saved in a folder corresponding to each LLM in .json format.
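To inspect the saved dialogues afterwards, a small loader along the following lines may help. This is a sketch only: the function name load_dialogues is hypothetical, the folder path is whatever your run produced, and no assumption is made about the internal structure of each .json file beyond it being valid JSON.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_dialogues(folder: str) -> Dict[str, Any]:
    """Map each .json file name in `folder` to its parsed content."""
    dialogues: Dict[str, Any] = {}
    # Sort for a stable order; files with other extensions are ignored.
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            dialogues[path.name] = json.load(handle)
    return dialogues
```

Calling load_dialogues on the folder of a given LLM returns a dictionary keyed by file name, which can then be examined or passed to later analysis steps.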

The pre-evaluation script, which computes sentence similarity and uses GPT-4o as a judge of contextual and emotional similarity, is located at pre_evaluation/pre_eval.py.
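As a rough illustration of what a sentence similarity score measures, the sketch below computes cosine similarity over word counts. This is a deliberately simple bag-of-words stand-in and an assumption on our part: this page does not state which similarity method pre_eval.py uses, and the function name cosine_similarity is hypothetical.

```python
import math
import re
from collections import Counter


def cosine_similarity(sentence_a: str, sentence_b: str) -> float:
    """Cosine similarity between the word-count vectors of two sentences."""
    # Lowercase and split into simple word tokens (bag-of-words stand-in).
    tokens_a = Counter(re.findall(r"[a-z0-9']+", sentence_a.lower()))
    tokens_b = Counter(re.findall(r"[a-z0-9']+", sentence_b.lower()))
    dot = sum(count * tokens_b[token] for token, count in tokens_a.items())
    norm_a = math.sqrt(sum(c * c for c in tokens_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tokens_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty sentence has no direction, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)
```

Scores range from 0 (no shared words) to 1 (identical word counts). Embedding-based methods capture meaning beyond word overlap, which this sketch does not.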

Scripts for running NuExtract to identify the categories of personal information in participant dialogues with the developed CAIs are located in NuExtract/NuExtract.ipynb.

Note:

The supplementary materials, including the survey content and codebook, can be found in the folder named Supplementary Material.

Files (855.1 kB)

Name	Checksum	Size
Code.zip	md5:472762ecebf46fa603b194430e50290b	195.6 kB
Supplementary Material.zip	md5:30f62145b3a91c7ebfa6b723e4fe24b3	659.5 kB

Additional details

Programming language: Python

