A Comprehensive Evaluation of Fine-Tuned LLMs as AI-Generated Text Detectors under Adversarial and Multi-Domain Conditions

Murgia, Marco; Reforgiato Recupero, Diego; Spathoulas, Georgios

doi:10.1109/ACCESS.2026.3689209

Published April 30, 2026 | Version v1

Journal article Open

1. University of Cagliari
2. University of Thessaly

The emergence of Large Language Models (LLMs) that generate fluent, human-like text increases risks such as disinformation and calls for detection systems that remain effective across domains and under adversarial attacks such as paraphrasing. We use fine-tuned LLMs for AI-generated text detection and evaluate three strategies: supervised fine-tuning (SFT), SFT combined with Direct Preference Optimization (SFT+DPO), and multi-stage fine-tuning (SFT+SFT). We assess these strategies in single-domain and multi-domain settings, examining the impact of domain and the robustness to adversarial manipulations. Results indicate that this approach consistently matches or outperforms current methods in the literature. Across four evaluation settings that vary domain overlap and generator overlap, our best detector maintains high F1 under ten adversarial transformations (formatting, spelling perturbations, and semantic rewrites). In the most challenging condition, where texts come from domains unseen during fine-tuning and are generated by unseen models, the top multi-stage configuration achieves an average F1 of 95.25%, outperforming strong zero-shot baselines (Binoculars, RADAR, GLTR) and a supervised RAID leaderboard baseline (e5-small) under the same evaluation protocol. These results highlight the practical robustness and generalization properties of fine-tuned LLM-based detectors; the work does not propose a new detection paradigm.

Files

Name	Size
A_Comprehensive_Evaluation_of_Fine-Tuned_LLMs_as_AI-Generated_Text_Detectors_under_Adversarial_and_Multi-Domain_Conditions.pdf (md5:8a1dc1fd207337499d1826a5d9b72ca8)	5.7 MB
