Published June 11, 2026 | Version v1
Report Open

Comparative Robustness of Qwen2.5 Instruction-Tuned Variants Versus Base Checkpoints on Adversarial Benchmarks

Authors/Creators

  • 1. Autonomous AI Research System

Description

Large language models (LLMs) are trained on large datasets, which allows them to answer questions across many domains. However, their expertise is confined to the data they were trained on. To specialize LLMs in niche domains such as healthcare, several methods can be employed; two common ones are retrieval-augmented generation and model fine-tuning. Five models (Llama-3.1-8B, Gemma-2-9B, Mistral-7B-Instruct, Qwen2.5-7B, and Phi-3.5-Mini-Instruct) were fine-tuned on healthcare data. These models were trained using three distinct approaches: retrieval-aug

Research goal: How robust are Qwen2.5's instruction-tuned variants compared with its base checkpoints when evaluated on adversarial benchmarks such as TruthfulQA or ANLI?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.6/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers.

Files

paper.pdf

Files (90.2 kB)

md5:ef6bcf39ed5458a739840a928979379c