Comparative Robustness of Qwen2.5 Instruction-Tuned Variants Versus Base Checkpoints on Adversarial Benchmarks
Description
Large language models (LLMs) are trained on huge datasets, which allows them to answer questions across many domains. However, their expertise is confined to the data they were trained on. To specialize LLMs for niche domains such as healthcare, various methods can be employed. Two commonly used approaches are retrieval-augmented generation and model fine-tuning. Five models (Llama-3.1-8B, Gemma-2-9B, Mistral-7B-Instruct, Qwen2.5-7B, and Phi-3.5-Mini-Instruct) were fine-tuned on healthcare data. These models were trained using three distinct approaches: retrieval-aug
Research goal: What is the comparative robustness of Qwen2.5's instruction-tuned variants versus base checkpoints when evaluated on adversarial benchmarks like TruthfulQA or ANLI?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.6/10.
Notes
Files
paper.pdf
| Name | Size |
|---|---|
| paper.pdf (md5:ef6bcf39ed5458a739840a928979379c) | 90.2 kB |