Persistence of Performance Gaps Between Layer-Specific LoRA and Full Fine-Tuning in Llama-3.2-3B Across Domain Shifts
Description
Large Language Models (LLMs) such as GPT-4 and LLaMA have demonstrated remarkable reasoning abilities but require significant computational resources for fine-tuning. This paper presents a resource-efficient fine-tuning approach for LLaMA-3.2-3B to enhance medical chain-of-thought reasoning while operating under constrained GPU and memory settings. Using parameter-efficient tuning techniques such as LoRA and QLoRA, we adapt the base model on publicly available medical reasoning datasets. The model achieves improved reasoning coherence and factual accuracy while reducing memory usage by up to 6
Research goal: To what extent does the performance gap between layer-specific LoRA injection and full fine-tuning in Llama-3.2-3B persist when evaluated on out-of-domain technical manuals compared to in-domain Kubernetes queries?
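One way to make "persistence of the gap" concrete is to compare the full-fine-tuning-minus-LoRA score difference in each domain. The sketch below is an illustration only, not the metric used in the paper: the function names, the dictionary keys `full_ft` and `lora`, and the choice of a simple ratio are all assumptions introduced here, and any single higher-is-better score (accuracy, F1, and so on) is assumed.

```python
def performance_gap(full_ft_score: float, lora_score: float) -> float:
    """Gap between full fine-tuning and LoRA on one evaluation set.

    Positive values mean full fine-tuning scored higher.
    Assumes a higher-is-better metric (hypothetical choice).
    """
    return full_ft_score - lora_score


def gap_persistence(in_domain: dict, out_of_domain: dict) -> float:
    """Ratio of the out-of-domain gap to the in-domain gap.

    Each argument is a dict with the (hypothetical) keys
    'full_ft' and 'lora' holding scores on that evaluation set.
    A value near 1.0 means the gap persists unchanged under the
    domain shift; below 1.0 it shrinks; above 1.0 it widens.
    """
    gap_id = performance_gap(in_domain["full_ft"], in_domain["lora"])
    gap_ood = performance_gap(out_of_domain["full_ft"], out_of_domain["lora"])
    if gap_id == 0:
        # The ratio is undefined when there is no in-domain gap to compare to.
        raise ValueError("in-domain gap is zero; persistence ratio is undefined")
    return gap_ood / gap_id
```

Here `in_domain` would hold the scores on the Kubernetes queries and `out_of_domain` the scores on the technical manuals. A ratio is only one possible summary; reporting the two raw gaps side by side avoids the undefined case when the in-domain gap is zero.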
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.2/10.
Notes
Files
| Name | Size |
|---|---|
| paper.pdf (md5:e78cfa1c0266cdb59bee7e516ab0877a) | 81.3 kB |