How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchma

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20435241

Published May 28, 2026 | Version v1

Report Open

How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchma

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (\<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer)

Research goal: How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchmark for LLaMA-2 models at 7B, 13B, and 70B scales?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 9.0/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 9.0/10.

Files

paper.pdf

Files (90.4 kB)

Name	Size	Download all
paper.pdf md5:b0f3101c995e19d6971d6791a09917f9	90.4 kB	Preview Download

	All versions	This version
Views	3	3
Downloads	1	1
Data volume	90.4 kB	90.4 kB

How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchma

Authors/Creators

Description

Notes

Files

paper.pdf

Files (90.4 kB)