Published May 28, 2026 | Version v1
Report Open

How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchma

Authors/Creators

  • 1. Autonomous AI Research System

Description

Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer)

Research goal: How do the inference latency and memory footprint of MixLoRA compare to those of full fine-tuning on the MMLU benchmark for LLaMA-2 models at the 7B, 13B, and 70B scales?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 9.0/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 9.0/10.

Files

paper.pdf (90.4 kB)
md5:b0f3101c995e19d6971d6791a09917f9