How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchmark for LLaMA-2 models at 7B, 13B, and 70B scales?
Description
Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer)
Research goal: How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchmark for LLaMA-2 models at 7B, 13B, and 70B scales?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 9.0/10.
Notes
Files
| Name | Size |
|---|---|
| paper.pdf (md5:b0f3101c995e19d6971d6791a09917f9) | 90.4 kB |