vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

Microsoft Research (India)

doi:10.5281/zenodo.14048693

Published May 7, 2024 | Version 0.0.1

Software | Open access


vAttention is a simple and performant dynamic memory manager for serving large language models (LLMs) that is more portable than PagedAttention-based approaches. Leveraging CUDA support for demand paging, vAttention stores the KV cache in contiguous virtual memory and allocates physical memory on demand. It also introduces several LLM-specific optimizations to address the latency and fragmentation challenges that arise when demand paging is used to serve LLMs on GPUs. vAttention supports various attention kernels out of the box and significantly improves LLM serving throughput compared with the state-of-the-art PagedAttention-based kernels of FlashAttention and FlashInfer.
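
The core idea above (reserve one contiguous virtual range per request, then back it with physical memory only as the KV cache grows) can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not vAttention's implementation or API: `PhysicalPagePool`, `VirtualKVBuffer`, and all of their methods are hypothetical names, integer page ids stand in for real GPU memory, and each token's KV data is assumed to occupy a fixed number of bytes laid out contiguously. On a real GPU, the reserve and map steps would go through CUDA's virtual memory management driver APIs (such as `cuMemAddressReserve`, `cuMemCreate`, and `cuMemMap`).

```python
from dataclasses import dataclass, field


class PhysicalPagePool:
    """Hypothetical pool of fixed-size physical GPU pages, tracked by id only."""

    def __init__(self, num_pages: int) -> None:
        self.free_pages = list(range(num_pages))

    def alloc(self) -> int:
        if not self.free_pages:
            raise MemoryError("out of physical KV-cache pages")
        return self.free_pages.pop()

    def release(self, page_id: int) -> None:
        self.free_pages.append(page_id)


@dataclass
class VirtualKVBuffer:
    """Hypothetical model of one request's KV cache: a contiguous virtual range
    reserved up front, backed by physical pages only as tokens are generated."""

    max_tokens: int
    bytes_per_token: int
    page_size: int
    mapped_pages: list[int] = field(default_factory=list)  # physical ids, in virtual order

    @property
    def reserved_bytes(self) -> int:
        # Virtual space covers the worst case but consumes no physical memory.
        return self.max_tokens * self.bytes_per_token

    def ensure_capacity(self, pool: PhysicalPagePool, num_tokens: int) -> int:
        """Map just enough physical pages to hold num_tokens; return mapped page count."""
        if num_tokens > self.max_tokens:
            raise ValueError("request exceeds its reserved virtual range")
        needed_bytes = num_tokens * self.bytes_per_token
        needed_pages = (needed_bytes + self.page_size - 1) // self.page_size  # ceiling
        while len(self.mapped_pages) < needed_pages:
            self.mapped_pages.append(pool.alloc())
        return len(self.mapped_pages)

    def translate(self, token_idx: int) -> tuple[int, int]:
        """Models the paging hardware, not the kernel: virtual offset -> (page, offset)."""
        offset = token_idx * self.bytes_per_token  # all the kernel itself computes
        page_idx, in_page = divmod(offset, self.page_size)
        if page_idx >= len(self.mapped_pages):
            raise LookupError("access to an unmapped page")
        return self.mapped_pages[page_idx], in_page

    def wasted_bytes(self, num_tokens: int) -> int:
        """Internal fragmentation: mapped bytes that hold no KV data yet."""
        return len(self.mapped_pages) * self.page_size - num_tokens * self.bytes_per_token

    def release(self, pool: PhysicalPagePool) -> None:
        """Return all physical pages to the pool when the request finishes."""
        for page_id in self.mapped_pages:
            pool.release(page_id)
        self.mapped_pages.clear()
```

For example, with a hypothetical 128 KiB of KV data per token (2 × 32 layers × 8 KV heads × 128 head dimension × 2 bytes) and 2 MiB pages, a request reserving 512 MiB of virtual space for 4,096 tokens maps only 7 pages (14 MiB) after 100 tokens, with 1.5 MiB of internal fragmentation in the last page. In this model, the attention kernel only ever indexes a contiguous buffer; the virtual-to-physical lookup in `translate` stands in for work the paging hardware does, which is why unmodified kernels can run on such a layout.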

Files (36.4 MB)

Name	Size	MD5
vattention_artifact_asplos25.zip	36.4 MB	06138e79a3269baa77179b3e99a02f74
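
A downloaded copy of the artifact can be checked against the MD5 checksum in the file listing with a few lines of Python. This is a sketch: `md5_of_file` and `verify_artifact` are hypothetical helper names, and the local filename passed to them is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# Checksum copied from the Files listing; helper names below are hypothetical.
EXPECTED_MD5 = "06138e79a3269baa77179b3e99a02f74"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected: str = EXPECTED_MD5) -> bool:
    """True only if the local file matches the expected (case-insensitive) MD5."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_artifact("vattention_artifact_asplos25.zip")` returns `True` only if the local copy matches the listed checksum.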

Additional details

Repository URL: https://github.com/microsoft/vattention
Programming language: Python
Development Status: Active


Resource type: Software
Publisher: Zenodo
Conference: Architectural Support for Programming Languages and Operating Systems (ASPLOS), Rotterdam, The Netherlands, 30 March to 3 April 2025

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 7, 2024
Modified: November 7, 2024
