Published July 11, 2023 | Version v1
Conference paper Open

IRIS: Interference and Resource Aware Predictive Orchestration for ML Inference Serving

Description

In recent years, the ever-growing number of Machine Learning (ML) and Artificial Intelligence (AI) applications deployed in the Cloud has led to high demand for the computing resources required for efficient processing. Multiple users deploy multiple applications on the same server node to maximize Quality of Service (QoS); however, this leads to increased interference. In addition, Cloud providers aim to minimize their operating costs by utilizing the available resources efficiently. These conflicting optimization goals form a complex setting in which efficient scheduling is required.

In this work, we present IRIS, an interference- and resource-aware predictive scheduling framework for ML inference serving in the cloud. We target the multi-objective problem of QoS maximization with effective CPU utilization, based on Queries per Second (QPS) predictions, by proposing a model-less ML-based solution and integrating it into the Kubernetes platform. Our approach is evaluated on real hardware infrastructure with a set of ML applications. Our experimental analysis shows that, under various QoS constraints, the model-specific interference-aware scheduler violates QoS constraints less frequently, achieving 1.8x fewer violations on average compared to over-provisioning and 3.1x fewer violations compared to under-provisioning, through efficient exploitation of available CPU resources. The model-less feature yields, on average, 1.5x fewer violations than the model-specific scheduler, while further reducing average CPU utilization by ~30%.

Files

2023_IEEECLOUD_IRIS_Interference_and_Resource_Aware_Predictive_Orchestration_for_ML_Inference_Serving.pdf

Additional details

Funding

NEPHELE – A LIGHTWEIGHT SOFTWARE STACK AND SYNERGETIC META-ORCHESTRATION FRAMEWORK FOR THE NEXT GENERATION COMPUTE CONTINUUM 101070487
European Commission