Published January 22, 2024 | Version v1
Open access publication

Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge


Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to small-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for semantic segmentation on the Cityscapes dataset, finding FB-MP models that are 33% smaller and INT8 models that are 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
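The appeal of a block-wise formulation for FB-MP search is that bit-widths can be chosen per block and the results combined, rather than searching the exponential joint space over all layers at once. The toy sketch below illustrates only this combinatorial core: it enumerates per-block bit-width assignments and keeps the Pareto front over (model size, accuracy proxy). All block sizes, bit choices, and accuracy scores are invented for illustration; the paper's actual method derives scores from trained block-wise supernets and is not reproduced here.

```python
from itertools import product

# Hypothetical toy setup: three network "blocks", each with a parameter count
# and a made-up accuracy proxy per candidate bit-width. These numbers are
# illustrative only, not taken from the paper.
BLOCK_PARAMS = [1_000_000, 2_000_000, 500_000]  # parameters per block
BIT_CHOICES = [4, 8]                            # few-bit mixed precision

# ACC_PROXY[block][bits]: assumed per-block accuracy score (higher is better).
ACC_PROXY = [
    {4: 0.900, 8: 0.905},   # block 1: 8-bit barely helps, but costs bytes
    {4: 0.850, 8: 0.940},
    {4: 0.800, 8: 0.930},   # block 3: cheap block, large gain from 8-bit
]

def model_size_bytes(bits_per_block):
    """Size of the quantized model for one bit-width assignment."""
    return sum(p * b // 8 for p, b in zip(BLOCK_PARAMS, bits_per_block))

def accuracy_score(bits_per_block):
    """Sum of per-block proxies (stand-in for a real accuracy predictor)."""
    return sum(ACC_PROXY[i][b] for i, b in enumerate(bits_per_block))

def pareto_front():
    """Enumerate all assignments; keep the (size, accuracy) Pareto set."""
    candidates = [
        (model_size_bytes(a), accuracy_score(a), a)
        for a in product(BIT_CHOICES, repeat=len(BLOCK_PARAMS))
    ]
    front = []
    for size, acc, a in candidates:
        # Dominated if another candidate is no larger and no less accurate,
        # and strictly better in at least one of the two objectives.
        dominated = any(
            s2 <= size and a2 >= acc and (s2 < size or a2 > acc)
            for s2, a2, _ in candidates
        )
        if not dominated:
            front.append((size, acc, a))
    return sorted(front)

if __name__ == "__main__":
    for size, acc, assignment in pareto_front():
        print(f"bits={assignment} size={size / 1e6:.2f} MB score={acc:.3f}")
```

With these invented proxies, assignments such as (8, 4, 4) are dominated (spending bytes on block 1's 8-bit weights buys almost no accuracy, while block 3's upgrade is cheap and valuable), so only the worthwhile size/accuracy trade-offs survive. Real QA-NAS replaces exhaustive enumeration with a search strategy, but the per-block decomposition is what keeps the space tractable.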


CODAI ARXIV 2401.12350.pdf

Files (656.7 kB)


Additional details



Funding: European Commission, grant 101097300, "Edge AI Technologies for Optimised Performance Embedded Processing"