PCIe and CXL Interconnects for AI Accelerators: Performance, Latency, and Telemetry
Description
The exponential growth of artificial intelligence and high-performance computing workloads has fundamentally transformed system design priorities, shifting performance bottlenecks from computational resources to interconnect infrastructure. Modern AI accelerators demand unprecedented bandwidth and predictable latency characteristics that challenge traditional interconnect technologies, particularly in heterogeneous computing environments where processors, accelerators, and memory expansion devices must communicate efficiently across complex fabric topologies. This article presents a unified framework for characterizing and optimizing PCIe 6.0 and CXL 3.0 interconnect fabrics, addressing critical challenges in latency predictability, throughput maximization, and operational observability. Through comprehensive modeling of protocol stack behaviors, physical layer characteristics, and multi-level switching architectures, the article quantifies end-to-end latency contributors including forward error correction overhead, credit-based flow control delays, and switch traversal costs. A telemetry-driven runtime framework integrates PCIe Advanced Error Reporting and CXL Fabric Manager interfaces to enable adaptive optimization policies encompassing credit-aware scheduling, dynamic link management, intelligent memory tiering, and energy-efficient controller operation. Machine learning classifiers built on historical telemetry data enable predictive maintenance capabilities that identify degrading links before service disruptions occur. Experimental validation across transformer training, large language model inference, and representative scientific computing kernels demonstrates substantial improvements in tail latency, aggregate throughput, and energy efficiency. The article provides practical guidance for fabric architects designing next-generation disaggregated computing infrastructures while identifying critical challenges and opportunities in scaling these approaches to hyper-scale deployments.
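The latency decomposition sketched in the abstract (serialization, forward error correction, credit-based flow control, and switch traversal) can be made concrete with a back-of-the-envelope model. The snippet below is an illustrative sketch only: the flit size follows the PCIe 6.0 256-byte FLIT format, but every constant and the function name are assumed placeholders, not measurements or code from the article.

```python
# Illustrative end-to-end latency model for a PCIe 6.0 / CXL 3.0 fabric path.
# All timing constants are hypothetical placeholders, not figures from the paper.

FLIT_NS = 2.0           # serialization time per 256-byte flit (assumed)
FEC_NS = 2.5            # FEC encode/decode overhead per flit (assumed)
SWITCH_NS = 90.0        # per-hop switch traversal latency (assumed)
CREDIT_STALL_NS = 40.0  # mean stall waiting for flow-control credits (assumed)

def end_to_end_latency_ns(payload_bytes: int, switch_hops: int,
                          credit_stalls: int = 0) -> float:
    """Sum the contributors the article models: serialization, FEC overhead,
    switch traversals, and credit-based flow-control delays."""
    flits = -(-payload_bytes // 256)  # ceiling division to whole 256 B flits
    serialization = flits * FLIT_NS
    fec = flits * FEC_NS
    switching = switch_hops * SWITCH_NS
    flow_control = credit_stalls * CREDIT_STALL_NS
    return serialization + fec + switching + flow_control

if __name__ == "__main__":
    # A 4 KiB transfer crossing two switch levels with one credit stall.
    print(f"{end_to_end_latency_ns(4096, switch_hops=2, credit_stalls=1):.1f} ns")
```

A model of this shape makes the abstract's point visible: once per-flit costs are amortized, multi-level switch traversals and credit stalls dominate tail latency, which is why the proposed policies target scheduling and credit management rather than raw link speed.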
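To give a flavor of the telemetry-driven side, the sketch below polls the per-device AER statistics that recent Linux kernels expose via sysfs (`aer_dev_correctable`, which ends with a `TOTAL_ERR_COR` summary line) and applies a simple rate threshold standing in for the article's learned classifier. The device address, poll interval, and threshold are assumptions for illustration, not values from the paper.

```python
# Minimal sketch of telemetry-driven link-health monitoring.
# Reads per-device AER counters from Linux sysfs; the BDF, poll interval,
# and degradation threshold are illustrative assumptions.
from pathlib import Path
import time

BDF = "0000:3b:00.0"                      # hypothetical accelerator address
AER_FILE = Path(f"/sys/bus/pci/devices/{BDF}/aer_dev_correctable")
THRESHOLD_PER_MIN = 100                   # assumed degradation threshold

def correctable_total() -> int:
    """Return the running correctable-error total; sysfs lines look like
    'BadTLP 3' and end with a 'TOTAL_ERR_COR <n>' summary line."""
    for line in AER_FILE.read_text().splitlines():
        if line.startswith("TOTAL_ERR_COR"):
            return int(line.split()[-1])
    return 0

def monitor(poll_s: int = 60) -> None:
    """Flag a link whose correctable-error rate exceeds the threshold."""
    prev = correctable_total()
    while True:
        time.sleep(poll_s)
        cur = correctable_total()
        rate = (cur - prev) * 60 / poll_s
        if rate > THRESHOLD_PER_MIN:
            # In the article's framework, a classifier trained on historical
            # telemetry would trigger credit rebalancing or link retraining.
            print(f"{BDF}: {rate:.0f} correctable errors/min -- flag for maintenance")
        prev = cur
```

In the full framework this signal would feed the machine learning classifiers described above, alongside CXL Fabric Manager telemetry, so that degrading links are drained or retrained before they cause service disruptions.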
Files

| Name | Checksum | Size |
|---|---|---|
| final+4956.pdf | md5:635fe9923b3df2634206f4dbcb393259 | 752.3 kB |