Enhancing GPU-HBM Data Transfer Efficiency Using Markov Chains and Neural Network-Driven Predictive Caching with Quantization and Pruning
Authors/Creators
- Research Assistant, Department of Computer Science, Georgia Institute of Technology, Atlanta, Georgia, USA.
Description
Abstract: Background: High-bandwidth memory (HBM) systems face persistent data-transfer bottlenecks, particularly when CPUs cannot supply data to GPUs at a sufficient rate. This limitation reduces overall computational efficiency and highlights the need for improved cache-management strategies. Methods: Markov chains represented transitions between frequently accessed memory blocks, enabling predictive sequencing of data needs. A neural network was then applied to model and optimise these Markov transitions, improving cache-prefetching accuracy and refining data-movement techniques. Results & Conclusions: The combined use of Markov-based memory modelling, NN optimisation, and supplementary data-transfer techniques demonstrates strong potential to mitigate CPU–GPU bandwidth limitations. Together, these methods offer more efficient cache utilisation and reduced bottlenecks in high-demand computational environments.
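The record contains no code, but the core idea in the abstract can be illustrated with a minimal sketch of first-order Markov-chain prefetch prediction. The class and variable names below are hypothetical, the trace is synthetic, and the paper's actual model (including the NN refinement, quantization, and pruning stages) is not reproduced here; this only shows how block-to-block transition counts can drive prefetch candidates.

```python
# Minimal illustrative sketch: a first-order Markov-chain prefetcher
# over block-granularity memory accesses. Hypothetical names throughout;
# not the paper's implementation.
from collections import defaultdict


class MarkovPrefetcher:
    """Predicts likely next memory blocks from observed block transitions."""

    def __init__(self):
        # transition_counts[a][b] = number of times block b followed block a
        self.transition_counts = defaultdict(lambda: defaultdict(int))
        self.prev_block = None

    def observe(self, block: int) -> None:
        """Record one memory access and update the transition counts."""
        if self.prev_block is not None:
            self.transition_counts[self.prev_block][block] += 1
        self.prev_block = block

    def predict(self, block: int, k: int = 2) -> list[int]:
        """Return up to k most frequent successor blocks as prefetch candidates."""
        successors = self.transition_counts.get(block, {})
        return sorted(successors, key=successors.get, reverse=True)[:k]


# Example: learn from a short synthetic access trace, then predict.
trace = [0, 4, 8, 0, 4, 8, 0, 4, 12]
prefetcher = MarkovPrefetcher()
for blk in trace:
    prefetcher.observe(blk)
print(prefetcher.predict(4))  # [8, 12] -- block 8 most often follows block 4
```

In the approach the abstract describes, a neural network would then refine these raw transition statistics to improve prefetch accuracy, with quantization and pruning (per the title) presumably keeping that model small enough for low-latency inference alongside the cache controller.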
Files
- F370015060126.pdf (902.7 kB, md5:074f2cd18d7ed160ff176ded7ced0f2b)
Additional details
Identifiers
- DOI: 10.35940/ijsce.F3700.15060126
- EISSN: 2231-2307
Dates
- Accepted: 2026-01-15
- Manuscript received on 19 November 2025; first revised manuscript received on 29 November 2025; second revised manuscript received on 08 December 2025; manuscript accepted on 15 January 2026; manuscript published on 30 January 2026.
References
- Joseph, D., & Grunwald, D. (2002, August 06). Prefetching using Markov predictors. IEEE Transactions on Computers. DOI: https://doi.org/10.1109/12.75265
- Jog, A., Kayiran, O., Mishra, A. K., Kandemir, M. T., Mutlu, O., Iyer, R., & Das, C. R. (2013, June 23). Orchestrated scheduling and prefetching for GPGPUs. Association for Computing Machinery. DOI: https://doi.org/10.1145/2485922.2485951
- Bauer, M., Cook, H., & Khailany, B. (2011, November 12). CudaDMA: Optimizing GPU memory bandwidth via warp specialization. Association for Computing Machinery. DOI: https://doi.org/10.1145/2063384.2063400
- Liang, T., Glossner, J., Wang, L., Shi, S., & Zhang, X. (2021, January 24). Pruning and quantization for deep neural network acceleration: A survey. arXiv. DOI: https://doi.org/10.48550/arXiv.2101.09671
- Shi, Z., Huang, X., Jain, A., & Lin, C. (2019, October 12). Applying deep learning to the cache replacement problem. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. DOI: https://doi.org/10.1145/3352460.3358319
- Chopra, B. (2024, May 7). Enhancing machine learning performance: The role of GPU-based AI computer architectures. Journal of Knowledge Learning and Science Technology, 3(3), 20-32. DOI: https://doi.org/10.60087/jklst.vol3.n3.p20-32
- Hou, J., Tao, T., Lu, H., & Nayak, A. (2023, June 22). Intelligent caching with graph neural network-based deep reinforcement learning on SDN-based ICN. Future Internet, 15(8), 251. DOI: https://doi.org/10.3390/fi15080251
- Bakhoda, A., Yuan, G. L., Fung, W. W. L., Wong, H., & Aamodt, T. M. (2009, April 1). Analyzing CUDA workloads using a detailed GPU simulator. 2009 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). DOI: https://doi.org/10.1109/ISPASS.2009.4919648
- Liu, A., & Tucker, A. (1988). Applied Combinatorics. DOI: https://doi.org/10.1137/1030075
- Mittal, S. (2015, January 16). A survey of techniques for managing and leveraging caches in GPUs. Journal of Circuits, Systems and Computers, 23(08), 1430002. DOI: https://doi.org/10.1142/s0218126614300025