Report Open Access
Decades after being initially explored in the 1970s, Processing in Memory (PIM) is currently experiencing a renaissance. By moving part of the computation to the memory devices, PIM addresses a fundamental issue in the design of modern computing systems, the mismatch between the von Neumann architecture and the requirements of important data-centric applications. A number of industrial prototypes and products are under development or already available in the marketplace, and these devices show the potential for cost-effective and energy-efficient acceleration of HPC, AI and data analytics workloads. This paper reviews the reasons for the renewed interest in PIM and surveys industrial prototypes and products, discussing their technological readiness.
Wide adoption of PIM in production, however, depends on our ability to create an ecosystem to drive and coordinate innovations and co-design across the whole stack. European companies and research centres should be involved in all aspects, from technology, hardware, system software and programming environment, to updating of the algorithm and application. In this paper, we identify the main challenges that must be addressed and we provide guidelines to prioritise the research efforts and funding. We aim to help make PIM a reality in production HPC, AI and data analytics.
M. Radulovic, D. Zivanovic, D. Ruiz, B. R. d. Supinski, S. A. McKee, P. Radojković and E. Ayguadé, "Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?," in Proceedings of the International Symposium on Memory Systems (MEMSYS), 2015.
H. S. Stone, "A Logic-in-Memory Computer," IEEE Transactions on Computers, vol. 19, 1970.
P. Siegl, R. Buchty and M. Berekovic, "Data-Centric Computing Frontiers: A Survey on Processing-In-Memory," in Proceedings of the Second International Symposium on Memory Systems (MEMSYS), 2016.
Eurolab4HPC Long-Term Vision on High-Performance Computing (2nd Edition), 2020.
ETP4HPC's SRA 4, "Strategic Research Agenda for High-performance Computing in Europe," White Paper, 2020.
Samsung Electronics Co., Ltd., "288pin Registered DIMM based on 4Gb E-die," DDR4 SDRAM Datasheet, 2017.
S. Li, Z. Yang, D. Reddy, A. Srivastava and B. Jacob, "DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator," IEEE Computer Architecture Letters, vol. 19, 2020.
M. Radulovic, K. Asifuzzaman, D. Zivanovic, N. Rajovic, G. C. d. Verdiére, D. Pleiter, M. Marazakisl, N. Kallimanis, P. Carpenter, P. Radojković and E. Ayguadé, "Mainstream vs. Emerging HPC: Metrics, Trade-Offs and Lessons Learned," in Proceedings of 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2018.
O. Mutlu, S. Ghose, J. Gómez-Luna and R. Ausavarungnirun, "A Modern Primer on Processing in Memory," in arXiv, 2020.
K. Wang, K. Angstadt, C. Bo, N. Brunelle, E. Sadredini, T. Tracy, J. Wadden, M. Stan and K. Skadron, "An Overview of Micron's Automata Processor," in Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2016.
P. Dlugosch, D. Brown, P. Glendenning, M. Leventhal and H. Noyes, "An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 12, 2014.
T. Finkbeiner, G. Hush, T. Larsen, P. Lea, J. Leidel and T. Manning, "In-Memory Intelligence," IEEE Micro, vol. 37, no. 4, 2017.
F. Devaux, "The True Processing in Memory Accelerator," IEEE Hot Chips Symposium (HCS), 2019.
J. Jeddeloh and B. Keeth, "Hybrid Memory Cube New DRAM Architecture Increases Density and Performance," in Proceedings of Symposium on VLSI Technology (VLSIT), 2012.
JEDEC Solid State Technology Association, "High Bandwidth Memory (HBM) DRAM," White Paper, 2013.
J. Jeffers, J. Reinders and a. A. Sodani, Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition (2nd ed.), 2016.
FUJITSU LIMITED, "FUJITSU Supercomputer PRIMEHPC Specifications," White Paper, 2020.
FUJITSU LIMITED, "FUJITSU Supercomputer PRIMEHPC FX1000," White Paper, 2020.
Y. Kwon et al., "A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications," in Proceedings of IEEE International Solid-State Circuits Conference (ISSCC), 2021.
J.-P. Noel, M. Pezzin, R. Gauchi, J.-F. Christmann, M. Kooli, H.-P. Charles, L. Ciampolini, M. Diallo, F. Lepin, B. Blampey, P. Vivet, S. Mitra and B. Giraud, "A 35.6 TOPS/W/mm² 3-Stage Pipelined Computational SRAM with Adjustable Form Factor for Highly Data-Centric Applications," IEEE Solid-State Circuits Letters, vol. 3, 2020.
M. Kooli, H.-P. Charles, C. Touzet, B. Giraud and J.-P. Noel, "Smart Instruction Codes for In-Memory Computing Architectures Compatible with Standard SRAM Interfaces," in Proceedings of Design, Automation Test in Europe Conference Exhibition (DATE), 2018.
R. Khaddam-Aljameh, P.-A. Francese, L. Benini and E. Eleftheriou, "An SRAM-Based Multibit In-Memory Matrix-Vector Multiplier with a Precision that Scales Linearly in Area, Time, and Power," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, 2021.
R. Khaddam-Aljameh et al., "HERMES Core – A 14nm CMOS and PCM-based In-Memory Compute Core using an array of 300ps/LSB Linearized CCO-based ADCs and local digital processing," in Proc. Symposium on VLSI Circuits, 2021.
A. Sebastian, M. L. Gallo, R. Khaddam-Aljameh and E. Eleftheriou, "Memory devices and applications for in-memory computing," Nature Nanotechnology, no. July, p. 529–544, 2020.
M. Giordano, K. Prabhu, K. Koul, R. M. Radway, A. Gural, R. Doshi, Z. F. Khan, J. W. Kustin, T. Liu, G. B. Lopes, V. Turbiner, W.-S. Khwa, Y.-D. Chih, M.-F. Chang, G. Lallement, B. Murmann, S. Mitra and P. Raina, "CHIMERA: A 0.92 TOPS, 2.2 TOPS/W Edge AI Accelerator with 2 MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference," Symposium on VLSI Circuits (VLSI), 2021.
A. Valentian, F. Rummens, E. Vianello, T. Mesquida, C. L.-M. d. Boissac, O. Bichler and C. Reita, "Fully Integrated Spiking Neural Network with Analog Neurons and RRAM Synapses," IEEE International Electron Devices Meeting (IEDM), pp. 14.3.1-14.3.4, 2019.
Microchip Technology Inc., "Enhancing System Architecture Implementation for AI Applications, Microchip Delivers its Analog Embedded SuperFlash Technology," News Release, 2019.
Micron Technology, Inc., "ECC Brings Reliability and Power Efficiency to Mobile Devices," White Paper, 2017.
M. Patel, J. S. Kim, H. Hassan and O. Mutlu, "Understanding and Modeling On-Die Error Correction in Modern DRAM: An Experimental Study Using Real Devices," in Proceedings of 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2019.
P. Amato, C. Laurent, M. Sforzin, S. Bellini, M. Ferrari and A. Tomasoni, "Ultra fast, two-bit ECC for Emerging Memories," in IEEE 6th International Memory Workshop (IMW), 2014.
D. D. Sharma, "Compute express link," White Paper, 2019.
B. Benton, "CCIX, GEN-Z, OpenCAPI: Overview and Comparison.," White Paper, 2017.
Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai and O. Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," in Proceedings of ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), 2014.
UPMEM, "UPMEM PIM Security Benefits - Architecture and Features Overview," White Paper, 2020.
P. Radojković et al., "Towards Resilient EU HPC Systems: A Blueprint," European HPC resilience initiative, 2020.