Journal article Open Access
Khaddam-Aljameh, Riduan; Francese, Pier-Andrea; Benini, Luca; Eleftheriou, Evangelos
A novel interleaved switched-capacitor and SRAM-based multibit matrix-vector multiply-accumulate engine for in-memory computing is presented. Its operation principle is based on first converting an SRAM-stored n-bit weight into a proportional voltage using a pipeline D/A converter built from n+1 equally sized stages. A switched-capacitor stage then multiplies these voltages with an m-bit digital input activation. Finally, the output voltages that correspond to the different multiplication results are accumulated along one column by means of charge-sharing. With our proposed architecture, the required circuit area, computation time, and power consumption scale linearly versus the bit resolution of both the inputs and the weights. Analytical formulas are presented for the energy consumption in both capacitors and switches. Moreover, the impact of fabrication mismatch on analog computation accuracy is examined. The full system architecture is described, and the feasibility is demonstrated, via a full macro implementation study in 14 nm, detailing area and energy consumption, as well as the overall latency. Finally, a specific design of a 128 × 2048 6 -bit weight and 6-bit input signed matrix-vector multiplication accelerator system in 14 nm is presented, which runs at 2.43 TOP/s at an efficiency of 16.94 TOP/s/W, while using the nominal supply voltage of 0.8 V. If the operands' precision is considered in the metric, then the efficiency becomes 609.7 TOP/s/W.