Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk4fb6sf86
Title: Hardware-aware Training for In-memory Computing Systems
Authors: Zhang, Bonan
Advisors: Verma, Naveen
Contributors: Electrical and Computer Engineering Department
Subjects: Computer engineering
Issue Date: 2025
Publisher: Princeton, NJ : Princeton University
Abstract: In-memory computing (IMC) is an emerging approach to addressing the compute and data-movement costs inherent in the high-dimensional matrix-vector multiplies (MVMs) that dominate modern deep learning models. However, IMC suffers from various noise sources, which can be classified as: 1) analog noise, predominant in low-SNR IMC systems; and 2) quantization noise, which dominates in high-SNR IMC systems. These noise sources, unavoidable in practical hardware, degrade deep learning inference performance. This dissertation develops approaches to enable high-performance deep learning on IMC systems affected by both types of noise, spanning purely algorithmic techniques and co-design strategies built on hardware modeling.

The first approach to analog noise is a hardware-algorithm co-design named Stochastic Data-Driven Hardware Resilience (S-DDHR). It integrates the stochastic nature of the hardware into the training process by formulating a statistical model that captures hardware variations. The focus is on process variations, which constitute the primary source of analog noise in advanced silicon technologies. An MRAM-based IMC architecture is introduced to evaluate S-DDHR, with variation parameters extracted and modeled from foundry data; the evaluation spans multiple bit-precisions and datasets.

To fully recover the performance lost to analog noise on practical, energy/throughput-aggressive IMC systems, particularly those built on emerging memory technologies that exhibit low SNR at the compute output, an enhanced statistical framework is developed. It consists of a contrastive, progressive training algorithm that strengthens model robustness against hardware noise, together with macro-level modeling of analog circuit noise whose parameters can be derived from a small number of hardware measurements and calibrations. The framework is validated on MRAM-based IMC prototype chips in 22nm FD-SOI, successfully demonstrating on-chip deep learning inference across multiple tasks and bit-precisions.

Finally, the quantization noise of high-SNR IMC systems is addressed through purely algorithmic methods. The central challenge is the additional ADC quantization applied to every compute output, which is especially critical in SRAM-based IMC, a promising direction owing to its robustness and scalability. An approach named Reshape and Adapt for Output Quantization (RAOQ) is proposed. It reshapes neural-network weight statistics and shifts activation statistics to improve the SQNR degraded by ADC quantization, adds a bit-augmentation method to aid the optimization of model parameters, and introduces an ADC-LoRA technique to reduce training overhead. RAOQ is evaluated on large-scale models across a wide range of AI tasks and bit-precisions.
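To make the general idea concrete, the following is a minimal, hypothetical PyTorch sketch of hardware-aware training as described in the abstract: a statistical noise model (Gaussian analog noise plus ADC quantization of the compute output) is injected into an MVM layer's forward pass so the network learns to tolerate it. This is an illustrative assumption of the technique's shape, not the dissertation's actual S-DDHR or RAOQ implementation; the class, parameter names (noise_sigma, adc_bits), and values are invented for the example.

```python
import torch
import torch.nn as nn

class HardwareAwareLinear(nn.Module):
    """Linear (MVM) layer with a simple statistical IMC noise model.

    Hypothetical sketch: Gaussian analog noise (the low-SNR regime) and
    uniform ADC quantization of the compute output (the high-SNR regime)
    are both injected into the forward pass during training.
    """
    def __init__(self, in_features, out_features, noise_sigma=0.05, adc_bits=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.noise_sigma = noise_sigma    # analog-noise std, relative to output scale
        self.adc_levels = 2 ** adc_bits   # number of ADC quantization levels

    def forward(self, x):
        y = x @ self.weight.t()  # ideal MVM
        if self.training:
            # 1) Analog noise: zero-mean Gaussian, scaled to the output magnitude.
            y = y + self.noise_sigma * y.detach().std() * torch.randn_like(y)
        # 2) ADC quantization of each compute output, with a straight-through
        #    estimator (STE) so gradients bypass the non-differentiable round().
        scale = y.detach().abs().max().clamp(min=1e-8)
        half = self.adc_levels / 2
        y_q = torch.round(y / scale * half) / half * scale
        return y + (y_q - y).detach()  # quantized forward, identity backward
```

A model built from such layers is then trained with an ordinary optimizer; because every forward pass sees the modeled hardware noise, the learned weights are robust to it at inference time. In the dissertation's co-design framing, the noise parameters would come from foundry data or chip measurements rather than the placeholder constants used here.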
URI: http://arks.princeton.edu/ark:/99999/fk4fb6sf86
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Electrical Engineering

Files in This Item:
File                              Size     Format
Zhang_princeton_0181D_15359.pdf   3.62 MB  Adobe PDF

