Memory-Driven Data-Flow Optimization for Neural Processing Accelerators

Nie, Qi

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01cf95jf42w

Title:	Memory-Driven Data-Flow Optimization for Neural Processing Accelerators
Authors:	Nie, Qi
Advisors:	Malik, Sharad
Contributors:	Electrical Engineering Department
Keywords:	Data flow optimization; Design space pruning; Memory constraints; Memory utilization; Neural network accelerators; Power efficiency
Subjects:	Electrical engineering
Issue Date:	2020
Publisher:	Princeton, NJ : Princeton University
Abstract:	Neural processing applications are widely used in fields such as vision, speech recognition and language processing to realize artificial intelligence, but they are computationally challenging given the large scale of data. The demand for high-performance, low-power computing of these applications has led to their implementation through specialized accelerators. With respect to both time and energy, memory access is much more expensive than computation. Thus, in accelerators, computation needs to exploit locality in SRAM to reduce DRAM requests, as well as locality in datapath registers to reduce SRAM accesses. Neural applications generally include highly interleaved data reuse, which results in a working set that scales with the input size. However, the limited size and bandwidth of on-chip SRAM and registers are insufficient to provide local data storage and movement for the working set in large-scale computation. This results in significant, expensive data movement across the memory hierarchy. To address this challenge, in this dissertation, I first define an optimization problem for minimizing data movement for a given application and architecture. The degrees of freedom (variables) in this optimization problem are loop ordering, loop tiling and memory partitioning. The solution to this problem provides optimal values of these variables to maximize the data reuse at each level of memory. The design space for optimizing local memory utilization is large and challenging to explore completely. For each point in the design space, I provide analytical models to estimate its cost, i.e., the number of data movements across memory levels. I then investigate multiple techniques to prune the design space to find the optimal design efficiently. Finally, I extend the approach to sparse scenarios, where data distribution and reuse patterns are irregular.
In summary, this thesis demonstrates the necessity of optimizing dataflow across the memory hierarchy to reduce expensive remote memory accesses and thus increase the performance and power efficiency of accelerators. It provides efficient solutions for mapping computation onto accelerators that give optimized memory utilization. The efficacy of the work is validated through comparison with other computing platforms, including CPUs and GPUs, and with mapping algorithms for state-of-the-art neural processing accelerators.
URI:	http://arks.princeton.edu/ark:/88435/dsp01cf95jf42w
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Electrical Engineering

Files in This Item:

File	Description	Size	Format
Nie_princeton_0181D_13278.pdf		4.84 MB	Adobe PDF
