Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01cf95jf42w
Full metadata record
dc.contributor.advisor: Malik, Sharad
dc.contributor.author: Nie, Qi
dc.contributor.other: Electrical Engineering Department
dc.date.accessioned: 2020-08-10T15:21:56Z
dc.date.available: 2020-08-10T15:21:56Z
dc.date.issued: 2020
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/dsp01cf95jf42w
dc.description.abstract: Neural processing applications are widely used in fields such as vision, speech recognition, and language processing to realize artificial intelligence, but they are computationally demanding given the large scale of their data. The demand for high-performance, low-power execution of these applications has led to their implementation on specialized accelerators. With respect to both time and energy, memory access is much more expensive than computation. Thus, in accelerators, computation needs to exploit locality in SRAM to reduce DRAM requests, as well as locality in datapath registers to reduce SRAM accesses. Neural applications generally exhibit highly interleaved data reuse, which results in a working set that scales with the input size. However, the limited size and bandwidth of on-chip SRAM and registers cannot provide local data storage and movement for this working set in large-scale computation, resulting in a large volume of expensive data movement across the memory hierarchy. To address this challenge, in this dissertation I first define an optimization problem that minimizes data movement for a given application and architecture. The degrees of freedom (variables) in this optimization problem are loop ordering, loop tiling, and memory partitioning; its solution provides the values of these variables that maximize data reuse at each level of the memory hierarchy. The design space for optimizing local memory utilization is large and challenging to explore exhaustively. I provide analytical models that estimate the cost of each design point, i.e., the number of data movements across memory levels, and then investigate multiple techniques for pruning the design space so that the optimal design can be found efficiently. Finally, I extend these techniques to sparse scenarios, where data distribution and reuse patterns are irregular. In summary, this thesis demonstrates the necessity of optimizing dataflow across the memory hierarchy to reduce expensive remote memory accesses and thereby increase the performance and power efficiency of accelerators. It provides efficient solutions for mapping computation onto accelerators with optimized memory utilization. The efficacy of the work is validated through comparisons with other computing platforms, including CPUs and GPUs, and with mapping algorithms for state-of-the-art neural processing accelerators.
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.relation.isformatof: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu
dc.subject: Data flow optimization
dc.subject: Design space pruning
dc.subject: Memory constraints
dc.subject: Memory utilization
dc.subject: Neural network accelerators
dc.subject: Power efficiency
dc.subject.classification: Electrical engineering
dc.title: Memory-Driven Data-Flow Optimization for Neural Processing Accelerators
dc.type: Academic dissertations (Ph.D.)
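
The abstract above describes a search over loop ordering and loop tiling, scored by an analytical model of data movement across memory levels. The Python sketch below illustrates that general idea on a tiled matrix multiply; it is a minimal, hypothetical example, not the dissertation's algorithm. The problem sizes, SRAM budget, candidate tile sizes, and the single-tile-buffer reuse model are all assumptions, and memory partitioning (the third variable named in the abstract) is omitted for brevity.

# Minimal illustrative sketch (assumed, not the dissertation's actual method):
# enumerate loop orders and tile sizes for a tiled matrix multiply
# C[M,N] += A[M,K] * B[K,N], score each design point with a simple
# analytical model of DRAM traffic, and keep the cheapest point that
# fits the SRAM budget. All sizes and the cost model are assumptions.

from itertools import permutations, product

M, N, K = 256, 256, 256      # problem dimensions, in words (assumed)
SRAM_WORDS = 16 * 1024       # assumed on-chip buffer capacity

def tile_loads(order, trips, used):
    # Tiles of one operand fetched from DRAM. With one tile of each operand
    # buffered in SRAM, a tile stays resident across any trailing (innermost)
    # loops whose indices the operand does not use, so those trip counts
    # divide out of the full loop product.
    total = 1
    for n in trips.values():
        total *= n
    reuse = 1
    for idx in reversed(order):
        if idx in used:
            break
        reuse *= trips[idx]
    return total // reuse

def dram_words(order, tm, tn, tk):
    # Analytical cost: words moved between DRAM and SRAM for one design point.
    trips = {'i': M // tm, 'j': N // tn, 'k': K // tk}
    a = tile_loads(order, trips, {'i', 'k'}) * tm * tk       # A[i,k] tiles
    b = tile_loads(order, trips, {'k', 'j'}) * tk * tn       # B[k,j] tiles
    c = 2 * tile_loads(order, trips, {'i', 'j'}) * tm * tn   # C read + writeback
    return a + b + c

best = None
sizes = (16, 32, 64, 128, 256)                   # candidate tile edges (all divide 256)
for order in permutations('ijk'):                # loop ordering
    for tm, tn, tk in product(sizes, repeat=3):  # loop tiling
        if tm * tk + tk * tn + tm * tn > SRAM_WORDS:
            continue                             # one tile per operand must fit in SRAM
        cost = dram_words(order, tm, tn, tk)
        if best is None or cost < best[0]:
            best = (cost, ''.join(order), (tm, tn, tk))

print("min DRAM traffic (words):", best[0])
print("loop order (outer to inner):", best[1], " tile sizes (tm, tn, tk):", best[2])

Under this toy model, an exhaustive scan of all 3! orders and 5^3 tilings is trivial; the pruning techniques the abstract refers to matter when the loop nest and memory hierarchy are deeper and the space no longer admits brute force.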
Appears in Collections: Electrical Engineering

Files in This Item:
File: Nie_princeton_0181D_13278.pdf
Size: 4.84 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.