Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01cf95jf42w
Full metadata record
dc.contributor.advisor: Malik, Sharad
dc.contributor.author: Nie, Qi
dc.contributor.other: Electrical Engineering Department
dc.date.accessioned: 2020-08-10T15:21:56Z
dc.date.available: 2020-08-10T15:21:56Z
dc.date.issued: 2020
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/dsp01cf95jf42w
dc.description.abstract: Neural processing applications are widely used in fields such as vision, speech recognition, and language processing to realize artificial intelligence, but they are computationally demanding given the large scale of their data. The demand for high-performance, low-power execution of these applications has led to their implementation on specialized accelerators. With respect to both time and energy, memory access is much more expensive than computation. Thus, in accelerators, computation needs to exploit locality in SRAM to reduce DRAM requests, as well as locality in datapath registers to reduce SRAM accesses. Neural applications generally exhibit highly interleaved data reuse, which results in a working set that scales with the input size. However, the limited size and bandwidth of on-chip SRAM and registers cannot provide local data storage and movement for this working set in large-scale computation, resulting in a large volume of expensive data movement across the memory hierarchy. To address this challenge, in this dissertation I first define an optimization problem that minimizes data movement for a given application and architecture. The degrees of freedom (variables) in this optimization problem are loop ordering, loop tiling, and memory partitioning; its solution provides the values of these variables that maximize data reuse at each level of the memory hierarchy. The design space for optimizing local memory utilization is large and challenging to explore exhaustively. I provide analytical models that estimate the cost of each design point, i.e., the number of data movements across memory levels, and then investigate multiple techniques for pruning the design space so that the optimal design can be found efficiently. Finally, I extend these techniques to sparse scenarios, where data distribution and reuse patterns are irregular. In summary, this thesis demonstrates the necessity of optimizing dataflow across the memory hierarchy to reduce expensive remote memory accesses and thereby increase the performance and power efficiency of accelerators. It provides efficient solutions for mapping computation onto accelerators with optimized memory utilization. The efficacy of the work is validated through comparisons with other computing platforms, including CPUs and GPUs, and with mapping algorithms for state-of-the-art neural processing accelerators.
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.relation.isformatof: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu
dc.subject: Data flow optimization
dc.subject: Design space pruning
dc.subject: Memory constraints
dc.subject: Memory utilization
dc.subject: Neural network accelerators
dc.subject: Power efficiency
dc.subject.classification: Electrical engineering
dc.title: Memory-Driven Data-Flow Optimization for Neural Processing Accelerators
dc.type: Academic dissertations (Ph.D.)
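
The abstract above describes a search over loop ordering and loop tiling, scored by an analytical model of data movement across memory levels. The Python sketch below illustrates that general idea on a tiled matrix multiply; it is a minimal, hypothetical example, not the dissertation's algorithm. The problem sizes, SRAM budget, candidate tile sizes, and the single-tile-buffer reuse model are all assumptions, and memory partitioning (the third variable named in the abstract) is omitted for brevity.

# Minimal illustrative sketch (assumed, not the dissertation's actual method):
# enumerate loop orders and tile sizes for a tiled matrix multiply
# C[M,N] += A[M,K] * B[K,N], score each design point with a simple
# analytical model of DRAM traffic, and keep the cheapest point that
# fits the SRAM budget. All sizes and the cost model are assumptions.

from itertools import permutations, product

M, N, K = 256, 256, 256      # problem dimensions, in words (assumed)
SRAM_WORDS = 16 * 1024       # assumed on-chip buffer capacity

def tile_loads(order, trips, used):
    # Tiles of one operand fetched from DRAM. With one tile of each operand
    # buffered in SRAM, a tile stays resident across any trailing (innermost)
    # loops whose indices the operand does not use, so those trip counts
    # divide out of the full loop product.
    total = 1
    for n in trips.values():
        total *= n
    reuse = 1
    for idx in reversed(order):
        if idx in used:
            break
        reuse *= trips[idx]
    return total // reuse

def dram_words(order, tm, tn, tk):
    # Analytical cost: words moved between DRAM and SRAM for one design point.
    trips = {'i': M // tm, 'j': N // tn, 'k': K // tk}
    a = tile_loads(order, trips, {'i', 'k'}) * tm * tk       # A[i,k] tiles
    b = tile_loads(order, trips, {'k', 'j'}) * tk * tn       # B[k,j] tiles
    c = 2 * tile_loads(order, trips, {'i', 'j'}) * tm * tn   # C read + writeback
    return a + b + c

best = None
sizes = (16, 32, 64, 128, 256)                   # candidate tile edges (all divide 256)
for order in permutations('ijk'):                # loop ordering
    for tm, tn, tk in product(sizes, repeat=3):  # loop tiling
        if tm * tk + tk * tn + tm * tn > SRAM_WORDS:
            continue                             # one tile per operand must fit in SRAM
        cost = dram_words(order, tm, tn, tk)
        if best is None or cost < best[0]:
            best = (cost, ''.join(order), (tm, tn, tk))

print("min DRAM traffic (words):", best[0])
print("loop order (outer to inner):", best[1], " tile sizes (tm, tn, tk):", best[2])

Under this toy model, an exhaustive scan of all 3! orders and 5^3 tilings is trivial; the pruning techniques the abstract refers to matter when the loop nest and memory hierarchy are deeper and the space no longer admits brute force.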
Appears in Collections: Electrical Engineering

Files in This Item:
File: Nie_princeton_0181D_13278.pdf
Size: 4.84 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.