Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk4087qj7d
Full metadata record
dc.contributor.advisor: August, David
dc.contributor.author: Nagendra, Nayana Prasad
dc.contributor.other: Computer Science Department
dc.date.accessioned: 2021-10-04T13:25:51Z
dc.date.available: 2021-10-04T13:25:51Z
dc.date.created: 2021-01-01
dc.date.issued: 2021
dc.identifier.uri: http://arks.princeton.edu/ark:/99999/fk4087qj7d
dc.description.abstract: Cloud computing is the backbone of today's digital world. Billions of devices, from Internet-of-Things sensors to mobile phones, connect to services running on massive data centers dubbed Warehouse-Scale Computers (WSCs). Because of the scale at which WSCs operate, their performance has a huge cost impact: even the slightest percentage improvement generates significant profits for the companies that operate them. Moreover, WSCs have a hugely adverse effect on the environment because of their massive carbon footprint. Surprisingly, our work with Google shows significant inefficiencies in WSC processor utilization. More than two-thirds of CPU cycles are wasted in many forms. In particular, nearly 20% of the wasted cycles come from the processor front-end, which is responsible for keeping the pipeline fed with useful instructions. Since a substantial fraction of front-end cycle waste is due to instruction cache misses, this dissertation takes a full-stack approach to studying and solving them.

Data center workloads have grown vastly in size and complexity. Consequently, programs' working sets are much larger than the instruction caches of modern processors. With the end of Moore's Law approaching, innovations are necessary to increase the efficiency of instruction cache usage for these growing workloads. This dissertation first presents AsmDB, a fleet-wide study of Google workloads undertaken to understand processor front-end bottlenecks. To our knowledge, this is the first work to study the behavior of instruction caches in depth at such a large scale. Among its many findings, AsmDB highlights inefficient use of instruction caches due to significant fragmentation, even in highly optimized code. The second part of this dissertation describes Emissary, a new approach that alleviates this problem by using the available cache capacity more efficiently. Observing that not all cache misses cause stalls, Emissary employs a novel cache replacement algorithm that reserves cache capacity for lines whose misses cause the greatest performance impact. Emissary outperforms all known cache replacement algorithms in both performance and energy consumption. In some cases, it produces speedups that exceed those of Belady's OPT, the perfect-knowledge, minimal-miss ideal cache replacement algorithm.
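To make the core idea concrete, here is a minimal, purely illustrative sketch of a stall-aware replacement policy in the spirit the abstract describes: the cache prefers to evict lines whose misses are cheap, reserving capacity for lines flagged as high-cost. The class name, the boolean cost flag, and the fallback-to-LRU behavior are all assumptions for illustration; the dissertation's actual Emissary mechanism is not specified in this record.

```python
# Illustrative sketch only -- not the dissertation's implementation.
# A toy fully-associative cache that evicts the least-recently-used
# *low-cost* line first, falling back to plain LRU when every resident
# line is marked high-cost. The is_high_cost flag (hypothetical here)
# stands in for "a miss on this line would stall the front-end".

from collections import OrderedDict

class StallAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # addr -> is_high_cost, in LRU order

    def access(self, addr, is_high_cost):
        """Return True on a hit, False on a miss (inserting the line)."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # refresh recency on a hit
            return True
        if len(self.lines) >= self.capacity:
            # Victim: oldest low-cost line if one exists, else plain LRU.
            victim = next((a for a, hc in self.lines.items() if not hc),
                          next(iter(self.lines)))
            del self.lines[victim]
        self.lines[addr] = is_high_cost
        return False
```

For example, with a two-line cache, accessing line 1 (high-cost), then lines 2 and 3 (both low-cost), evicts line 2 rather than line 1, so a later access to line 1 still hits; an ordinary LRU policy would have evicted line 1 instead.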
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.relation.isformatof: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu
dc.subject.classification: Computer engineering
dc.title: IMPROVING INSTRUCTION CACHE PERFORMANCE FOR MODERN PROCESSORS WITH GROWING WORKLOADS
dc.type: Academic dissertations (Ph.D.)
pu.date.classyear: 2021
pu.department: Computer Science
Appears in Collections: Computer Science

Files in This Item:
File: Nagendra_princeton_0181D_13771.pdf (6.15 MB, Adobe PDF)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.