Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01zw12z8183
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Powell, Warren B | -
dc.contributor.author | Lee, Donghun | -
dc.contributor.other | Computer Science Department | -
dc.date.accessioned | 2019-12-12T17:21:28Z | -
dc.date.available | 2021-11-04T16:54:17Z | -
dc.date.issued | 2019 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01zw12z8183 | -
dc.description.abstract | Most machine learning algorithms with asymptotic guarantees leave finite-time-horizon issues, such as initialization and tuning, to the end users. This burden can lead to undesirable outcomes in practice, where finite-time-horizon performance matters. As a motivating case of undesirable finite-time behavior, we identify the finite-time bias in the Q-learning algorithm and present a method to alleviate the bias on the fly. Motivated by the gap between the asymptotic guarantees and the practical burdens of machine learning, we investigate the problem of learning to learn, defined as the problem of learning how to apply a given machine learning algorithm to solve a given task with a finite-time-horizon objective function. To address the problem more generally, we develop the framework of learning to learn optimally (LTLO), which models the problem of optimally applying a machine learning algorithm to a given task over a finite horizon. We demonstrate the use of the LTLO framework as a modeling tool for a real-world problem through an example of learning to learn how to bid in sponsored search auctions. We show the practical benefit of using the LTLO framework as a baseline by constructing meta-LQKG+, a knowledge-gradient-based LTLO algorithm designed to approximately solve online hyperparameter optimization with a small number of trials, and we demonstrate the practical sample efficiency of the algorithm. Responding to the need for a robust anytime LTLO algorithm, we develop the online regularized knowledge gradient policy, which solves the LTLO problem with high probability and has a sublinear regret bound. | -
dc.language.iso | en | -
dc.publisher | Princeton, NJ : Princeton University | -
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | -
dc.subject | Artificial Intelligence | -
dc.subject | Learning to Learn Optimally | -
dc.subject | Machine Learning | -
dc.subject | Meta Learning | -
dc.subject.classification | Computer science | -
dc.title | Learning to Learn Optimally: A Practical Framework for Machine Learning Applications with Finite Time Horizon | -
dc.type | Academic dissertations (Ph.D.) | -
pu.embargo.terms | 2021-06-10 | -
Appears in Collections: Computer Science

Files in This Item:
File | Description | Size | Format
Lee_princeton_0181D_12961.pdf | | 1.77 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.