Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01zw12z8183
Title: | Learning to Learn Optimally: A Practical Framework for Machine Learning Applications with Finite Time Horizon |
Authors: | Lee, Donghun |
Advisors: | Powell, Warren B |
Contributors: | Computer Science Department |
Keywords: | Artificial Intelligence; Learning to Learn Optimally; Machine Learning; Meta Learning |
Subjects: | Computer science |
Issue Date: | 2019 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | Most machine learning algorithms with asymptotic guarantees leave finite time horizon issues, such as initialization or tuning, open to the end users; in practice, where finite time horizon performance matters, this burden can lead to undesirable outcomes. As an illustrative case of undesirable finite time behavior, we identify the finite time bias in the Q-learning algorithm and present a method to alleviate the bias on the fly. Motivated by the gap between asymptotic guarantees and the practical burdens of machine learning, we investigate the problem of learning to learn, defined as the problem of learning how to apply a given machine learning algorithm to solve a given task under a finite time horizon objective function. To address the problem more generally, we develop the framework of \emph{learning to learn optimally} (LTLO), which models the optimal application of a machine learning algorithm to a given task over a finite horizon. We demonstrate the use of the LTLO framework as a modeling tool for a real world problem via an example of learning how to bid in sponsored search auctions. We show the practical benefit of using the LTLO framework as a baseline to construct meta-LQKG+, a knowledge gradient based LTLO algorithm designed to solve online hyperparameter optimization approximately within a small number of trials, and we demonstrate the algorithm's sample efficiency. Answering the need for a robust anytime LTLO algorithm, we develop the online regularized knowledge gradient policy, which solves the LTLO problem with high probability and has a sublinear regret bound. |
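The finite time bias in Q-learning mentioned in the abstract is closely related to the well-known maximization bias: a Q-learning target takes a max over noisy value estimates, and the max of estimates systematically overestimates the max of true values in finite samples. The sketch below is only an illustration of that phenomenon on a toy problem (all names, sample sizes, and the setup are assumptions for this example); it does not reproduce the dissertation's actual bias-correction method.

```python
import random

def max_of_sample_means(n_actions=10, n_samples=5, seed=None):
    """Estimate max_a Q(a) from noisy samples when every true Q(a) = 0.

    The plug-in estimate max_a Qhat(a) is biased upward in finite
    samples; this is the same maximization bias that affects
    Q-learning targets built from a max over estimated values.
    """
    rng = random.Random(seed)
    sample_means = [
        sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
        for _ in range(n_actions)
    ]
    return max(sample_means)

# Average the plug-in estimate over many independent trials: the true
# maximum action value is 0, yet the estimate is systematically positive.
trials = 2000
avg = sum(max_of_sample_means(seed=i) for i in range(trials)) / trials
print(round(avg, 2))  # positive, although every true action value is 0
```

The bias shrinks as `n_samples` grows, which is why asymptotic guarantees hold while finite time behavior still suffers, the gap the dissertation targets.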
URI: | http://arks.princeton.edu/ark:/88435/dsp01zw12z8183 |
Alternate format: | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Computer Science |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Lee_princeton_0181D_12961.pdf | | 1.77 MB | Adobe PDF | View/Download
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.