Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01zw12z8183
Title: | Learning to Learn Optimally: A Practical Framework for Machine Learning Applications with Finite Time Horizon |
Authors: | Lee, Donghun |
Advisors: | Powell, Warren B |
Contributors: | Computer Science Department |
Keywords: | Artificial Intelligence; Learning to Learn Optimally; Machine Learning; Meta Learning |
Subjects: | Computer science |
Issue Date: | 2019 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | Most machine learning algorithms with asymptotic guarantees leave finite time horizon issues, such as initialization or tuning, open to the end users; in practice, where finite time horizon performance matters, this burden can lead to undesirable outcomes. As an illustrative case of undesirable finite time behavior, we identify the finite time bias in the Q-learning algorithm and present a method to alleviate the bias on the fly. Motivated by the gap between asymptotic guarantees and the practical burdens of machine learning, we investigate the problem of learning to learn, defined as the problem of learning how to apply a given machine learning algorithm to solve a given task under a finite time horizon objective function. To address the problem more generally, we develop the framework of \emph{learning to learn optimally} (LTLO), which models the optimal application of a machine learning algorithm to a given task over a finite horizon. We demonstrate the use of the LTLO framework as a modeling tool for a real world problem via an example of learning how to bid in sponsored search auctions. We show the practical benefit of using the LTLO framework as a baseline to construct meta-LQKG+, a knowledge gradient based LTLO algorithm designed to solve online hyperparameter optimization approximately within a small number of trials, and we demonstrate the algorithm's sample efficiency. Answering the need for a robust anytime LTLO algorithm, we develop the online regularized knowledge gradient policy, which solves the LTLO problem with high probability and has a sublinear regret bound. |
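The finite time bias in Q-learning mentioned in the abstract is closely related to the well-known maximization bias: a Q-learning target takes a max over noisy value estimates, and the max of estimates systematically overestimates the max of true values in finite samples. The sketch below is only an illustration of that phenomenon on a toy problem (all names, sample sizes, and the setup are assumptions for this example); it does not reproduce the dissertation's actual bias-correction method.

```python
import random

def max_of_sample_means(n_actions=10, n_samples=5, seed=None):
    """Estimate max_a Q(a) from noisy samples when every true Q(a) = 0.

    The plug-in estimate max_a Qhat(a) is biased upward in finite
    samples; this is the same maximization bias that affects
    Q-learning targets built from a max over estimated values.
    """
    rng = random.Random(seed)
    sample_means = [
        sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
        for _ in range(n_actions)
    ]
    return max(sample_means)

# Average the plug-in estimate over many independent trials: the true
# maximum action value is 0, yet the estimate is systematically positive.
trials = 2000
avg = sum(max_of_sample_means(seed=i) for i in range(trials)) / trials
print(round(avg, 2))  # positive, although every true action value is 0
```

The bias shrinks as `n_samples` grows, which is why asymptotic guarantees hold while finite time behavior still suffers, the gap the dissertation targets.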
URI: | http://arks.princeton.edu/ark:/88435/dsp01zw12z8183 |
Alternate format: | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Computer Science |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Lee_princeton_0181D_12961.pdf | | 1.77 MB | Adobe PDF | View/Download
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.