New Exploration and Identification schemes for Linear Dynamics: the cases of Convex costs and Censored Data

Plevrakis, Orestis

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk41z5p59s

Title:	New Exploration and Identification schemes for Linear Dynamics: the cases of Convex costs and Censored Data
Authors:	Plevrakis, Orestis
Advisors:	Arora, Sanjeev
Contributors:	Computer Science Department
Subjects:	Computer science Statistics
Issue Date:	2021
Publisher:	Princeton, NJ : Princeton University
Abstract:	Adaptive control, i.e., learning to control an unknown linear dynamical system, has a long history in statistics and control theory. However, sharp non-asymptotic bounds were only recently obtained, and are outputs of an ongoing research program aiming to apply tools from learning theory and optimization, to control theory. A common feature of all the major results of the program has been the learning-part of the proposed algorithms, i.e., the exploration strategy. All these algorithms apply the so-called "certainty-equivalence", where the learner spends the first steps applying random controls (e.g. Gaussian noise), then estimates the dynamics via least-squares, and thereafter optimizes the control-policy using these estimates. Even though this is the simplest exploration strategy, it is optimal for the classical linear quadratic regulator (LQR). In this thesis, I study 1) the problem of adaptive control when the cost can be any convex function (thus generalizing LQR), and 2) the problem of learning a linear dynamical system (LDS) from "censored" observations, i.e., the state is observed only when it falls inside some set (e.g., the state can be the position of an object and the set can be the camera-frame). For the first problem, certainty-equivalence is suboptimal. For the second, the least-squares solution is not even consistent. The contribution of the thesis is the development of new computationally and statistically efficient algorithms for both problems. Specifically, 1. For adaptive control with convex costs, we consider the objective of regret with respect to the benchmark of stabilizing linear control policies. Leveraging ideas from convex geometry, we design an exploration strategy that achieves optimal regret, in terms of the dependence on the time-horizon. Our result improves upon the previous best known bounds, which correspond to algorithms applying certainty-equivalence. 2. In the problem of learning an LDS from censored observations, the learner observes the state $x_t \in \R^d$ if and only if $x_t$ belongs to some set $S_t \subseteq \R^d$. This setting was first considered by Lee and Maddala (1985), and Zeger and Brookmeyer (1986). We develop the first computationally and statistically efficient algorithm for learning the system, assuming only membership-oracle access to the sets $S_t$ (which can be arbitrarily complex sets, e.g., non-convex). Our algorithm, Stochastic Online Newton with Switching Gradients, is a novel second-order method that builds on the Online Newton Step of Hazan et al. (2007).
URI:	http://arks.princeton.edu/ark:/99999/fk41z5p59s
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Computer Science

Files in This Item:

File	Size	Format
Plevrakis_princeton_0181D_13818.pdf	1.37 MB	Adobe PDF	View/Download

Show full item record

Search

Browse