Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk48359t3v
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Jin, Chi | |
dc.contributor.author | Mir Yoosefi, Seyed Sobhan | |
dc.contributor.other | Computer Science Department | |
dc.date.accessioned | 2022-06-15T15:16:12Z | - |
dc.date.available | 2022-06-15T15:16:12Z | - |
dc.date.created | 2022-01-01 | |
dc.date.issued | 2022 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/99999/fk48359t3v | - |
dc.description.abstract | Reinforcement learning (RL) has attracted a surge of interest in recent years, fueled mainly by practical successes and new applications across various domains. However, a gap remains between our theoretical understanding of these RL techniques and their empirical success. This thesis narrows that gap by studying reinforcement learning from a primarily theoretical point of view and designing provably efficient algorithms for two challenging settings: (1) RL with constraints and (2) RL with function approximation. (1) In standard RL, a learning agent seeks to optimize the overall reward, yet many key aspects of the desired behavior are more naturally expressed as constraints (one common formulation is sketched below). First, we propose an algorithmic scheme that handles RL tasks with general convex constraints, improving upon prior work that is either limited to linear constraints or lacks theoretical guarantees. Second, focusing on sample-efficient exploration, we develop the first provably efficient algorithm for tabular episodic constrained RL that handles convex constraints as well as the knapsack setting. Finally, motivated by recent advances in reward-free RL, we propose a simple meta-algorithm that, given any reward-free RL oracle, solves constrained RL problems directly with negligible overhead in sample complexity. (2) Finding the minimal structural assumptions that enable sample-efficient learning is one of RL's most important research directions. This thesis advances our understanding of this fundamental question by introducing a new complexity measure, the Bellman Eluder (BE) dimension (the underlying Bellman residual is sketched below). We show that the family of RL problems with low BE dimension is remarkably rich, subsuming the vast majority of known tractable RL problems. We further design a new optimization-based algorithm, GOLF, and prove regret and sample-complexity bounds matching or improving the best existing results for several well-known subclasses of low-BE-dimension problems. Furthermore, moving toward the more challenging setting of partially observable RL, we study a new subclass of Partially Observable Markov Decision Processes (POMDPs) whose latent states can be decoded from the most recent history of a short length m (the decodability condition is sketched below). Our results show that short-term memory suffices for reinforcement learning in these environments. | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu | |
dc.subject | Reinforcement Learning | |
dc.subject.classification | Computer science | |
dc.title | Provable Reinforcement Learning with Constraints and Function Approximation | |
dc.type | Academic dissertations (Ph.D.) | |
pu.date.classyear | 2022 | |
pu.department | Computer Science | |
Appears in Collections: | Computer Science |
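Editorial note on the constrained setting: the abstract's convex-constrained RL tasks admit a standard constrained-MDP formulation. The sketch below uses our own notation (the value functional V, measurement signals c_i, and convex set \mathcal{C} are assumptions, not necessarily the thesis's notation).

```latex
% A minimal sketch of a convex-constrained MDP objective (hypothetical
% notation, not taken from the thesis): the agent maximizes the expected
% cumulative reward r while the vector of expected cumulative
% measurements c_1, ..., c_d must lie in a convex set C.
\max_{\pi}\; V_r^{\pi}
\quad \text{subject to} \quad
\big(V_{c_1}^{\pi}, \dots, V_{c_d}^{\pi}\big) \in \mathcal{C},
\qquad \text{where } V_f^{\pi} \;=\; \mathbb{E}_{\pi}\Big[\textstyle\sum_{h=1}^{H} f(s_h, a_h)\Big].
```

Under this reading, linear constraints correspond to taking \mathcal{C} as an intersection of half-spaces, and knapsack-style budgets correspond to coordinatewise upper bounds.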
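Editorial note on the BE dimension: the Bellman Eluder dimension mentioned in the abstract is, roughly, the eluder dimension of the class of Bellman residuals induced by a value-function class. The sketch below writes that residual in standard finite-horizon notation (ours, assumed for illustration).

```latex
% Hypothetical notation: f = (f_1, ..., f_H) is a candidate Q-value
% function and T_h is the step-h Bellman operator. Low BE dimension
% means, informally, that these residuals cannot remain large in too
% many "new" directions as data accumulates.
(\mathcal{E}_h f)(s,a) \;=\; f_h(s,a) \;-\; (\mathcal{T}_h f_{h+1})(s,a),
\qquad
(\mathcal{T}_h f)(s,a) \;=\; \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}\big[\, r_h(s,a) + \max_{a'} f(s', a') \,\big].
```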
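Editorial note on the partially observable subclass: the abstract's POMDPs are characterized by short-memory decodability. A minimal sketch of the condition, in assumed notation:

```latex
% Hypothetical notation: o_t are observations, a_t actions, s_t latent
% states. An m-step decodable POMDP admits a decoder phi that recovers
% the current latent state from the most recent length-m window of the
% interaction history.
s_t \;=\; \phi\big(o_{t-m+1},\, a_{t-m+1},\, \dots,\, o_{t-1},\, a_{t-1},\, o_t\big).
```

With m = 1 this reduces to the block-MDP setting, where the current observation alone determines the latent state.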
Files in This Item:
File | Size | Format |
---|---|---|
MirYoosefi_princeton_0181D_14091.pdf | 4.05 MB | Adobe PDF |
Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.