Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk48359t3v
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Jin, Chi | |
dc.contributor.author | Mir Yoosefi, Seyed Sobhan | |
dc.contributor.other | Computer Science Department | |
dc.date.accessioned | 2022-06-15T15:16:12Z | - |
dc.date.available | 2022-06-15T15:16:12Z | - |
dc.date.created | 2022-01-01 | |
dc.date.issued | 2022 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/99999/fk48359t3v | - |
dc.description.abstract | Reinforcement learning (RL) has attracted a surge of interest in recent years, fueled mainly by practical successes and new applications across various domains. However, a gap remains between our theoretical understanding of these RL techniques and their empirical success. This thesis narrows that gap by studying reinforcement learning from a primarily theoretical point of view and designing provably efficient algorithms for two challenging settings: (1) RL with constraints and (2) RL with function approximation. (1) In standard RL, a learning agent seeks to optimize the overall reward, yet many key aspects of the desired behavior are more naturally expressed as constraints (one common formulation is sketched below). First, we propose an algorithmic scheme that handles RL tasks with general convex constraints, improving upon prior work that is either limited to linear constraints or lacks theoretical guarantees. Second, focusing on sample-efficient exploration, we develop the first provably efficient algorithm for tabular episodic constrained RL that handles convex constraints as well as the knapsack setting. Finally, motivated by recent advances in reward-free RL, we propose a simple meta-algorithm that, given any reward-free RL oracle, solves constrained RL problems directly with negligible overhead in sample complexity. (2) Finding the minimal structural assumptions that enable sample-efficient learning is one of RL's most important research directions. This thesis advances our understanding of this fundamental question by introducing a new complexity measure, the Bellman Eluder (BE) dimension (the underlying Bellman residual is sketched below). We show that the family of RL problems with low BE dimension is remarkably rich, subsuming the vast majority of known tractable RL problems. We further design a new optimization-based algorithm, GOLF, and prove regret and sample-complexity bounds matching or improving the best existing results for several well-known subclasses of low-BE-dimension problems. Furthermore, moving toward the more challenging setting of partially observable RL, we study a new subclass of Partially Observable Markov Decision Processes (POMDPs) whose latent states can be decoded from the most recent history of a short length m (the decodability condition is sketched below). Our results show that short-term memory suffices for reinforcement learning in these environments. | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu | |
dc.subject | Reinforcement Learning | |
dc.subject.classification | Computer science | |
dc.title | Provable Reinforcement Learning with Constraints and Function Approximation | |
dc.type | Academic dissertations (Ph.D.) | |
pu.date.classyear | 2022 | |
pu.department | Computer Science | |
Appears in Collections: | Computer Science |
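Editorial note on the constrained setting: the abstract's convex-constrained RL tasks admit a standard constrained-MDP formulation. The sketch below uses our own notation (the value functional V, measurement signals c_i, and convex set \mathcal{C} are assumptions, not necessarily the thesis's notation).

```latex
% A minimal sketch of a convex-constrained MDP objective (hypothetical
% notation, not taken from the thesis): the agent maximizes the expected
% cumulative reward r while the vector of expected cumulative
% measurements c_1, ..., c_d must lie in a convex set C.
\max_{\pi}\; V_r^{\pi}
\quad \text{subject to} \quad
\big(V_{c_1}^{\pi}, \dots, V_{c_d}^{\pi}\big) \in \mathcal{C},
\qquad \text{where } V_f^{\pi} \;=\; \mathbb{E}_{\pi}\Big[\textstyle\sum_{h=1}^{H} f(s_h, a_h)\Big].
```

Under this reading, linear constraints correspond to taking \mathcal{C} as an intersection of half-spaces, and knapsack-style budgets correspond to coordinatewise upper bounds.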
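Editorial note on the BE dimension: the Bellman Eluder dimension mentioned in the abstract is, roughly, the eluder dimension of the class of Bellman residuals induced by a value-function class. The sketch below writes that residual in standard finite-horizon notation (ours, assumed for illustration).

```latex
% Hypothetical notation: f = (f_1, ..., f_H) is a candidate Q-value
% function and T_h is the step-h Bellman operator. Low BE dimension
% means, informally, that these residuals cannot remain large in too
% many "new" directions as data accumulates.
(\mathcal{E}_h f)(s,a) \;=\; f_h(s,a) \;-\; (\mathcal{T}_h f_{h+1})(s,a),
\qquad
(\mathcal{T}_h f)(s,a) \;=\; \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}\big[\, r_h(s,a) + \max_{a'} f(s', a') \,\big].
```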
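Editorial note on the partially observable subclass: the abstract's POMDPs are characterized by short-memory decodability. A minimal sketch of the condition, in assumed notation:

```latex
% Hypothetical notation: o_t are observations, a_t actions, s_t latent
% states. An m-step decodable POMDP admits a decoder phi that recovers
% the current latent state from the most recent length-m window of the
% interaction history.
s_t \;=\; \phi\big(o_{t-m+1},\, a_{t-m+1},\, \dots,\, o_{t-1},\, a_{t-1},\, o_t\big).
```

With m = 1 this reduces to the block-MDP setting, where the current observation alone determines the latent state.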
Files in This Item:
File | Size | Format |
---|---|---|
MirYoosefi_princeton_0181D_14091.pdf | 4.05 MB | Adobe PDF |
Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.