Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk43n3mm20
Full metadata record (DC field: value)
dc.contributor.advisor: Wang, Mengdi
dc.contributor.author: Gong, Hao
dc.contributor.other: Operations Research and Financial Engineering Department
dc.date.accessioned: 2021-06-10T17:14:37Z
dc.date.available: 2021-06-10T17:14:37Z
dc.date.issued: 2021
dc.identifier.uri: http://arks.princeton.edu/ark:/99999/fk43n3mm20
dc.description.abstract: In this thesis, we study the application of a class of stochastic primal-dual algorithms to reinforcement learning. The method takes advantage of the min-max formulation of the Bellman equation (a background sketch of this formulation appears at the end of this record) and jointly learns the optimal value function and the optimal policy through a series of primal-dual updates. We study both the online reinforcement learning problem and the simulation problem, and present several variants of the method and their theoretical guarantees. In Chapter 2, we investigate the Infinite-Horizon Average-Reward Markov Decision Process (AMDP) and propose an online primal-dual algorithm to learn the optimal policy by actively making decisions along a single state trajectory. Our results appear to be the first duality-based value-policy gradient method and regret analysis for infinite-horizon RL. We prove an $\mathcal{O}(\sqrt{T})$ regret upper bound that also depends on an ergodicity parameter of the AMDP, which can be significantly smaller than existing diameter-dependent bounds in some cases. In Chapter 3, we propose a class of utility maximization problems as a generalization of AMDPs that incorporates nonlinearity into the objective. We develop a primal-dual algorithm to solve the proposed problems and show that it achieves superior sample complexity compared to previous duality-based algorithms when the problem reduces to an AMDP. Chapter 4 presents a general primal-dual framework for solving the utility maximization problem with function approximation. We also study a specific class of linear/bilinear models and prove that the adjusted algorithm achieves a comparable convergence rate.
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.relation.isformatof: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu
dc.subject: Machine Learning
dc.subject: Markov Decision Process
dc.subject: Primal-Dual Method
dc.subject: Reinforcement Learning
dc.subject.classification: Operations research
dc.subject.classification: Computer science
dc.title: Primal-Dual Method for Reinforcement Learning and Markov Decision Processes
dc.type: Academic dissertations (Ph.D.)
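
Note on the min-max formulation referenced in the abstract: the following is background added for context, not text from the thesis record. One standard saddle-point form of the average-reward Bellman equation, obtained from the linear-programming characterization of an AMDP, is

\[
\rho^{*} \;=\; \min_{v \in \mathbb{R}^{|\mathcal{S}|}} \;\max_{\mu \in \Delta_{\mathcal{S} \times \mathcal{A}}} \;\sum_{s,a} \mu(s,a)\Big( r(s,a) + \sum_{s'} P(s' \mid s,a)\, v(s') - v(s) \Big),
\]

where $\mathcal{S}$, $\mathcal{A}$, $P$, and $r$ denote the state space, action space, transition kernel, and reward, $v$ is the value (bias) vector, $\mu$ ranges over distributions on state-action pairs, and $\rho^{*}$ is the optimal average reward; an optimal $\mu^{*}$ is the stationary state-action distribution of an optimal policy, with $\pi^{*}(a \mid s) \propto \mu^{*}(s,a)$. A stochastic primal-dual method alternates sampled ascent steps on $\mu$ and descent steps on $v$. The Python sketch below illustrates one such tabular update under an assumed generative-model oracle sample_next_state(s, a); the step sizes and helper are hypothetical, and this is not claimed to be the algorithm analyzed in the thesis.

import numpy as np

def primal_dual_step(v, mu, r, sample_next_state, alpha=0.01, beta=0.01):
    # One illustrative stochastic primal-dual update for the saddle point above.
    # sample_next_state(s, a) is an assumed oracle returning s' drawn from P(.|s, a).
    nS, nA = r.shape
    # Draw (s, a) from the current dual iterate mu, a distribution over state-action pairs.
    idx = np.random.choice(nS * nA, p=mu.ravel())
    s, a = divmod(idx, nA)
    s_next = sample_next_state(s, a)
    # Sampled Bellman residual r(s, a) + v(s') - v(s).
    delta = r[s, a] + v[s_next] - v[s]
    # Dual ascent on mu: exponentiated-gradient step on the sampled coordinate,
    # importance-weighted by 1/mu[s, a] because (s, a) was sampled from mu, then renormalized.
    mu[s, a] *= np.exp(beta * delta / mu[s, a])
    mu /= mu.sum()
    # Primal descent on v: the sampled gradient of the Lagrangian w.r.t. v is e_{s'} - e_s.
    v[s_next] -= alpha
    v[s] += alpha
    return v, mu

Repeated calls from a uniform initial mu and zero v, with decaying step sizes and averaging of the mu iterates, would yield a near-optimal policy in the tabular setting under suitable mixing assumptions; none of these specifics are taken from the thesis.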
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File: Gong_princeton_0181D_13650.pdf (890.44 kB, Adobe PDF)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.