Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk43n3mm20
Full metadata record (DC field: value)
dc.contributor.advisor: Wang, Mengdi
dc.contributor.author: Gong, Hao
dc.contributor.other: Operations Research and Financial Engineering Department
dc.date.accessioned: 2021-06-10T17:14:37Z
dc.date.available: 2021-06-10T17:14:37Z
dc.date.issued: 2021
dc.identifier.uri: http://arks.princeton.edu/ark:/99999/fk43n3mm20
dc.description.abstract: In this thesis, we study the application of a class of stochastic primal-dual algorithms to reinforcement learning. The method takes advantage of the min-max formulation of the Bellman equation (a background sketch of this formulation appears at the end of this record) and jointly learns the optimal value function and the optimal policy through a series of primal-dual updates. We study both the online reinforcement learning problem and the simulation problem, and present several variants of the method and their theoretical guarantees. In Chapter 2, we investigate the Infinite-Horizon Average-Reward Markov Decision Process (AMDP) and propose an online primal-dual algorithm to learn the optimal policy by actively making decisions along a single state trajectory. Our results appear to be the first duality-based value-policy gradient method and regret analysis for infinite-horizon RL. We prove an $\mathcal{O}(\sqrt{T})$ regret upper bound that also depends on an ergodicity parameter of the AMDP, which can be significantly smaller than existing diameter-dependent bounds in some cases. In Chapter 3, we propose a class of utility maximization problems as a generalization of AMDPs that incorporates nonlinearity into the objective. We develop a primal-dual algorithm to solve the proposed problems and show that it achieves superior sample complexity compared to previous duality-based algorithms when the problem reduces to an AMDP. Chapter 4 presents a general primal-dual framework for solving the utility maximization problem with function approximation. We also study a specific class of linear/bilinear models and prove that the adjusted algorithm achieves a comparable convergence rate.
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.relation.isformatof: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu
dc.subject: Machine Learning
dc.subject: Markov Decision Process
dc.subject: Primal-Dual Method
dc.subject: Reinforcement Learning
dc.subject.classification: Operations research
dc.subject.classification: Computer science
dc.title: Primal-Dual Method for Reinforcement Learning and Markov Decision Processes
dc.type: Academic dissertations (Ph.D.)
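
Note on the min-max formulation referenced in the abstract: the following is background added for context, not text from the thesis record. One standard saddle-point form of the average-reward Bellman equation, obtained from the linear-programming characterization of an AMDP, is

\[
\rho^{*} \;=\; \min_{v \in \mathbb{R}^{|\mathcal{S}|}} \;\max_{\mu \in \Delta_{\mathcal{S} \times \mathcal{A}}} \;\sum_{s,a} \mu(s,a)\Big( r(s,a) + \sum_{s'} P(s' \mid s,a)\, v(s') - v(s) \Big),
\]

where $\mathcal{S}$, $\mathcal{A}$, $P$, and $r$ denote the state space, action space, transition kernel, and reward, $v$ is the value (bias) vector, $\mu$ ranges over distributions on state-action pairs, and $\rho^{*}$ is the optimal average reward; an optimal $\mu^{*}$ is the stationary state-action distribution of an optimal policy, with $\pi^{*}(a \mid s) \propto \mu^{*}(s,a)$. A stochastic primal-dual method alternates sampled ascent steps on $\mu$ and descent steps on $v$. The Python sketch below illustrates one such tabular update under an assumed generative-model oracle sample_next_state(s, a); the step sizes and helper are hypothetical, and this is not claimed to be the algorithm analyzed in the thesis.

import numpy as np

def primal_dual_step(v, mu, r, sample_next_state, alpha=0.01, beta=0.01):
    # One illustrative stochastic primal-dual update for the saddle point above.
    # sample_next_state(s, a) is an assumed oracle returning s' drawn from P(.|s, a).
    nS, nA = r.shape
    # Draw (s, a) from the current dual iterate mu, a distribution over state-action pairs.
    idx = np.random.choice(nS * nA, p=mu.ravel())
    s, a = divmod(idx, nA)
    s_next = sample_next_state(s, a)
    # Sampled Bellman residual r(s, a) + v(s') - v(s).
    delta = r[s, a] + v[s_next] - v[s]
    # Dual ascent on mu: exponentiated-gradient step on the sampled coordinate,
    # importance-weighted by 1/mu[s, a] because (s, a) was sampled from mu, then renormalized.
    mu[s, a] *= np.exp(beta * delta / mu[s, a])
    mu /= mu.sum()
    # Primal descent on v: the sampled gradient of the Lagrangian w.r.t. v is e_{s'} - e_s.
    v[s_next] -= alpha
    v[s] += alpha
    return v, mu

Repeated calls from a uniform initial mu and zero v, with decaying step sizes and averaging of the mu iterates, would yield a near-optimal policy in the tabular setting under suitable mixing assumptions; none of these specifics are taken from the thesis.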
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File: Gong_princeton_0181D_13650.pdf (890.44 kB, Adobe PDF)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.