Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01rf55zb61z
Full metadata record
dc.contributor.advisor: Wang, Mengdi
dc.contributor.author: Chen, Yichen
dc.contributor.other: Computer Science Department
dc.date.accessioned: 2020-07-13T03:32:06Z
dc.date.available: 2020-07-13T03:32:06Z
dc.date.issued: 2020
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/dsp01rf55zb61z
dc.description.abstract: The Markov decision problem (MDP) is one of the most basic models for sequential decision-making in a dynamic environment where outcomes are partly random. It models a stochastic control process in which a planner makes a sequence of decisions as the system evolves. The Markov decision problem provides a mathematical framework for dynamic programming, stochastic control, and reinforcement learning. In this thesis, we study the complexity of solving MDPs. In the first part of the thesis, we propose a class of stochastic primal-dual methods for solving MDPs. We formulate the core policy optimization problem of the MDP as a stochastic saddle point problem. By exploiting the value-policy duality structure, the algorithm samples state-action transitions and makes alternating updates to the value and policy estimates (a generic sketch of such an update scheme follows the metadata record). We prove that our algorithm finds approximately optimal policies for Markov decision problems with small space and computational complexity. By using linear models to represent the value functions and the policies, our algorithm scales to problems with infinite and continuous state spaces. In the second part of the thesis, we establish computational complexity lower bounds for solving MDPs. We prove our results by modeling MDP algorithms as branching programs and then characterizing the properties of these programs by quantum arguments. The analysis is also extended to the complexity of solving two-player turn-based Markov games. Our results show that if we have a simulator that can sample according to the transition probability function in $O(1)$ time, the lower bounds have a reduced dependence on the number of states. These results suggest a fundamental difference between Markov decision problems with and without a simulator. We believe that our results provide a new piece of theoretical evidence for the success of simulation-based methods in solving MDPs and Markov games.
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.relation.isformatof: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu
dc.subject: Markov decision processes
dc.subject: Optimization
dc.subject: Reinforcement learning
dc.subject.classification: Computer science
dc.title: On the Complexity of Markov Decision Problems
dc.type: Academic dissertations (Ph.D.)
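
The abstract above describes a primal-dual view of the MDP: the policy optimization problem is written as a stochastic saddle point over a value estimate and a policy (occupancy) estimate, and the method alternates sampled updates to the two. The sketch below illustrates that general idea for a discounted tabular MDP as a plain projected stochastic gradient descent-ascent on the linear-programming saddle point. It is not the thesis's actual algorithm; the function name, uniform state-action sampling, uniform initial-state distribution, projection onto nonnegativity, and step sizes are all assumptions made for illustration.

import numpy as np

def stochastic_primal_dual_mdp(P, r, gamma, n_iters=200_000, lr_v=0.05, lr_mu=0.05, seed=0):
    """Illustrative sketch of a stochastic primal-dual update for a discounted tabular MDP.

    Works on the linear-programming saddle point
        min_v max_{mu >= 0}  (1 - gamma) * alpha @ v
            + sum_{s,a} mu[s, a] * (r[s, a] + gamma * P[s, a] @ v - v[s]),
    alternating a sampled descent step on the value estimate v (dual variable)
    with a sampled projected ascent step on the occupancy estimate mu (primal
    variable). P has shape (S, A, S) with the last axis summing to one, and r
    has shape (S, A).
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    alpha = np.full(S, 1.0 / S)           # assumed uniform initial-state distribution
    v = np.zeros(S)                       # value estimate (dual variable)
    mu = np.full((S, A), 1.0 / (S * A))   # occupancy estimate (primal variable)

    for _ in range(n_iters):
        # Sample a state-action pair uniformly and a next state from the model;
        # this is the "simulator" access pattern discussed in the abstract.
        s, a = rng.integers(S), rng.integers(A)
        s_next = rng.choice(S, p=P[s, a])

        # Sampled Bellman residual for the chosen transition.
        delta = r[s, a] + gamma * v[s_next] - v[s]

        # Value (dual) descent step on an unbiased gradient estimate;
        # the factor S * A corrects for uniform sampling of (s, a).
        grad_v = (1.0 - gamma) * alpha.copy()
        grad_v[s_next] += gamma * S * A * mu[s, a]
        grad_v[s] -= S * A * mu[s, a]
        v -= lr_v * grad_v

        # Occupancy (primal) ascent step, projected onto mu >= 0.
        mu[s, a] = max(mu[s, a] + lr_mu * S * A * delta, 0.0)

    # Read a policy off the occupancy estimate (uniform where all mass is zero).
    totals = mu.sum(axis=1, keepdims=True)
    policy = np.where(totals > 0, mu / np.maximum(totals, 1e-12), 1.0 / A)
    return v, policy

The importance-correction constants S * A could equivalently be folded into the step sizes; they are kept explicit here only to make the unbiasedness of the sampled gradients visible.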
Appears in Collections: Computer Science

Files in This Item:
Chen_princeton_0181D_13295.pdf (1.3 MB, Adobe PDF)


Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.