Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk4fn2pn1w
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Seung, Sebastian H | |
dc.contributor.author | Simmons-Edler, Riley | |
dc.contributor.other | Computer Science Department | |
dc.date.accessioned | 2022-06-15T15:16:46Z | - |
dc.date.available | 2022-06-15T15:16:46Z | - |
dc.date.created | 2022-01-01 | |
dc.date.issued | 2022 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/99999/fk4fn2pn1w | - |
dc.description.abstract | The combination of deep neural networks with reinforcement learning (RL) shows great promise for solving otherwise intractable learning tasks. However, practical demonstrations of deep reinforcement learning remain scarce. The challenges in using deep RL for a given task can be grouped into two categories, broadly “What to learn from experience?” and “What experience to learn from?” In this thesis, I describe work to address the second category. Specifically, problems of sampling actions, states, and trajectories which contain information relevant to learning tasks. I examine this challenge at three levels of algorithm design and task complexity, from algorithmic components to hybrid combination algorithms that break common RL conventions. In the first chapter, I describe work on stable and efficient sampling of actions that optimize a Q-function of continuous-valued actions. By combining a sample-based optimizer with neural network approximation, it is possible to obtain stability in training, computational efficiency, and precise inference. In the second chapter, I describe work on reward-aware exploration, the discovery of desirable behaviors where common sampling methods are insufficient. A teacher “exploration” agent discovers states and trajectories which maximize the amount a student “exploitation” agent learns on those experiences, and can enable the student agent to solve hard tasks which are otherwise impossible. In the third chapter, I describe work combining reinforcement learning with heuristic search, for use in task domains where the transition model is known, but where the combinatorics of the state space are intractable for traditional search. By combining deep Q-learning with a best-first tree search algorithm, it is possible to find solutions to program synthesis problems with dramatically fewer samples than with common search algorithms or RL alone. Lastly, I conclude with a summary of the major takeaways of this work, and discuss extensions and future directions for efficient sampling in RL. | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: [catalog.princeton.edu](http://catalog.princeton.edu) | |
dc.subject | Deep Learning | |
dc.subject | Deep Reinforcement Learning | |
dc.subject | Machine Learning | |
dc.subject | Reinforcement Learning | |
dc.subject.classification | Artificial intelligence | |
dc.title | Overcoming Sampling and Exploration Challenges in Deep Reinforcement Learning | |
dc.type | Academic dissertations (Ph.D.) | |
pu.date.classyear | 2022 | |
pu.department | Computer Science | |
Appears in Collections: | Computer Science |
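
The abstract's first chapter describes selecting continuous actions by running a sample-based optimizer against a learned Q-function. As a rough illustration of that general idea only, and not the thesis's actual algorithm, the sketch below uses the cross-entropy method (CEM) to approximately maximize a stand-in Q(s, a); the `q_value` function is a toy placeholder for a trained Q-network, and all names and hyperparameters here are illustrative assumptions.

```python
# Illustrative sketch only: sample-based maximization of a Q-function over a
# continuous action space using the cross-entropy method (CEM). q_value is a
# toy placeholder, not a trained network from the thesis.
import numpy as np

def q_value(state, actions):
    """Toy stand-in for a learned Q(s, a): a quadratic bowl around a
    state-dependent 'best' action. Higher is better."""
    best = np.tanh(state.mean()) * np.ones(actions.shape[1])
    return -np.sum((actions - best) ** 2, axis=1)

def cem_action(state, action_dim, iters=5, pop=64, elite_frac=0.1, seed=0):
    """Return an action that approximately maximizes q_value(state, .) by
    iteratively refitting a Gaussian sampling distribution to the elite samples."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(action_dim), np.ones(action_dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = np.clip(rng.normal(mean, std, size=(pop, action_dim)), -1.0, 1.0)
        scores = q_value(state, samples)
        elite = samples[np.argsort(scores)[-n_elite:]]   # keep top-scoring actions
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean

if __name__ == "__main__":
    state = np.array([0.5, -0.2, 0.1])
    print(cem_action(state, action_dim=2))  # converges near tanh(mean(state)) per dimension
```

The design point the abstract emphasizes, stability in training, computational efficiency, and precise inference, is what a sampler of this kind targets when the action space is continuous; how exactly the thesis combines such a sampler with neural network approximation is not specified in this record.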
Files in This Item:
File | Size | Format |
---|---|---|
SimmonsEdler_princeton_0181D_14106.pdf | 6.17 MB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.