Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk4932973r
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kornhauser, Alain | |
dc.contributor.author | Hervieux-Moore, Zachary Thomas John | |
dc.contributor.other | Operations Research and Financial Engineering Department | |
dc.date.accessioned | 2021-10-04T13:27:24Z | - |
dc.date.available | 2021-10-04T13:27:24Z | - |
dc.date.created | 2021-01-01 | |
dc.date.issued | 2021 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/99999/fk4932973r | - |
dc.description.abstract | When developing reinforcement learning algorithms, the main issues are dealing with large state spaces and action spaces. For the most part, the state space complexity problem was solved with the advent of AlphaZero, which handles unfathomably large state spaces by combining neural networks with Monte Carlo tree search (MCTS). Dealing with large action spaces, however, remains an active area of research. We generalize the AlphaZero algorithm by introducing the GAIL framework and test a variety of alterations. We find that using Thompson Sampling as the selection procedure during MCTS could potentially improve upon AlphaZero in two-player zero-sum games, although AlphaZero is extremely competitive with all variations. We then demonstrate the strength of GAIL by applying it to the game of Scrabble, to which AlphaZero cannot be applied because of its extremely large action space. Furthermore, GAIL coupled with the Upper Confidence Bound selection procedure and information set MCTS proves to be state of the art in the game of Scrabble. This also establishes that information set MCTS can be used with a neural network value estimator in reinforcement learning. Finally, we extend these results to the continuous action space domain by developing ROAR, a novel algorithm that drastically lowers action space complexity by making a finite number of action recommendations based on state context and historical performance learned via reinforcement learning. We end by successfully training it on a nontrivial robotics problem. (Illustrative sketches of the selection procedures and the recommendation idea follow this table.) | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu | |
dc.subject | action space | |
dc.subject | AlphaZero | |
dc.subject | GAIL | |
dc.subject | Monte Carlo tree search | |
dc.subject | reinforcement learning | |
dc.subject | ROAR | |
dc.subject.classification | Computer science | |
dc.title | Modern Reinforcement Learning Techniques to Deal with Large Action Spaces | |
dc.type | Academic dissertations (Ph.D.) | |
pu.date.classyear | 2021 | |
pu.department | Operations Research and Financial Engineering | |
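The abstract contrasts two MCTS selection procedures: AlphaZero's Upper Confidence Bound (PUCT) rule and Thompson Sampling. Below is a minimal sketch of that contrast, not code from the dissertation; the node structure (priors, visit counts, value sums) and parameter names are assumptions for illustration, and rewards are assumed to lie in [0, 1] so that a Beta posterior applies.

```python
import math
import random

class Node:
    """Illustrative MCTS node: one entry per legal action."""
    def __init__(self, priors):
        self.priors = priors                    # policy-network prior P(s, a)
        self.visits = [0] * len(priors)         # N(s, a)
        self.value_sums = [0.0] * len(priors)   # W(s, a)

def select_ucb(node, c_puct=1.5):
    """PUCT-style selection as in AlphaZero: mean value plus an
    exploration bonus scaled by the prior and inverse visit count."""
    n = sum(node.visits)
    def score(a):
        q = node.value_sums[a] / node.visits[a] if node.visits[a] else 0.0
        u = c_puct * node.priors[a] * math.sqrt(n + 1) / (1 + node.visits[a])
        return q + u
    return max(range(len(node.priors)), key=score)

def select_thompson(node, prior_strength=1.0):
    """Thompson Sampling selection: draw a plausible value for each action
    from a Beta posterior over its win rate, then take the argmax."""
    def sample(a):
        wins = max(node.value_sums[a], 0.0)
        losses = max(node.visits[a] - wins, 0.0)
        alpha = 1.0 + prior_strength * node.priors[a] + wins
        beta = 1.0 + losses
        return random.betavariate(alpha, beta)
    return max(range(len(node.priors)), key=sample)
```

Both functions choose which child to descend into at a tree node; the only difference is deterministic optimism (UCB) versus posterior sampling (Thompson).

The abstract also describes ROAR as reducing a continuous action space to a finite number of action recommendations based on state context and historical performance. The sketch below is one plausible reading of that sentence, not the dissertation's actual algorithm; every name and design choice here is a hypothetical stand-in.

```python
import random

class ActionRecommender:
    """Hypothetical menu of k continuous-action recommendations."""
    def __init__(self, k, action_dim, low, high):
        self.k, self.low, self.high = k, low, high
        # Start from k random candidate actions in the continuous space.
        self.menu = [[random.uniform(low, high) for _ in range(action_dim)]
                     for _ in range(k)]
        self.returns = [[] for _ in range(k)]   # historical performance per slot

    def recommend(self, score_fn, state):
        """Turn the continuous problem into a k-way discrete choice:
        score each candidate in the current state and pick the best.
        score_fn stands in for a learned state-action value estimator."""
        return max(range(self.k), key=lambda i: score_fn(state, self.menu[i]))

    def record(self, i, episode_return):
        self.returns[i].append(episode_return)

    def refresh(self):
        """Replace the historically worst candidate with a fresh sample,
        so the finite menu adapts as training progresses."""
        means = [sum(r) / len(r) if r else float("inf") for r in self.returns]
        worst = min(range(self.k), key=lambda i: means[i])
        if means[worst] != float("inf"):
            dim = len(self.menu[worst])
            self.menu[worst] = [random.uniform(self.low, self.high)
                                for _ in range(dim)]
            self.returns[worst] = []
```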
Appears in Collections: | Operations Research and Financial Engineering |
Files in This Item:
File | Size | Format |
---|---|---|
HervieuxMoore_princeton_0181D_13884.pdf | 6.51 MB | Adobe PDF |