Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models

Han, Weidong

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01rb68xf73s

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Powell, Warren B	-
dc.contributor.author	Han, Weidong	-
dc.contributor.other	Operations Research and Financial Engineering Department	-
dc.date.accessioned	2019-11-05T16:49:26Z	-
dc.date.available	2019-11-05T16:49:26Z	-
dc.date.issued	2019	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01rb68xf73s	-
dc.description.abstract	We consider sequential online learning problems where the response surface is described by a nonlinear parametric model. We adopt a sampled belief model which we refer to as a discrete prior. We propose multi-period lookahead policies to overcome the non-concavity in the value of information. For an infinite-horizon problem with discounted cumulative rewards, we prove asymptotic convergence properties under the proposed policies. For a finite-horizon problem with undiscounted rewards, we analyze the proposed policies through empirical studies in three different settings: a health setting where we make medical decisions to maximize health care response over time, a dynamic pricing setting where we make pricing decisions to maximize the cumulative revenue, and a clinical pharmacology setting where we make dosage controls to minimize the deviation between actual and target effects. We also apply the modelling framework to a real-world bidding problem in online advertisement auctions, and formulate it as a finite-horizon state-dependent learning problem, where we have to maximize ad-clicks while learning from noisy responses within a budget constraint. We demonstrate that the multi-period lookahead policies perform competitively against other state-of-the-art policies.	-
dc.language.iso	en	-
dc.publisher	Princeton, NJ : Princeton University	-
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu	-
dc.subject	Advertisement auctions	-
dc.subject	Dynamic programming	-
dc.subject	Multi-armed bandits	-
dc.subject	Online learning	-
dc.subject	Optimal learning	-
dc.subject	Value of information	-
dc.subject.classification	Operations research	-
dc.title	Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models	-
dc.type	Academic dissertations (Ph.D.)	-
Appears in Collections:	Operations Research and Financial Engineering

Files in This Item:

File	Description	Size	Format
Han_princeton_0181D_12996.pdf		2.46 MB	Adobe PDF
