Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01rb68xf73s
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Powell, Warren B | -
dc.contributor.author | Han, Weidong | -
dc.contributor.other | Operations Research and Financial Engineering Department | -
dc.date.accessioned | 2019-11-05T16:49:26Z | -
dc.date.available | 2019-11-05T16:49:26Z | -
dc.date.issued | 2019 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01rb68xf73s | -
dc.description.abstract | We consider sequential online learning problems where the response surface is described by a nonlinear parametric model. We adopt a sampled belief model which we refer to as a discrete prior. We propose multi-period lookahead policies to overcome the non-concavity in the value of information. For an infinite-horizon problem with discounted cumulative rewards, we prove asymptotic convergence properties under the proposed policies. For a finite-horizon problem with undiscounted reward, we analyze the proposed policies through empirical studies in three different settings: a health setting where we make medical decisions to maximize health care response over time, a dynamic pricing setting where we make pricing decisions to maximize the cumulative revenue, and a clinical pharmacology setting where we make dosage controls to minimize the deviation between actual and target effects. We also apply the modelling framework to a real-world bidding problem in online advertisement auctions, and formulate it as a finite-horizon state-dependent learning problem, where we have to maximize ad-clicks while learning from noisy responses within a budget constraint. We demonstrate that the multi-period lookahead policies perform competitively against other state-of-the-art policies. | -
dc.language.iso | en | -
dc.publisher | Princeton, NJ : Princeton University | -
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | -
dc.subject | Advertisement auctions | -
dc.subject | Dynamic programming | -
dc.subject | Multi-armed bandits | -
dc.subject | Online learning | -
dc.subject | Optimal learning | -
dc.subject | Value of information | -
dc.subject.classification | Operations research | -
dc.title | Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models | -
dc.type | Academic dissertations (Ph.D.) | -
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File | Description | Size | Format
Han_princeton_0181D_12996.pdf | | 2.46 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.