Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01rb68xf73s
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Powell, Warren B | - |
dc.contributor.author | Han, Weidong | - |
dc.contributor.other | Operations Research and Financial Engineering Department | - |
dc.date.accessioned | 2019-11-05T16:49:26Z | - |
dc.date.available | 2019-11-05T16:49:26Z | - |
dc.date.issued | 2019 | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01rb68xf73s | - |
dc.description.abstract | We consider sequential online learning problems where the response surface is described by a nonlinear parametric model. We adopt a sampled belief model, which we refer to as a discrete prior. We propose multi-period lookahead policies to overcome the non-concavity in the value of information. For an infinite-horizon problem with discounted cumulative rewards, we prove asymptotic convergence properties under the proposed policies. For a finite-horizon problem with undiscounted rewards, we analyze the proposed policies through empirical studies in three settings: a health setting, where we make medical decisions to maximize the health care response over time; a dynamic pricing setting, where we make pricing decisions to maximize cumulative revenue; and a clinical pharmacology setting, where we control dosages to minimize the deviation between actual and target effects. We also apply the modelling framework to a real-world bidding problem in online advertisement auctions and formulate it as a finite-horizon, state-dependent learning problem in which we must maximize ad clicks within a budget constraint while learning from noisy responses. We demonstrate that the multi-period lookahead policies perform competitively against other state-of-the-art policies. | - |
dc.language.iso | en | - |
dc.publisher | Princeton, NJ : Princeton University | - |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu (http://catalog.princeton.edu) | - |
dc.subject | Advertisement auctions | - |
dc.subject | Dynamic programming | - |
dc.subject | Multi-armed bandits | - |
dc.subject | Online learning | - |
dc.subject | Optimal learning | - |
dc.subject | Value of information | - |
dc.subject.classification | Operations research | - |
dc.title | Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models | - |
dc.type | Academic dissertations (Ph.D.) | - |
Appears in Collections: | Operations Research and Financial Engineering |
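The abstract's sampled belief model (a discrete prior over a finite set of candidate parameter vectors) and its multi-period lookahead idea can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the logistic pricing model `response`, Gaussian noise with known `sigma`, the Monte Carlo estimator, and the helper names `posterior`, `lookahead_value`, and `choose_measurement` are hypothetical and are not taken from the dissertation. The lookahead scores a decision by the best average gain per measurement from repeating it `tau` times (one common way to cope with a non-concave value of information) and weights that gain by the number of remaining periods, in the style of online knowledge-gradient policies; the dissertation's actual policies may differ.

```python
import math
import random


# Hypothetical pricing model (an illustrative assumption, not the dissertation's model):
# expected revenue at price x is x times a logistic purchase probability,
# with parameter theta = (midpoint, slope).
def response(x, theta):
    midpoint, slope = theta
    return x / (1.0 + math.exp(slope * (x - midpoint)))


def posterior(prior, thetas, x, y, sigma):
    """Bayes update of a discrete prior over thetas after observing y = f(x; theta) + N(0, sigma^2)."""
    # Log-space weights avoid underflow when sigma is small; zero-probability candidates stay at zero.
    logw = [
        math.log(p) - 0.5 * ((y - response(x, th)) / sigma) ** 2 if p > 0 else -math.inf
        for p, th in zip(prior, thetas)
    ]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    return [wi / total for wi in w]


def mean_response(probs, thetas, x):
    """Belief-weighted expected response at x."""
    return sum(p * response(x, th) for p, th in zip(probs, thetas))


def best_mean(probs, thetas, xs):
    """Largest belief-weighted expected response over the candidate decisions xs."""
    return max(mean_response(probs, thetas, x) for x in xs)


def lookahead_value(prior, thetas, xs, x, tau, sigma, n_samples, rng):
    """Monte Carlo estimate of the gain in best_mean from measuring x tau times in a row."""
    base = best_mean(prior, thetas, xs)
    total = 0.0
    for _ in range(n_samples):
        # Sample a "true" theta from the current belief, then simulate tau noisy observations at x.
        truth = rng.choices(thetas, weights=prior)[0]
        probs = prior
        for _ in range(tau):
            y = response(x, truth) + rng.gauss(0.0, sigma)
            probs = posterior(probs, thetas, x, y, sigma)
        total += best_mean(probs, thetas, xs)
    return total / n_samples - base


def choose_measurement(prior, thetas, xs, sigma, remaining, max_tau=5, n_samples=200, seed=0):
    """Hypothetical online-style score: immediate expected reward plus remaining periods
    times the best per-measurement lookahead value over tau = 1..max_tau."""
    rng = random.Random(seed)

    def score(x):
        per_step = max(
            lookahead_value(prior, thetas, xs, x, tau, sigma, n_samples, rng) / tau
            for tau in range(1, max_tau + 1)
        )
        return mean_response(prior, thetas, x) + remaining * per_step

    return max(xs, key=score)
```

For example, with candidate parameters `[(2.0, 1.5), (4.0, 1.5), (6.0, 1.5)]`, a uniform prior, and prices 1 to 6, `choose_measurement(prior, thetas, prices, sigma=0.3, remaining=10)` returns the price to try next; after observing revenue `y` at that price, `posterior(prior, thetas, price, y, 0.3)` gives the updated discrete belief.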
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Han_princeton_0181D_12996.pdf | | 2.46 MB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.