Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk4g17fh5g
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Mulvey, John | |
dc.contributor.author | Lu, Kun | |
dc.contributor.other | Operations Research and Financial Engineering Department | |
dc.date.accessioned | 2021-06-10T17:14:15Z | - |
dc.date.available | 2021-06-10T17:14:15Z | - |
dc.date.issued | 2021 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/99999/fk4g17fh5g | - |
dc.description.abstract | This dissertation focuses on developing new statistical and machine learning methods for financial applications. We first propose a new model, the Features Augmented Hidden Markov Model (FAHMM), which extends the traditional Hidden Markov Model (HMM) by incorporating a feature structure. The model is general in two respects: (1) the emission distribution can take different forms (e.g., any member of the exponential family); and (2) different feature structures (e.g., high dimensionality, multicollinearity) are handled by adding appropriate penalization terms. A theoretical proof of convergence, simulations, and an empirical application to currency regime identification are provided. Next, we develop a new neural natural language processing model that combines reinforcement learning with the Bidirectional Encoder Representations from Transformers (BERT) model to classify long documents. Because BERT accepts at most 512 tokens, it cannot handle long documents, which are common in financial data (e.g., financial news, earnings transcripts). We therefore train a reinforcement learning model together with the BERT-based model end to end, using policy-gradient reinforcement learning to select sentences/chunks. We then apply the model to earnings conference call transcript data and predict the stock price movement after the call. Finally, we develop a method to estimate high-dimensional covariance matrices from high-frequency data. We use a factor structure and thresholding to address high dimensionality, and pre-averaging and refresh-time sampling to handle the particular features of high-frequency data: microstructure noise and non-synchronicity. We consider three scenarios: when only the factors are known, when only the loadings are known, and when neither is known. Theoretical proofs and simulations support the theory, and a horse race on out-of-sample portfolio allocation is conducted with the Dow Jones 30, S&P 100, and S&P 500 index constituents, respectively. | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | |
dc.subject.classification | Statistics | |
dc.title | Statistical and Machine Learning Methods For Financial Data | |
dc.type | Academic dissertations (Ph.D.) | |
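The abstract above describes the FAHMM only at a high level. As a purely illustrative sketch (not code from the dissertation), the snippet below shows one way a regime-specific M-step could look when the emission mean depends on features and a ridge penalty is added to cope with multicollinearity; the responsibilities `gamma` are assumed to come from a standard forward-backward E-step, and the function name `ridge_m_step` and penalty `lam` are hypothetical.

```python
import numpy as np

def ridge_m_step(X, y, gamma, lam=1.0):
    """Responsibility-weighted, ridge-penalized M-step for a hypothetical
    FAHMM-style model with Gaussian emissions y_t | state k ~ N(x_t' beta_k, sigma_k^2).

    X     : (T, p) feature matrix
    y     : (T,)   observed emissions
    gamma : (T, K) posterior state probabilities from the forward-backward E-step
    lam   : ridge penalty (one simple choice among the penalties the abstract mentions)
    """
    T, p = X.shape
    K = gamma.shape[1]
    betas = np.zeros((K, p))
    sigmas = np.zeros(K)
    for k in range(K):
        w = gamma[:, k]                      # per-observation weights for state k
        Xw = X * w[:, None]
        # Weighted ridge regression: (X' W X + lam I)^{-1} X' W y
        A = X.T @ Xw + lam * np.eye(p)
        b = Xw.T @ y
        betas[k] = np.linalg.solve(A, b)
        resid = y - X @ betas[k]
        sigmas[k] = np.sqrt((w * resid ** 2).sum() / w.sum())
    return betas, sigmas
```

Swapping the ridge term for a lasso penalty, or replacing the Gaussian emission with another exponential-family member and the corresponding weighted penalized GLM fit, follows the same pattern.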
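For the second contribution, the abstract mentions policy-gradient selection of sentences/chunks feeding a BERT-based classifier. The sketch below is a generic REINFORCE-style chunk selector, not the dissertation's architecture: the BERT forward pass is replaced by precomputed per-chunk embeddings `chunk_emb`, the pooling and reward definitions are assumptions, and all names (`ChunkSelector`, `reinforce_step`) are hypothetical.

```python
import torch
import torch.nn as nn

class ChunkSelector(nn.Module):
    """Scores each chunk of a long document and samples a keep/drop decision."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, chunk_emb):
        # chunk_emb: (num_chunks, dim), e.g. [CLS] embeddings of 512-token chunks
        probs = torch.sigmoid(self.scorer(chunk_emb)).squeeze(-1)
        dist = torch.distributions.Bernoulli(probs=probs)
        mask = dist.sample()                      # 0/1 decision per chunk
        log_prob = dist.log_prob(mask).sum()
        return mask, log_prob

def reinforce_step(selector, classifier, chunk_emb, label, optimizer):
    """One policy-gradient update; reward = classifier log-likelihood on kept chunks."""
    mask, log_prob = selector(chunk_emb)
    kept = chunk_emb[mask.bool()]
    if kept.shape[0] == 0:                        # guard against selecting nothing
        kept = chunk_emb
    doc_repr = kept.mean(dim=0)                   # pool selected chunks
    logits = classifier(doc_repr)
    cls_loss = nn.functional.cross_entropy(logits.unsqueeze(0), label.view(1))
    reward = -cls_loss.detach()                   # higher likelihood -> higher reward
    # REINFORCE term (no baseline; in practice a baseline is subtracted to cut variance)
    loss = cls_loss + (-reward) * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A minimal usage would pair `ChunkSelector(768)` with a small head such as `nn.Linear(768, 2)` for up/down price movement, optimizing both parameter sets with a single optimizer.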
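The third contribution combines a factor structure with entrywise thresholding. The sketch below illustrates only the "factors are known" scenario under strong simplifications: the inputs are assumed to already be pre-averaged and refresh-time synchronized upstream, and the constant threshold `tau` stands in for the data-driven thresholds used in this literature (typically scaled by entrywise variances and a rate such as sqrt(log p / n)). The function name is hypothetical.

```python
import numpy as np

def factor_threshold_cov(returns, factors, tau=0.1):
    """Illustrative factor-plus-thresholding covariance estimator (known-factors case).

    returns : (n, p) synchronized, pre-averaged asset returns
    factors : (n, k) observed factor returns
    tau     : soft-threshold level for off-diagonal residual covariances
    """
    n, p = returns.shape
    F = np.column_stack([np.ones(n), factors])             # add intercept
    # Time-series OLS of each asset on the factors -> loadings and residuals
    coefs, *_ = np.linalg.lstsq(F, returns, rcond=None)    # (k+1, p)
    resid = returns - F @ coefs
    B = coefs[1:].T                                        # (p, k) loadings
    cov_f = np.cov(factors, rowvar=False)
    if np.ndim(cov_f) == 0:                                # single-factor edge case
        cov_f = np.array([[float(cov_f)]])
    systematic = B @ cov_f @ B.T                           # low-rank component
    # Soft-threshold the off-diagonal residual covariance to enforce sparsity
    cov_e = np.cov(resid, rowvar=False)
    off = cov_e - np.diag(np.diag(cov_e))
    off = np.sign(off) * np.maximum(np.abs(off) - tau, 0.0)
    return systematic + np.diag(np.diag(cov_e)) + off
```

The "known loadings" and "neither known" scenarios mentioned in the abstract would replace the OLS step with, respectively, a cross-sectional regression for the factors or a principal-components step, while the thresholding of the residual covariance stays the same in spirit.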
Appears in Collections: | Operations Research and Financial Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
Lu_princeton_0181D_13623.pdf | 3.39 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.