Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk4g17fh5g
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Mulvey, John | |
dc.contributor.author | Lu, Kun | |
dc.contributor.other | Operations Research and Financial Engineering Department | |
dc.date.accessioned | 2021-06-10T17:14:15Z | - |
dc.date.available | 2021-06-10T17:14:15Z | - |
dc.date.issued | 2021 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/99999/fk4g17fh5g | - |
dc.description.abstract | This dissertation focuses on developing new statistical and machine learning methods for financial applications. We first propose a new model, the Features Augmented Hidden Markov Model (FAHMM), which extends the traditional Hidden Markov Model (HMM) by incorporating a feature structure. The model is general in two respects: (1) the emission distribution can take different forms (e.g., any member of the exponential family); and (2) different feature structures (e.g., high dimensionality, multicollinearity) are handled by adding appropriate penalization terms. A theoretical proof of convergence, simulations, and an empirical application to currency regime identification are provided. Next, we develop a new neural natural language processing model that combines reinforcement learning with the Bidirectional Encoder Representations from Transformers (BERT) model to classify long documents. Because BERT accepts at most 512 tokens, it cannot handle long documents, which are common in financial data (e.g., financial news, earnings transcripts). We therefore train a reinforcement learning model together with the BERT-based model end to end, using policy-gradient reinforcement learning to select sentences/chunks. We then apply the model to earnings conference call transcript data and predict the stock price movement after the call. Finally, we develop a method to estimate high-dimensional covariance matrices from high-frequency data. We use a factor structure and thresholding to address high dimensionality, and pre-averaging and refresh-time sampling to handle the particular features of high-frequency data: microstructure noise and non-synchronicity. We consider three scenarios: when only the factors are known, when only the loadings are known, and when neither is known. Theoretical proofs and simulations support the theory, and a horse race on out-of-sample portfolio allocation is conducted with the Dow Jones 30, S&P 100, and S&P 500 index constituents, respectively. | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | |
dc.subject.classification | Statistics | |
dc.title | Statistical and Machine Learning Methods For Financial Data | |
dc.type | Academic dissertations (Ph.D.) | |
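The abstract above describes the FAHMM only at a high level. As a purely illustrative sketch (not code from the dissertation), the snippet below shows one way a regime-specific M-step could look when the emission mean depends on features and a ridge penalty is added to cope with multicollinearity; the responsibilities `gamma` are assumed to come from a standard forward-backward E-step, and the function name `ridge_m_step` and penalty `lam` are hypothetical.

```python
import numpy as np

def ridge_m_step(X, y, gamma, lam=1.0):
    """Responsibility-weighted, ridge-penalized M-step for a hypothetical
    FAHMM-style model with Gaussian emissions y_t | state k ~ N(x_t' beta_k, sigma_k^2).

    X     : (T, p) feature matrix
    y     : (T,)   observed emissions
    gamma : (T, K) posterior state probabilities from the forward-backward E-step
    lam   : ridge penalty (one simple choice among the penalties the abstract mentions)
    """
    T, p = X.shape
    K = gamma.shape[1]
    betas = np.zeros((K, p))
    sigmas = np.zeros(K)
    for k in range(K):
        w = gamma[:, k]                      # per-observation weights for state k
        Xw = X * w[:, None]
        # Weighted ridge regression: (X' W X + lam I)^{-1} X' W y
        A = X.T @ Xw + lam * np.eye(p)
        b = Xw.T @ y
        betas[k] = np.linalg.solve(A, b)
        resid = y - X @ betas[k]
        sigmas[k] = np.sqrt((w * resid ** 2).sum() / w.sum())
    return betas, sigmas
```

Swapping the ridge term for a lasso penalty, or replacing the Gaussian emission with another exponential-family member and the corresponding weighted penalized GLM fit, follows the same pattern.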
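For the second contribution, the abstract mentions policy-gradient selection of sentences/chunks feeding a BERT-based classifier. The sketch below is a generic REINFORCE-style chunk selector, not the dissertation's architecture: the BERT forward pass is replaced by precomputed per-chunk embeddings `chunk_emb`, the pooling and reward definitions are assumptions, and all names (`ChunkSelector`, `reinforce_step`) are hypothetical.

```python
import torch
import torch.nn as nn

class ChunkSelector(nn.Module):
    """Scores each chunk of a long document and samples a keep/drop decision."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, chunk_emb):
        # chunk_emb: (num_chunks, dim), e.g. [CLS] embeddings of 512-token chunks
        probs = torch.sigmoid(self.scorer(chunk_emb)).squeeze(-1)
        dist = torch.distributions.Bernoulli(probs=probs)
        mask = dist.sample()                      # 0/1 decision per chunk
        log_prob = dist.log_prob(mask).sum()
        return mask, log_prob

def reinforce_step(selector, classifier, chunk_emb, label, optimizer):
    """One policy-gradient update; reward = classifier log-likelihood on kept chunks."""
    mask, log_prob = selector(chunk_emb)
    kept = chunk_emb[mask.bool()]
    if kept.shape[0] == 0:                        # guard against selecting nothing
        kept = chunk_emb
    doc_repr = kept.mean(dim=0)                   # pool selected chunks
    logits = classifier(doc_repr)
    cls_loss = nn.functional.cross_entropy(logits.unsqueeze(0), label.view(1))
    reward = -cls_loss.detach()                   # higher likelihood -> higher reward
    # REINFORCE term (no baseline; in practice a baseline is subtracted to cut variance)
    loss = cls_loss + (-reward) * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A minimal usage would pair `ChunkSelector(768)` with a small head such as `nn.Linear(768, 2)` for up/down price movement, optimizing both parameter sets with a single optimizer.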
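The third contribution combines a factor structure with entrywise thresholding. The sketch below illustrates only the "factors are known" scenario under strong simplifications: the inputs are assumed to already be pre-averaged and refresh-time synchronized upstream, and the constant threshold `tau` stands in for the data-driven thresholds used in this literature (typically scaled by entrywise variances and a rate such as sqrt(log p / n)). The function name is hypothetical.

```python
import numpy as np

def factor_threshold_cov(returns, factors, tau=0.1):
    """Illustrative factor-plus-thresholding covariance estimator (known-factors case).

    returns : (n, p) synchronized, pre-averaged asset returns
    factors : (n, k) observed factor returns
    tau     : soft-threshold level for off-diagonal residual covariances
    """
    n, p = returns.shape
    F = np.column_stack([np.ones(n), factors])             # add intercept
    # Time-series OLS of each asset on the factors -> loadings and residuals
    coefs, *_ = np.linalg.lstsq(F, returns, rcond=None)    # (k+1, p)
    resid = returns - F @ coefs
    B = coefs[1:].T                                        # (p, k) loadings
    cov_f = np.cov(factors, rowvar=False)
    if np.ndim(cov_f) == 0:                                # single-factor edge case
        cov_f = np.array([[float(cov_f)]])
    systematic = B @ cov_f @ B.T                           # low-rank component
    # Soft-threshold the off-diagonal residual covariance to enforce sparsity
    cov_e = np.cov(resid, rowvar=False)
    off = cov_e - np.diag(np.diag(cov_e))
    off = np.sign(off) * np.maximum(np.abs(off) - tau, 0.0)
    return systematic + np.diag(np.diag(cov_e)) + off
```

The "known loadings" and "neither known" scenarios mentioned in the abstract would replace the OLS step with, respectively, a cross-sectional regression for the factors or a principal-components step, while the thresholding of the residual covariance stays the same in spirit.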
Appears in Collections: | Operations Research and Financial Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
Lu_princeton_0181D_13623.pdf | 3.39 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.