Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk41n9jt18
Title: | Four Essays on Political Methodology |
Authors: | Liu, Naijia |
Advisors: | Londregan, John JBL |
Contributors: | Politics Department |
Subjects: | Political science; Statistics |
Issue Date: | 2021 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | This dissertation consists of four essays in political methodology, covering two important issues in the field: missing data imputation and text analysis. The first chapter extends missing data imputation to the missing not at random (MNAR) setting. Missing at random (MAR) is a more restrictive assumption than MNAR, yet MNAR scenarios are very plausible in social science datasets, for example when responses to sensitive survey questions are missing. This chapter confronts MNAR by modeling the latent structure of the missingness to mitigate the influence of the unmeasured confounders that cause the missing values. This approach allows one to assume MAR conditional on the latent factor. The proposed method outperforms multiple imputation methods under MNAR. The wide range of latent factor models enables scholars to tailor the approach to the dataset and the end goal of the analysis. In addition to simulation comparisons, I present an application that uses a latent factor model to impute the missing values in a self-reported ideology question, which is considered sensitive, in the 2017 Chinese Netizen Survey dataset. I conclude the chapter with a discussion of the scope of the method and potential extensions. The second chapter applies the latent factor approach proposed in the previous chapter to an observational causal inference setting. I demonstrate that when pre-treatment confounders are missing not at random, existing methods cannot solve the missing data problem. The latent factor approach, under a modified ignorability assumption, is able to deal with missing confounders in the dataset. In addition to simulation comparisons, I present an application that uses a latent factor model to impute the missing values in an observational causal inference study, in which imputation significantly altered the estimate of causal effects. 
The third chapter takes the data imputation problem to the next level: valid inference. Social science researchers deal with missing values in various datasets, but little attention has been paid to inference after imputation. This chapter proposes a method to achieve valid statistical inference with missing data and a new way to evaluate the performance of missing data inference, integrating the missing value imputation step and the model inference step. The proposed method uses a bias correction term to offset the difference between missing and complete observations. For a parametric regression model, the method relaxes the conventional "missing at random" assumption and distributional assumptions. Simulation and validation results show the superior performance of the proposed method compared with more conventional imputation methods. I conclude the chapter with an application to a survey dataset that shows a substantive change in model estimates before and after imputation. Finally, the last chapter of this dissertation takes on another important issue in political methodology. Unsupervised text analysis models are often highly parameterized, and a principled approach to model selection is essential to the study results. This chapter proposes a model selection method for the Latent Dirichlet Allocation (LDA) topic model. Despite the popularity of the LDA topic model, little guidance is available on model selection. Because text data are sparse, the commonly adopted methods for selecting the number of topics tend to overfit the number of topics in a given corpus. I propose an alternative method that estimates the number of topics by approximating the marginal likelihood of the LDA topic model under Gibbs sampling estimation. This method alleviates the overfitting problem by adopting a likelihood-ratio-style estimator (Chib 1995), in which the marginal likelihood is penalized by the difference between the prior and the posterior mean. 
Later in the chapter, I present simulation comparisons that favor the marginal likelihood approximation method, as well as an application to Supreme Court data. I show that improved model selection leads to a substantive change in the study results. I also discuss the relative performance of MCMC and variational methods. |
URI: | http://arks.princeton.edu/ark:/99999/fk41n9jt18 |
Alternate format: | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Politics |
Files in This Item:
This content is embargoed until 2023-05-24. For questions about theses and dissertations, please contact the Mudd Manuscript Library. For questions about research datasets, as well as other inquiries, please contact the DataSpace curators.
Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.