Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk41n9k098
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Storey, John D | |
dc.contributor.author | Bass, Andrew Jay | |
dc.contributor.other | Quantitative Computational Biology Department | |
dc.date.accessioned | 2021-10-04T13:27:26Z | - |
dc.date.available | 2021-10-04T13:27:26Z | - |
dc.date.created | 2021-01-01 | |
dc.date.issued | 2021 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/99999/fk41n9k098 | - |
dc.description.abstract | Recent advancements in sequencing technology have substantially increased the quality and quantity of data in genomics, presenting novel analytical challenges for biological discovery. In particular, foundational ideas developed in statistics over the past century are not easily extended to these high-dimensional datasets. Therefore, creating novel methodologies to analyze these data is a key challenge in statistics and, more generally, in biology and computational science. Here I focus on building statistical methods for genome-wide analysis that are statistically rigorous, computationally fast, and easy to implement. In particular, I develop four methods that improve statistical inference on high-dimensional biological data. The first focuses on differential expression analysis, where I extend the optimal discovery procedure (ODP) to complex study designs and RNA-seq studies. I find that the extended ODP leverages shared biological signal to substantially improve statistical power compared to other commonly used testing procedures. The second aims to model the functional relationship between sequencing depth and statistical power in RNA-seq differential expression studies. The resulting model, superSeq, accurately predicts the improvement in statistical power from sequencing additional reads in a completed study. Thus, superSeq can guide researchers in choosing a sequencing depth sufficient to maximize statistical power while avoiding unnecessary sequencing costs. The third method estimates the posterior distribution of false discovery rate (FDR) quantities, such as local FDRs and q-values, using a Bayesian nonparametric approach. Specifically, I use variational inference to implement an approximation to these posterior distributions that scales to genome-wide datasets. These estimated posterior distributions are informative in a significance analysis because they capture the uncertainty of the FDR quantities in reported results. Finally, I develop a likelihood-based approach to estimating unobserved population structure on the canonical parameter scale. I demonstrate that this framework can flexibly capture arbitrary structure and provide accurate allele frequency estimates while being computationally fast for large population genetic studies. Therefore, this framework is useful for many applications in population genetics, such as accounting for structure in the genome-wide association testing procedure GCATest. Collectively, these four methods address problems typically encountered in biological analyses and can thus help improve downstream inferences in high-dimensional settings. | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu (http://catalog.princeton.edu) | |
dc.subject | False discovery rates | |
dc.subject | Latent variable models | |
dc.subject | Optimal discovery procedure | |
dc.subject | Population structure | |
dc.subject | Statistical inference | |
dc.subject.classification | Biostatistics | |
dc.title | High-dimensional methods to model biological signal in genome-wide studies | |
dc.type | Academic dissertations (Ph.D.) | |
pu.date.classyear | 2021 | |
pu.department | Quantitative Computational Biology | |
Appears in Collections: | Quantitative Computational Biology |
Files in This Item:
File | Size | Format |
---|---|---|
Bass_princeton_0181D_13886.pdf | 11.37 MB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.