Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018049g803m
Title: A Quantitative Summary Statistic for Genetic Admixture
Authors: Sultana, Mayisha Mahdiya
Advisors: Storey, John D.
Department: Molecular Biology
Certificate Program: Center for Statistics and Machine Learning
Class Year: 2020
Abstract: The admixture model is a widely popular approach to evaluate the genetic ancestry of humans and other organisms. The model has successfully been used to improve the accuracy of genetic association studies, to further the understanding of human migratory history, and to help identify signatures of natural selection. Admixture occurs when individuals of two genetically divergent populations interbreed. The admixture model, assuming that each observed individual is derived from d ancestral populations, estimates (a) the allele frequencies that define the ancestral populations, and (b) the proportions of each individual's genetic information that comes from each ancestral population. The standard summary tool for the results of ancestry estimation has become the admixture barplot, a stacked barplot illustrating admixture proportions across individuals. In the genetic literature, these barplots are used to compare the ancestry profiles of distinct populations and even to inform the reconstruction of ancestral histories. Unfortunately, such usage can be extremely misleading, because there is no concrete metric of similarity when using a qualitative summary. It is difficult to know the error associated with ancestry estimates, and two similar-looking barplots may come from datasets that represent individuals with very different ancestry. Therefore, we need a tool that can summarize subtle differences in the underlying distribution of admixture. This thesis calls attention to the need for a quantitative summary statistic for admixture that is concise, informative about the error in the estimates, and allows a comparison of ancestry across datasets. Here, we evaluate the two most common methods of obtaining summary statistics in the field of statistics: maximum likelihood estimation and method of moments. However, these methods fail to achieve a high level of accuracy. 
To solve this problem, we propose a new summary method, the Hybrid estimator, and demonstrate that it outperforms the existing methods in accuracy. Rather than replace the existing tool, the goal of this thesis is to encourage the use of this summary alongside the admixture bar plot. This will provide a more robust analysis of ancestry.
URI: http://arks.princeton.edu/ark:/88435/dsp018049g803m
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Molecular Biology, 1954-2020

Files in This Item:
File: SULTANA-MAYISHAMAHDIYA-THESIS.pdf
Size: 4.01 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.