Prediction of Cancer Phenotypes Through Machine Learning Approaches: From Gene Modularity to Deep Neural Networks

Zamalloa, Jose Antonio

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018c97kt22j

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Singh, Mona	-
dc.contributor.author	Zamalloa, Jose Antonio	-
dc.contributor.other	Quantitative Computational Biology Department	-
dc.date.accessioned	2019-04-30T17:53:01Z	-
dc.date.available	2019-04-30T17:53:01Z	-
dc.date.issued	2019	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp018c97kt22j	-
dc.description.abstract	The current genomics data influx is transforming healthcare by enabling precise diagnoses and individualized treatments. This is especially true for cancer, where we have genome sequencing and gene expression data across numerous individuals, along with measurements of drug response across hundreds of cancer cell lines. Computational, statistical and machine learning methods play an essential role in analyzing these data in order to gain medically relevant insights. In this dissertation, I describe statistical and machine learning approaches to enable better stratification of cancer subtypes and predict therapy outcomes for individuals with cancer. First, I introduce Deep Pharmacogenomic Modules (Deep-PGMs), a framework to predict drug response outcomes for tumor samples using drug features and gene expression data. Genome expression signatures are a great aid for predicting whether a particular therapy may be beneficial for a specific tumor. Traditional machine learning approaches to predict the effect of a cancer drug on a tumor typically focus on the expression levels of either certain key cancer-relevant genes or of all genes. While genomic data can aid in describing the disease state of an individual by looking at isolated gene entities, genes in cells tend to act in concert to perform their functions. My approach takes advantage of the modular nature of gene regulation to build a reduced feature space that describes the cellular state of a tumor. I use unsupervised machine learning methods to build genomic and non-genomic feature spaces. I construct a deep neural network pipeline to predict drug efficacy outcomes on tumor cell line samples. I demonstrate that my framework outperforms traditional machine learning approaches that do not take advantage of the modular structure of gene expression data sets. I further apply my method to clinical trial data and demonstrate its performance. 
I find that featurizing genomic data through prior knowledge about cellular modularity, accompanied by a robust deep learning pipeline, is a powerful method for predicting the disease outcome of novel cancer therapeutics. In the second part of my thesis, I develop classifiers to identify two breast cancer subtypes. First, I describe an accurate Claudin-low (CL) molecular subtype predictor based on gene expression data. This particular subtype has poor prognosis in breast cancer patients. Via experiments in mice along with analysis of human breast cancer data, my collaborators and I linked individuals with CL breast cancer to elevated levels of miR-199a. This evidence further supported the high levels of miR-199a in mouse tumors and helped characterize the miR-199a-LCOR-IFN axis in tumor initiation. Next, I developed a hysteretic epithelial-mesenchymal transition (EMT) classifier. I use experimental data from TGF-β-induced EMT mouse mammary tumor cells to find genes that are indicative of the hysteretic EMT phenotype. The uncovered genes in my model correlate well with metastatic phenotypes in clinical datasets, particularly in patients with metastatic lung cancer, suggesting that EMT-induced mouse mammary tumor cells can help elucidate clinically relevant genes important in metastasis.	-
dc.language.iso	en	-
dc.publisher	Princeton, NJ : Princeton University	-
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu	-
dc.subject	Cancer	-
dc.subject	Deep Learning	-
dc.subject	Drug	-
dc.subject	Features	-
dc.subject	Genomics	-
dc.subject	Prediction	-
dc.subject.classification	Bioinformatics	-
dc.title	Prediction of Cancer Phenotypes Through Machine Learning Approaches: From Gene Modularity to Deep Neural Networks	-
dc.type	Academic dissertations (Ph.D.)	-
Appears in Collections:	Quantitative Computational Biology

Files in This Item:

File	Description	Size	Format
Zamalloa_princeton_0181D_12877.pdf		4.48 MB	Adobe PDF
