Learning and exploiting hidden structures in genomic data with machine learning and statistical approaches

Zhou, Jian

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018p58pg604

Title:	Learning and exploiting hidden structures in genomic data with machine learning and statistical approaches
Authors:	Zhou, Jian
Advisors:	Troyanskaya, Olga G
Contributors:	Quantitative Computational Biology Department
Subjects:	Bioinformatics
Issue Date:	2017
Publisher:	Princeton, NJ : Princeton University
Abstract:	Addressing challenges in basic science and medicine requires understanding complex biological systems, and rapidly developing experimental techniques now deliver measurements with both high dimensionality and high sample size. Together these call for approaches that capture complex dependencies among interacting components and integrate information globally, complementing the understanding of biological systems gained from conventional test-control experimental designs. Here we explore computational solutions for specific problems, such as understanding the system of genomic DNA and its interacting chromatin proteins and predicting protein secondary structure. We developed a method that accurately identifies direct interactions between chromatin proteins from observed binding profiles. The inferred interaction model also provides a probabilistic landscape of chromatin protein combinatorial patterns, which furthers understanding of the functional organization of chromatin codes, as represented by chromatin states, on the basis of interactions among chromatin factors. The chromatin states identified revealed unexpected subtypes of enhancer-like states with new functional associations. To understand the sequence dependencies of chromatin, we explored deep learning-based approaches for learning complex dependencies in biological sequences. To this end, we developed supervised and convolutional generative stochastic networks and applied them to protein secondary structure prediction. Moreover, we built a model that predicts chromatin profiles directly from genomic sequence and enables prediction of the effects of noncoding genomic variations on chromatin, which can serve as a basis for further understanding of their impacts.
URI:	http://arks.princeton.edu/ark:/88435/dsp018p58pg604
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Quantitative Computational Biology

Files in This Item:

File	Description	Size	Format
Zhou_princeton_0181D_12332.pdf		15.99 MB	Adobe PDF
