Inference of DNA Community Interactions in Hi-C Contact Data

Luo, Mo

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01f7623g02z

Full metadata record

DC Field	Value	Language
dc.contributor	Cuff, Paul	-
dc.contributor.advisor	Abbe, Emmanuel	-
dc.contributor.author	Luo, Mo	-
dc.date.accessioned	2016-06-23T14:07:29Z	-
dc.date.available	2016-06-23T14:07:29Z	-
dc.date.created	2016-05-02	-
dc.date.issued	2016-06-23	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01f7623g02z	-
dc.description.abstract	The Poisson model community detection algorithm [1] is applied to Hi-C contact data. The Hi-C contact data is preprocessed in various ways in an attempt to identify communities at various distances relative to each other. We find the following about the location of communities within chromosomes in a subsection of human chromosome 14 in cell line GM06990: 1) we are able to detect communities that are relatively close in the 1-D strand of DNA, where a large number of interactions exist between nodes; 2) we are also able to detect communities of nodes that are separated by a larger distance; 3) the vast majority of communities, and the most dominant ones, are among nodes in 1-D proximity. Without preprocessing our data, we find k=6 communities, the same number as Cabreros et al. [3]. By eliminating low-contact interactions, the number of communities drops to 5. By eliminating interactions along the main diagonal (1-D proximity), we detect 3 communities. Additionally, we verified that similar behavior is mostly observable when applying the same techniques to mouse chromosomes 1-5. We do find, however, that changing the restriction enzyme used to create the Hi-C data can substantively affect clustering results. This could be because any variability in the main band can greatly skew the clustering. Most notably, however, removing the main diagonal band, up to a certain point, actually makes detecting more communities possible. Finally, we adapted the adjusted mutual information score to compare our clustering results and find that the clustering results on the preprocessed data seem relatively similar, even though the preprocessing techniques removed opposite nodes. The results from unpreprocessed data, however, have a low adjusted mutual information score with all other clustering results. While the results produced by CD are for the most part intuitive, some are difficult to explain and may require new theories and understandings about the 3D structure of DNA.	en_US
dc.format.extent	54 pages	*
dc.language.iso	en_US	en_US
dc.title	Inference of DNA Community Interactions in Hi-C Contact Data	en_US
dc.type	Princeton University Senior Theses	-
pu.date.classyear	2016	en_US
pu.department	Electrical Engineering	en_US
pu.pdf.coverpage	SeniorThesisCoverPage	-
Appears in Collections:	Electrical Engineering, 1932-2020
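
The two preprocessing steps named in the abstract (eliminating low-contact interactions and eliminating interactions along the main diagonal band) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the threshold `min_count`, the band width `band_width`, and the toy matrix are all assumptions made for illustration, and this record does not state the values used in the thesis.

```python
def drop_low_contacts(contacts, min_count):
    """Zero out entries below min_count (a hypothetical threshold)."""
    return [[c if c >= min_count else 0 for c in row] for row in contacts]


def remove_diagonal_band(contacts, band_width):
    """Zero out entries with |i - j| <= band_width, i.e. contacts between
    loci that are close along the 1-D strand (band_width is hypothetical)."""
    return [
        [0 if abs(i - j) <= band_width else c for j, c in enumerate(row)]
        for i, row in enumerate(contacts)
    ]


# Toy symmetric 5x5 contact matrix with made-up counts (not thesis data).
toy = [
    [9, 7, 1, 0, 4],
    [7, 9, 6, 1, 0],
    [1, 6, 9, 8, 1],
    [0, 1, 8, 9, 7],
    [4, 0, 1, 7, 9],
]

no_low = drop_low_contacts(toy, min_count=2)       # keeps only stronger contacts
no_band = remove_diagonal_band(toy, band_width=1)  # keeps only longer-range contacts
```

In this sketch the two filters remove largely complementary entries: thresholding mostly discards weak long-range contacts, while band removal discards the strong short-range ones.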

Files in This Item:

File	Size	Format
Luo_Mo_seniorthesis.pdf	1.26 MB	Adobe PDF
