Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01f7623g02z
Full metadata record
DC Field | Value | Language
dc.contributor | Cuff, Paul | -
dc.contributor.advisor | Abbe, Emmanuel | -
dc.contributor.author | Luo, Mo | -
dc.date.accessioned | 2016-06-23T14:07:29Z | -
dc.date.available | 2016-06-23T14:07:29Z | -
dc.date.created | 2016-05-02 | -
dc.date.issued | 2016-06-23 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01f7623g02z | -
dc.description.abstract | The Poisson model community detection algorithm [1] is applied to Hi-C contact data. Hi-C contact data is preprocessed in various ways in an attempt to identify communities at various distances relative to each other. We find the following about the location of communities within chromosomes in a subsection of human chromosome 14 in cell line GM06990: 1) we are able to detect communities that are relatively close in the 1-D strand of DNA, where a large number of interactions exist between nodes; 2) we are also able to detect communities of nodes that are separated by a larger distance; 3) the vast majority of communities, and the most dominant ones, are among nodes in 1-D proximity. Without preprocessing our data, we find k=6 communities, the same number as Cabreros et al. [3]. By eliminating low-contact interactions, the number of communities drops to 5. By eliminating interactions along the main diagonal (1-D proximity), we detect 3 communities. Additionally, we verified that similar behavior is mostly observable when applying the same techniques to mouse chromosomes 1-5. We do find, however, that changing the restriction enzyme used to create the Hi-C data can substantively affect clustering results. This could be because any variability in the main band can greatly skew the clustering. Most notably, however, removing the main diagonal band, up to a certain point, actually makes detecting more communities possible. Finally, we adapted the adjusted mutual information score to compare our clustering results and find that clustering results on the preprocessed data seem relatively similar, even though the preprocessing techniques removed opposite nodes. The results from unpreprocessed data, however, have a low adjusted mutual information score with all other clustering results. While the results produced by community detection (CD) are for the most part intuitive, some are difficult to explain and may require new theories and understandings about the 3D structure of DNA. | en_US
dc.format.extent | 54 pages | *
dc.language.iso | en_US | en_US
dc.title | Inference of DNA Community Interactions in Hi-C Contact Data | en_US
dc.type | Princeton University Senior Theses | -
pu.date.classyear | 2016 | en_US
pu.department | Electrical Engineering | en_US
pu.pdf.coverpage | SeniorThesisCoverPage | -
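
The two preprocessing steps named in the abstract (eliminating low-contact interactions and eliminating interactions along the main diagonal band) can be illustrated with a minimal Python sketch. The function names, the `min_count` and `band_width` parameters, and the toy matrix are assumptions made for illustration only. This record does not state the thresholds or band widths used in the thesis, and this is not the author's implementation.

```python
def remove_low_contacts(contacts, min_count):
    """Zero out entries whose contact count is below min_count (hypothetical threshold)."""
    return [[c if c >= min_count else 0 for c in row] for row in contacts]


def remove_diagonal_band(contacts, band_width):
    """Zero out entries within band_width bins of the main diagonal (hypothetical width)."""
    return [
        [0 if abs(i - j) <= band_width else c for j, c in enumerate(row)]
        for i, row in enumerate(contacts)
    ]


# Toy symmetric 4x4 contact matrix with made-up counts (not data from the thesis).
toy = [
    [9, 5, 1, 3],
    [5, 9, 4, 1],
    [1, 4, 9, 6],
    [3, 1, 6, 9],
]

# Keep only the stronger interactions.
thresholded = remove_low_contacts(toy, min_count=3)

# Keep only interactions between bins that are far apart along the 1-D strand.
off_diagonal = remove_diagonal_band(toy, band_width=1)
```

Either filtered matrix could then be passed to a community detection routine in place of the raw contact matrix, which is the kind of comparison the abstract describes.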
Appears in Collections: Electrical Engineering, 1932-2020

Files in This Item:
File | Size | Format
Luo_Mo_seniorthesis.pdf | 1.26 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.