Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01765373783
Title: | Predicting DNA Recognition by Cys2His2 Zinc Finger Proteins With Random Forests |
Authors: | On, Brian |
Advisors: | Singh, Mona |
Department: | Computer Science |
Class Year: | 2016 |
Abstract: | Cys2His2 zinc finger proteins comprise the largest transcription factor family in eukaryotic genomes. Predicting their DNA-binding specificities would enable both the design of chimeric proteins that target specific regions of the genome and the extraction of information about regulatory and cellular networks. While early methods for predicting the DNA binding of Cys2His2 zinc fingers relied heavily on probabilistic or quantitative models, recent successful SVM, random forest, and neural network approaches have demonstrated the efficacy of machine learning algorithms for DNA-binding prediction. We continue to explore the application of machine learning to this problem, leveraging a recently compiled, expansive dataset of Cys2His2 zinc finger protein-DNA interactions and the random forest ensemble technique. We test our approach on a set of naturally occurring proteins with experimentally determined binding specificities and find that our algorithm is competitive with previously published state-of-the-art prediction methods. Overall, our random forest model predicts at least half the columns of the experimental position weight matrices (PWMs) for over 80% of the naturally occurring proteins. |
Extent: | 55 pages |
URI: | http://arks.princeton.edu/ark:/88435/dsp01765373783 |
Type of Material: | Princeton University Senior Theses |
Language: | en_US |
Appears in Collections: | Computer Science, 1988-2020 |
Files in This Item:
File | Size | Format |
---|---|---|
On_Brian_2016_Thesis.pdf | 2.52 MB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.