Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp013b591c54v
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Singh, Mona | -
dc.contributor.author | Todd, David | -
dc.date.accessioned | 2020-08-12T13:11:39Z | -
dc.date.available | 2020-08-12T13:11:39Z | -
dc.date.created | 2020-05-03 | -
dc.date.issued | 2020-08-12 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp013b591c54v | -
dc.description.abstract | Characterizing proteins, which mediate a wide array of cellular processes by binding various ligands, is a major aim of computational biology. While proteins may contain hundreds of amino acids, often only a few are typically involved in interactions with biologically relevant ligands. The most direct approach to determine which amino acid residues within a protein are involved in binding is through experimental methods, but only relatively few proteins have been captured in complex with a relevant ligand. To bridge this gap, we train a bidirectional Long Short-Term Memory (BiLSTM) model to predict the binding properties of each amino acid position from sequence-based features for five ligand groups: DNA, RNA, protein, ion, and metabolite. To increase power, we extend our set of true labels beyond the limited experimental data by using protein domain-based inferred binding scores. We then evaluate our model by measuring performance on a held-out test set, and compare performance to a baseline XGBoost model, as well as an existing method. In both these comparisons, our model performs at least as well or better for all ligand groups. Because they reflect the binding potential of individual amino acid sites, our predictions can also provide insight into both healthy and diseased protein function. | en_US
dc.format.mimetype | application/pdf | -
dc.language.iso | en | en_US
dc.title | LICENSE | en_US
dc.title | Identifying Binding Positions in Proteins Using Neural Networks | en_US
dc.type | Princeton University Senior Theses | -
pu.date.classyear | 2020 | en_US
pu.department | Computer Science | en_US
pu.pdf.coverpage | SeniorThesisCoverPage | -
pu.contributor.authorid | 961272339 | -
Appears in Collections: Computer Science, 1988-2020

Files in This Item:
File | Description | Size | Format
TODD-DAVID-THESIS.pdf | | 885.86 kB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.