Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp013b591c54v
Title: Identifying Binding Positions in Proteins Using Neural Networks
Authors: Todd, David
Advisors: Singh, Mona
Department: Computer Science
Class Year: 2020
Abstract: Characterizing proteins, which mediate a wide array of cellular processes by binding various ligands, is a major aim of computational biology. While proteins may contain hundreds of amino acids, typically only a few are involved in interactions with biologically relevant ligands. The most direct approach to determine which amino acid residues within a protein are involved in binding is through experimental methods, but relatively few proteins have been captured in complex with a relevant ligand. To bridge this gap, we train a bidirectional Long Short-Term Memory (BiLSTM) model to predict the binding properties of each amino acid position from sequence-based features for five ligand groups: DNA, RNA, protein, ion, and metabolite. To increase power, we extend our set of true labels beyond the limited experimental data by using protein domain-based inferred binding scores. We then evaluate our model by measuring performance on a held-out test set, and compare performance to a baseline XGBoost model, as well as an existing method. In both of these comparisons, our model performs as well or better for all ligand groups. Because they reflect the binding potential of individual amino acid sites, our predictions can also provide insight into both healthy and diseased protein function.
URI: http://arks.princeton.edu/ark:/88435/dsp013b591c54v
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1988-2020

Files in This Item:
File: TODD-DAVID-THESIS.pdf
Size: 885.86 kB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.