Data-Driven Approaches and Systems to Interrogate Complex Disease

Dannenfelser, Ruth

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018c97kt354

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Troyanskaya, Olga G	-
dc.contributor.author	Dannenfelser, Ruth	-
dc.contributor.other	Computer Science Department	-
dc.date.accessioned	2020-07-13T03:32:00Z	-
dc.date.issued	2020	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp018c97kt354	-
dc.description.abstract	Large-scale genomic studies now give more predictive power than ever, allowing us to profile the composition of tissues, study cellular functions, and understand organismal traits at an unprecedented level of detail. This is particularly important for studying heterogeneous diseases, such as cancer, where small patient-specific differences play critical roles in disease development and progression. As these studies accumulate, it is becoming increasingly important both to develop methods that discover novel biology while considering tissue and cell type specificity and to develop systems that make this data explosion easily manageable, accessible, and interpretable. Toward these goals, in this dissertation, we build on the wealth of publicly available data to examine the interplay between cancer and the immune system and then develop two query-based visualization systems that enable interactive data exploration for the biomedical community at large. The first part of this work presents two perspectives on cancer and the immune system, starting with a semi-supervised approach for immune cell type quantification in chapter 2. Using derived immune markers, we examined lymphocyte infiltration in breast cancer and found that estrogen receptor activity and genomic complexity are the key factors driving variation in lymphocytic infiltrate across individuals. Our method allowed us to make these discoveries on existing samples, without the need for additional experiments, even when this was not the original intent of the study. More broadly, in chapter 3, we leveraged public expression data to further the development of targeted immunotherapeutics for solid tumors. Engineered T cell therapies have shown great promise for hematological cancers but have found only limited success in targeting solid tumors due to off-target effects.
Working closely with experimental collaborators, we developed a method to prioritize pairs of antigen targets that will help engineered T cells home in on tumor targets while minimizing damage to healthy tissues. Notably, we were able to narrow the space of more than 2.7 million potential pairs to a few hundred top candidates per tumor type and find new transmembrane proteins with therapeutic potential, effectively speeding up the development of novel immunotherapeutics for solid tumors. The fourth and fifth chapters cover how we can extract unbiased signals from large collections of biomedical data in the form of abstracts and repositories of transcriptomics data. First, in chapter 4, we show how we can obtain informative tissue-disease-gene relationships from abstracts and integrate them into a system that presents different snapshots of curated interactions and adds tissue and disease annotations to gene lists from experimental assays (e.g., GWAS, differentially expressed genes, drug screens). Second, in chapter 5, we extend SEEK, a gene expression search engine that simultaneously returns coexpressed genes and relevant datasets where query genes are likely coregulated. Our extension expands the search space across the major model organisms and provides a new cross-organism exploration interface to help facilitate translational research. Both systems will help experimentalists leverage existing knowledge to better explain the larger implications of their specific findings, without requiring additional computational expertise.	-
dc.language.iso	en	-
dc.publisher	Princeton, NJ : Princeton University	-
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu	-
dc.subject	cancer	-
dc.subject	computational biology	-
dc.subject	gene expression	-
dc.subject	immunotherapeutics	-
dc.subject	machine learning	-
dc.subject	text mining	-
dc.subject.classification	Computer science	-
dc.title	Data-Driven Approaches and Systems to Interrogate Complex Disease	-
dc.type	Academic dissertations (Ph.D.)	-
pu.embargo.lift	2022-06-26	-
pu.embargo.terms	2022-06-26	-
Appears in Collections:	Computer Science

Files in This Item:

This content is embargoed until 2022-06-26. For questions about theses and dissertations, please contact the Mudd Manuscript Library. For questions about research datasets, as well as other inquiries, please contact the DataSpace curators.
