INTEGRATIVE MULTI-OMIC DATA ANALYSIS AND SOFTWARE DEVELOPMENT

Thistlethwaite, William

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk4n31k22f

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	TROYANSKAYA, OLGA G.
dc.contributor.author	Thistlethwaite, William
dc.contributor.other	Quantitative Computational Biology Department
dc.date.accessioned	2025-02-11T15:40:09Z	-
dc.date.available	2025-02-11T15:40:09Z	-
dc.date.created	2024-01-01
dc.date.issued	2025
dc.identifier.uri	http://arks.princeton.edu/ark:/99999/fk4n31k22f	-
dc.description.abstract	Single-cell technologies have enabled us to profile the internal state of individual cells with increasingly high granularity, but computational methods to extract deep biological insight from these complex, multi-omic datasets remain underdeveloped. Unstandardized workflows with arbitrary quality-control (QC) thresholds lead to low reproducibility, and it remains challenging for scientists without coding skills to gain biological insight using these valuable data. Here, we develop an end-to-end computational pipeline for rigorous, reproducible analysis of single-cell data, and then we apply this pipeline, along with other analytical techniques, to epigenomic and transcriptomic data gathered from an influenza challenge study to better understand how influenza infection shapes innate immune memory. We first describe our work on SPEEDI (Single-cell Pipeline for End to End Data Integration), a computational end-to-end pipeline that processes single-cell RNA-seq (scRNA-seq), single-cell ATAC-seq (scATAC-seq), or multiome data in a reproducible, robust manner. After reading input data, the pipeline automatically filters the data using algorithmically determined thresholds for common QC metrics, integrates data using a novel data-derived batch inference method, annotates cell types using an internal or user-provided reference object, and then performs preliminary downstream analyses within each cell type. Importantly, SPEEDI is available both as an R package for advanced users and as an interactive web server for biologists with no prior coding experience. We next apply SPEEDI and other analytical techniques to investigate how innate immune memory develops following influenza infection. We leverage blood samples from a human influenza virus challenge study to conduct integrative multi-omic data analyses, focusing specifically on the epigenetic and transcriptomic profiles of subjects at 1 day pre-challenge and 28 days post-challenge. 
We find that the innate immune system enters a state of suppressed inflammation after resolution of infection, with decreased cytokine and AP-1 gene expression and decreased chromatin accessibility at AP-1 targeted loci and promoter regions of interleukin-related genes. However, increased chromatin accessibility at promoter regions of interferon-related genes and increased MAP kinase gene expression may suggest that the innate immune system is primed to respond to subsequent infection.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.publisher	Princeton, NJ : Princeton University
dc.subject	data science
dc.subject	influenza
dc.subject	multi-omics
dc.subject	software development
dc.subject.classification	Bioinformatics
dc.subject.classification	Computer science
dc.subject.classification	Biology
dc.title	INTEGRATIVE MULTI-OMIC DATA ANALYSIS AND SOFTWARE DEVELOPMENT
dc.type	Academic dissertations (Ph.D.)
pu.date.classyear	2025
pu.department	Quantitative Computational Biology
Appears in Collections:	Quantitative Computational Biology

Files in This Item:

File	Size	Format
Thistlethwaite_princeton_0181D_15342.pdf	10.6 MB	Adobe PDF
