INTEGRATIVE MULTI-OMIC DATA ANALYSIS AND SOFTWARE DEVELOPMENT

Thistlethwaite, William

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk4n31k22f

Title:	INTEGRATIVE MULTI-OMIC DATA ANALYSIS AND SOFTWARE DEVELOPMENT
Authors:	Thistlethwaite, William
Advisors:	TROYANSKAYA, OLGA G.
Contributors:	Quantitative Computational Biology Department
Keywords:	data science; influenza; multi-omics; software development
Subjects:	Bioinformatics; Computer science; Biology
Issue Date:	2025
Publisher:	Princeton, NJ : Princeton University
Abstract:	Single-cell technologies have enabled us to profile the internal state of individual cells with increasingly high granularity, but computational methods to extract deep biological insight from these complex, multi-omic datasets remain underdeveloped. Unstandardized workflows with arbitrary quality-control (QC) thresholds lead to low reproducibility, and it remains challenging for scientists without coding skills to gain biological insight using these valuable data. Here, we develop an end-to-end computational pipeline for rigorous, reproducible analysis of single-cell data, and then we apply this pipeline, along with other analytical techniques, to epigenomic and transcriptomic data gathered from an influenza challenge study to better understand how influenza infection shapes innate immune memory.
We first describe our work on SPEEDI (Single-cell Pipeline for End to End Data Integration), a computational end-to-end pipeline that processes single-cell RNA-seq (scRNA-seq), single-cell ATAC-seq (scATAC-seq), or multiome data in a reproducible, robust manner. After reading input data, the pipeline automatically filters the data using algorithmically determined thresholds for common QC metrics, integrates data using a novel data-derived batch inference method, annotates cell types using an internal or user-provided reference object, and then performs preliminary downstream analyses within each cell type. Importantly, SPEEDI is available both as an R package for advanced users and as an interactive web server for biologists with no prior coding experience.
We next apply SPEEDI and other analytical techniques to investigate how innate immune memory develops following influenza infection. We leverage blood samples from a human influenza virus challenge study to conduct integrative multi-omic data analyses, focusing specifically on the epigenetic and transcriptomic profiles of subjects at 1 day pre-challenge and 28 days post-challenge.
We find that the innate immune system enters a state of suppressed inflammation after resolution of infection, with decreased cytokine and AP-1 gene expression and decreased chromatin accessibility at AP-1 targeted loci and promoter regions of interleukin-related genes. However, increased chromatin accessibility at promoter regions of interferon-related genes and increased MAP kinase gene expression may suggest that the innate immune system is primed to respond to subsequent infection.
URI:	http://arks.princeton.edu/ark:/99999/fk4n31k22f
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Quantitative Computational Biology

Files in This Item:

File	Size	Format
Thistlethwaite_princeton_0181D_15342.pdf	10.6 MB	Adobe PDF
