The Role of Read Depth in the Design and Analysis of Sequencing Experiments

Robinson, David Garrett

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01hd76s238c

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Storey, John D	en_US
dc.contributor.author	Robinson, David Garrett	en_US
dc.contributor.other	Quantitative Computational Biology Department	en_US
dc.date.accessioned	2015-06-23T19:38:29Z	-
dc.date.available	2015-06-23T19:38:29Z	-
dc.date.issued	2015	en_US
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01hd76s238c	-
dc.description.abstract	The development of quantitative sequencing technologies, such as RNA-Seq, Bar-Seq, ChIP-Seq, and metagenomics, has offered great insight into molecular biology. Proper design and analysis of these experiments require statistical models and techniques that consider the specific nature of sequencing data, which typically consists of a matrix of read counts per feature. An issue of particular importance to the development of these methods is the role of read depth in statistical accuracy and power. The depth of an experiment affects the power to make biological conclusions, meaning an experiment design must consider the tradeoff between cost, power, and the number of samples that are examined. Similarly, per-gene read depth affects each gene's power and accuracy, and must be taken into account in any downstream analysis. Here I explore many facets of the role of read depth in the design and analysis of sequencing experiments, and offer computational and statistical methods for addressing them. To assist in the design of sequencing experiments, I present subSeq, which examines the effect of depth in an experiment by subsampling reads to simulate lower depths. I use this method to examine the extent of read saturation across a variety of RNA-Seq experiments, and demonstrate a statistical model for predicting the effect of increasing depth in any experiment. I consider intensity-dependence in a technology comparison between microarrays and RNA-Seq, and show that the variance added by RNA-Seq depends more on depth than the variance in microarray depends on fluorescence intensity. I demonstrate that Bar-Seq data shares these depth-dependent properties with RNA-Seq and can be analyzed by the same tools, and further provide suggestions on the appropriate depth for Bar-Seq experiments. Finally, I show that per-gene read depth can be taken into account in multiple hypothesis testing to improve power, and introduce the method of functional false discovery rate (fFDR) control.	en_US
dc.language.iso	en	en_US
dc.publisher	Princeton, NJ : Princeton University	en_US
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog (http://catalog.princeton.edu).	en_US
dc.subject	Differential expression	en_US
dc.subject	Experimental design	en_US
dc.subject	False discovery rate	en_US
dc.subject	Read depth	en_US
dc.subject	RNA-Seq	en_US
dc.subject	Sequencing	en_US
dc.subject.classification	Bioinformatics	en_US
dc.subject.classification	Statistics	en_US
dc.subject.classification	Genetics	en_US
dc.title	The Role of Read Depth in the Design and Analysis of Sequencing Experiments	en_US
dc.type	Academic dissertations (Ph.D.)	en_US
pu.projectgrantnumber	690-2143	en_US
Appears in Collections:	Quantitative Computational Biology

Files in This Item:

File	Description	Size	Format
Robinson_princeton_0181D_11406.pdf		5.02 MB	Adobe PDF
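
The abstract above describes examining the effect of depth by subsampling reads to simulate lower depths. The following is a minimal sketch of that general idea only, assuming each read is retained independently with a fixed probability (binomial thinning of a read-count matrix). The function name `subsample_counts`, its parameters, and the toy matrix are hypothetical illustrations; this is not the subSeq interface and may differ from the procedure used in the dissertation.

```python
import random


def subsample_counts(counts, proportion, seed=0):
    """Thin a features-by-samples matrix of read counts (hypothetical helper).

    Assumption: each read is kept independently with probability
    `proportion`, so every entry becomes a Binomial(count, proportion) draw.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    rng = random.Random(seed)
    return [
        # One Bernoulli trial per read; adequate for small illustrative counts.
        [sum(rng.random() < proportion for _ in range(count)) for count in row]
        for row in counts
    ]


# Toy matrix: rows are features (e.g. genes), columns are samples.
# The numbers are made up for illustration.
toy_counts = [
    [120, 98],
    [15, 22],
    [0, 3],
]

for proportion in (1.0, 0.5, 0.1):
    thinned = subsample_counts(toy_counts, proportion, seed=1)
    total = sum(map(sum, thinned))
    print(f"proportion={proportion}: total reads={total}")
```

Running a downstream analysis on each thinned matrix and comparing the results across proportions is one way to see how conclusions change with depth. Low-count features lose most of their reads at small proportions, which is consistent with the abstract's point that per-gene depth affects power.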
