Title: "On estimating dispersion in Negative Binomial models for RNA-seq experiments with small sample size"
Speaker: Danni Yu, Department of Statistics, Purdue University
Place: PHYS 223
Date: April 3, 2012, Tuesday, 4:30pm


Motivation. RNA-seq is the technology of choice for detecting differentially expressed genes. It produces measurements in the form of digital read counts; however, the counts are affected by complex sources of biological and technical variation. To distinguish systematic changes in expression from noise, the measurements are frequently modeled by the Negative Binomial distribution, whose normalized expectations and dispersions are estimated from the data. Unfortunately, in experiments with a small number of replicates, the per-gene estimates of dispersion are unreliable. A variety of methods address this by modeling the structure of the dispersions across all genes and combining the per-gene and overall estimates of dispersion. Although these methods generally improve the results, their performance often depends on whether the data satisfy their dispersion-specific modeling assumptions.

Method. We propose a simple and effective approach for estimating dispersions in the Negative Binomial model for RNA-seq data. The approach first estimates per-gene dispersions by the method of moments and then directly regularizes these estimates with a shrinkage estimator. The approach is flexible in that it requires no assumptions about the mean-variance structure beyond those specified by the Negative Binomial distribution. It is compatible with the exact test of differential expression and is easily extended to complex designs, e.g., with paired samples or a time course.
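The two steps described above (per-gene method-of-moments estimates, then shrinkage) can be illustrated with a short sketch. This is not the speaker's implementation. It assumes counts already normalized to a common library size, the Negative Binomial parameterization Var = mu + phi * mu^2 (so a moment estimate is phi = (s^2 - mean) / mean^2, truncated at 0), and a plain linear shrinkage toward the across-gene mean with a hypothetical, user-chosen `weight`; the estimator presented in the talk may choose its shrinkage target and weight differently.

```python
import numpy as np


def mom_dispersion(counts):
    """Per-gene method-of-moments dispersion under Var = mu + phi * mu^2.

    counts: 2-D array (genes x replicates), assumed (for this sketch)
    to be already normalized to a common library size.
    """
    counts = np.asarray(counts, dtype=float)
    mu = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # unbiased sample variance per gene
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = (var - mu) / mu**2
    # All-zero genes give 0/0 = nan; map them to 0.
    phi = np.where(np.isfinite(phi), phi, 0.0)
    # Under-dispersed genes (var < mean) are truncated at 0 (Poisson limit).
    return np.maximum(phi, 0.0)


def shrink(phi, weight):
    """Linear shrinkage of per-gene dispersions toward their common mean.

    weight is a hypothetical tuning parameter in [0, 1]: 0 keeps the
    per-gene estimates, 1 replaces every estimate by the pooled target.
    """
    target = phi.mean()
    return (1.0 - weight) * phi + weight * target


counts = np.array([[10, 12, 30],
                   [100, 90, 110],
                   [0, 0, 0]])
phi = mom_dispersion(counts)   # approx [0.3462, 0.0, 0.0]
phi_shrunk = shrink(phi, 0.5)  # approx [0.2308, 0.0577, 0.0577]
```

With only three replicates, the first gene's raw estimate is driven almost entirely by one large count; shrinking pulls it toward the pooled value, while genes whose raw estimate was truncated to zero receive a small positive dispersion. This illustrates the kind of regularization the Method paragraph motivates for small sample sizes.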

Results. We evaluated the proposed approach on eight simulated and experimental datasets, with and without biological replicates, and compared its performance with that of the currently popular packages edgeR, baySeq, BBSeq and SAMSeq.

Associated Reading:
S. Anders and W. Huber (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106.

See www.stat.purdue.edu/~doerge/BIOINFORM.D/FALL11/sem.html for a full schedule of BIOINFORMATICS SEMINARS, past and present.