Title: "Using Finite Poisson Mixture Models for RNA-Seq Data Analysis and Transcript Expression Level Quantification"
Speaker: Yu (Michael) Zhu, Department of Statistics, Purdue University
Place: PHYS 223; January 31, 2011, Tuesday, 4:30pm


RNA-Seq has emerged as a powerful technique for transcriptome study. Along with improved sensitivity and coverage, RNA-Seq also brings challenges for data analysis. The massive amount of sequence read data, excessive variability, uncertainties, and bias and noise stemming from multiple sources make the analysis of RNA-Seq data difficult. Despite much progress, RNA-Seq data analysis still has room for improvement, especially in the quantification of transcript/gene expression levels. In this article, we propose to use finite Poisson mixture models to characterize base pair-level RNA-Seq data and to quantify transcript/gene expression levels. Finite Poisson mixture models combine the strength of fully parametric models with the flexibility of fully nonparametric models, and are well suited to modeling heterogeneous count data such as those observed in RNA-Seq experiments. In particular, we consider three types of Poisson mixture models and propose a BIC-based model selection procedure to adapt the models to individual transcripts. A unified quantification method based on the Poisson mixture models is developed to measure transcript/gene expression levels. The Poisson mixture models and the proposed quantification method were applied to two RNA-Seq data sets and demonstrated excellent performance in comparison with existing methods. Our approach resulted in better characterization of the data and more accurate measurements of transcript expression levels. We believe that finite Poisson mixture models provide a flexible framework for modeling RNA-Seq data, and that methods developed within this framework have the potential to become powerful tools for RNA-Seq data analysis.
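
As a rough illustration of the general idea in the abstract, the sketch below fits a finite Poisson mixture to a list of per-base read counts by EM and picks the number of components by BIC. It is a generic textbook mixture, not the three model types or the quantification method presented in the talk. The function names (fit_poisson_mixture, bic, select_components), the quantile-based initialisation, and the fixed iteration count are all assumptions made for this example.

```python
import math


def poisson_logpmf(x, rate):
    """Log of the Poisson probability mass at count x for the given rate."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_poisson_mixture(counts, k, n_iter=200):
    """EM for a k-component Poisson mixture; returns (weights, rates, loglik)."""
    n = len(counts)
    ordered = sorted(counts)
    # Assumed initialisation: rates at evenly spaced sample quantiles,
    # shifted by 0.5 so that no rate starts at zero.
    rates = [ordered[int((j + 0.5) * n / k)] + 0.5 for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count.
        resp = []
        for x in counts:
            logs = [math.log(w) + poisson_logpmf(x, r)
                    for w, r in zip(weights, rates)]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: update weights and rates (small constants avoid log(0)).
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            rates[j] = sum(r[j] * x for r, x in zip(resp, counts)) / nj + 1e-12
    loglik = sum(
        logsumexp([math.log(w) + poisson_logpmf(x, r)
                   for w, r in zip(weights, rates)])
        for x in counts
    )
    return weights, rates, loglik


def bic(loglik, k, n):
    """BIC with k rates and k - 1 free mixing weights; lower is better."""
    return -2.0 * loglik + (2 * k - 1) * math.log(n)


def select_components(counts, max_k=3):
    """Fit mixtures with 1..max_k components and return the BIC-best one."""
    fits = {k: fit_poisson_mixture(counts, k) for k in range(1, max_k + 1)}
    scores = {k: bic(fits[k][2], k, len(counts)) for k in fits}
    best = min(scores, key=scores.get)
    return best, fits[best], scores


# Made-up counts for one hypothetical transcript: a low and a high regime.
example_counts = [2, 3, 1, 2, 4, 3, 2, 1, 3, 2] * 5 \
    + [30, 28, 35, 31, 29, 33, 27, 32, 30, 34] * 5
best_k, (best_weights, best_rates, best_loglik), bic_scores = \
    select_components(example_counts)
```

Here best_k holds the selected number of components, and best_weights and best_rates describe the fitted mixture. Running the selection separately on each transcript's counts is one plausible reading of "adapting the models to individual transcripts", but that mapping is an assumption of this sketch.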

Associated Reading: Ming Hu, Yu Zhu, Jeremy M.G. Taylor, Jun S. Liu, Zhaohui S. Qin. 2011. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq. Bioinformatics. 28, 1: 63-68.

For a full schedule of Bioinformatics Seminars, past and present, see www.stat.purdue.edu/~doerge/BIOINFORM.D/FALL11/sem.html.