Title: "Statistical Models for Transcripts Quantification Using RNA-Seq"
Speaker: Han Wu; Eli Lilly, Indianapolis and formerly Purdue Statistics

Place: SMITH (SMTH) Hall 108
Date: March 11, 2014; Tuesday
Time: 4:30pm

RNA-Seq has emerged as the method of choice for profiling the transcriptomes of organisms. In particular, it aims to quantify the expression levels of transcripts using short nucleotide sequences, or short reads, generated from RNA-Seq experiments. Because the label of the transcript from which each short read is generated is missing, short reads are mapped to the genome rather than the transcriptome. The quantification of transcript expression levels is therefore an indirect statistical inference problem. A number of methods for quantifying transcript expression levels have been proposed in the literature. Although effective in many cases, these methods can become ineffective in others and may even suffer from the non-identifiability problem. A key drawback of these existing methods is that they fail to utilize all the information in the RNA-Seq short read count data. We propose to use individual exonic base pairs as observation units and, further, to model nonzero as well as zero counts at all base pairs at both the transcript and gene levels. At the transcript level, two-component Poisson mixture distributions are postulated, which give rise to the Convolution of Poisson Mixture (CPM) distribution model at the gene level. Maximum likelihood estimation, carried out with the EM algorithm, is used to estimate the model parameters and quantify transcript expression levels. We refer to the proposed method as CPM-Seq. Applications to both simulated and real data demonstrate the effectiveness of CPM-Seq and show that it produces more accurate and consistent quantification results than the commonly used software package Cufflinks.
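
As a rough illustration of the estimation idea only, the sketch below fits a generic two-component Poisson mixture to a vector of counts with the EM algorithm. It is not the CPM-Seq implementation: the gene-level convolution across transcripts is omitted, and the function names (`em_poisson_mixture`, `sample_poisson`), the starting values, and the stopping rule are assumptions made for this example.

```python
import math
import random


def poisson_pmf(k, lam):
    """Poisson probability of count k with mean lam, computed in log space."""
    lam = max(lam, 1e-12)  # guard against log(0)
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def em_poisson_mixture(counts, n_iter=500, tol=1e-8):
    """Fit pi * Poisson(lam1) + (1 - pi) * Poisson(lam2) by EM.

    Returns (pi, lam1, lam2, log_likelihood). Starting values are an
    arbitrary choice for this sketch: half and 1.5 times the sample mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    pi, lam1, lam2 = 0.5, max(0.5 * mean, 1e-3), max(1.5 * mean, 1e-2)
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from component 1.
        resp = []
        ll = 0.0
        for k in counts:
            a = pi * poisson_pmf(k, lam1)
            b = (1.0 - pi) * poisson_pmf(k, lam2)
            total = max(a + b, 1e-300)  # avoid division by zero on underflow
            resp.append(a / total)
            ll += math.log(total)
        # M-step: weighted proportion and weighted means.
        n1 = max(sum(resp), 1e-12)
        n2 = max(n - n1, 1e-12)
        pi = n1 / n
        lam1 = sum(r * k for r, k in zip(resp, counts)) / n1
        lam2 = sum((1.0 - r) * k for r, k in zip(resp, counts)) / n2
        # Stop once the log-likelihood no longer improves noticeably.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, lam1, lam2, ll


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


if __name__ == "__main__":
    # Simulated per-base counts: 30% low-rate positions, 70% high-rate positions.
    rng = random.Random(0)
    data = [sample_poisson(1.0 if rng.random() < 0.3 else 10.0, rng)
            for _ in range(2000)]
    print(em_poisson_mixture(data))
```

In this toy setting the low-rate component loosely plays the role of positions with few or zero reads. The real method would additionally have to combine the transcript-level mixtures at the gene level, which this sketch does not attempt.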

See www.stat.purdue.edu/~doerge/BIOINFORM.D/SPRING14/sem.html for a full schedule of BIOINFORMATICS SEMINARS, past and present.