Tuesday, September 8, 2009
04:30 PM in HORT 117
Paul Livermore Auer
Department of Statistics, Purdue University

Statistical issues in next-generation sequencing — An overview and case study

Abstract

Next-generation (or "second-generation") sequencing has emerged as an accurate new tool that has already lent itself to a large number of applications (e.g., variant discovery, profiling of histone modifications, identifying transcription factor binding sites, resequencing, and transcriptome characterization). In particular, the Illumina/Solexa Genome Analyzer (a next-generation sequencing technology) has been used with great success in RNA-Sequencing (RNA-Seq) experiments. Despite this success, many statistical issues remain unsolved (e.g., understanding errors in the sequencing process and modeling gene expression in the downstream analysis). These problems are compounded by both the size and the complexity of RNA-Seq data. In this talk, I will provide an overview of the Solexa sequencing technology, introduce some of the statistical and computational issues involved, and detail a specific RNA-Seq data analysis that we have done for Scott Jackson's lab at Purdue University.

Recommended Reading:

Wang, Z., M. Gerstein and M. Snyder, 2009 RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10: 57-63.
