Purdue University, Department of Statistics
Seminars and Events

Special Colloquia, Department of Statistics

Monday, February 10, 2003 in REC 112
4:30 PM

Dr. Jing Wu
University of California, Santa Cruz

will speak on

Computational Genefinding: Probabilistic Models and Statistical Methods

Abstract


Computational methods for finding genes and other functional sites in genomic DNA have evolved significantly over the last 20 years. One class of functional sites that researchers have sought to recognize is the many kinds of binding sites; a well-studied example is finding binding sites of the integration host factor (IHF) in E. coli DNA. In our approach, a positional weight matrix is derived from a set of known IHF binding sites, and a hidden semi-Markov model based on this matrix is developed both to simulate IHF binding sites in E. coli DNA and to detect putative binding sites.
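As a rough illustration of the positional weight matrix step only (not the speaker's implementation), the sketch below builds a log-odds matrix from aligned sites with a pseudocount, assuming a uniform A/C/G/T background, and scans a sequence for windows whose score exceeds a threshold. The training sites, pseudocount, and threshold are invented for illustration and are not real IHF sites; the hidden semi-Markov model built on top of the matrix is not reproduced here.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Return pwm[i][b] = log2(p_i(b) / q(b)) from equal-length aligned sites.

    Assumes uppercase A/C/G/T sequences; q defaults to a uniform background.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must be aligned to the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        # Pseudocounts keep unseen bases from getting log(0) = -infinity.
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def score_window(pwm, window):
    """Sum of per-position log-odds scores (raises KeyError on non-ACGT)."""
    return sum(col[b] for col, b in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    w = len(pwm)
    hits = []
    for start in range(len(sequence) - w + 1):
        s = score_window(pwm, sequence[start:start + w])
        if s >= threshold:
            hits.append((start, s))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sites, not real IHF binding sites.
    toy_sites = ["TATCAA", "TATCAT", "TTTCAA", "TATCAA"]
    pwm = build_pwm(toy_sites)
    print(scan(pwm, "GGGTATCAAGGGCCC", threshold=4.0))
```

A single threshold on the window score is the simplest detection rule; a model such as the hidden semi-Markov model in the abstract would additionally account for site lengths and the background between sites.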

A new class of gene-prediction algorithms reported recently has demonstrated the power of comparative genomics. Existing genefinding algorithms focus on locating exons in genomic sequence, but they place restrictions on the input sequences and lack measures of statistical confidence. Another algorithm, designed to detect conserved structural RNAs as well as coding regions, is computationally expensive and focused mainly on structural RNAs.

We use sequence similarity between human and mouse to classify alignments into coding and non-coding regions. From the aligned human and mouse sequences, we propose a log-odds ratio score based on conservation measurements, and we use the distribution of these scores over fixed-size windows of gapless alignments to separate alignments that contain coding regions from those that do not. The confidence level of our predictions of new coding regions is given by a multiple hypothesis testing procedure that controls the false discovery rate. Our predictions are validated using 1M alignments of ancient repeats, 90,000 exons from RefSeq mRNAs, and 932 pseudogenes produced by the Sanger Institute.
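The abstract does not specify the conservation measure, the window size, or the null distribution, so the following is only a sketch under stated assumptions. It assumes a hypothetical per-column score that compares match/mismatch probabilities under a "coding" and a "non-coding" model, sums it over sliding windows of a gapless alignment, converts window scores to empirical p-values against a sample of null window scores (for example from alignments assumed to be neutral; any values used here are invented), and applies the Benjamini-Hochberg step-up procedure as one standard way to control the false discovery rate.

```python
import math
from bisect import bisect_left


def column_log_odds(p_match_coding, p_match_noncoding):
    """Hypothetical per-column scores: log P(col | coding) / P(col | non-coding)."""
    match = math.log(p_match_coding / p_match_noncoding)
    mismatch = math.log((1 - p_match_coding) / (1 - p_match_noncoding))
    return match, mismatch


def window_scores(human, mouse, window, match_score, mismatch_score):
    """Sliding-window sums of column scores over a gapless alignment."""
    if len(human) != len(mouse):
        raise ValueError("gapless alignment: sequences must have equal length")
    cols = [match_score if a == b else mismatch_score for a, b in zip(human, mouse)]
    if len(cols) < window:
        return []
    current = sum(cols[:window])
    scores = [current]
    for i in range(window, len(cols)):
        # Slide by one column: add the new column, drop the oldest.
        current += cols[i] - cols[i - window]
        scores.append(current)
    return scores


def empirical_pvalues(scores, null_scores):
    """p = (1 + #{null >= s}) / (1 + n), using a sorted null sample."""
    null_sorted = sorted(null_scores)
    n = len(null_sorted)
    return [(1 + n - bisect_left(null_sorted, s)) / (1 + n) for s in scores]


def benjamini_hochberg(pvalues, alpha):
    """Indices rejected by the BH step-up procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # largest rank k with p_(k) <= k * alpha / m
    return sorted(order[:k_max])


if __name__ == "__main__":
    match, mismatch = column_log_odds(0.9, 0.6)
    scores = window_scores("ACGTACGTAC", "ACGTTCGTAC", 4, match, mismatch)
    null = [-1.0, 0.0, 0.2, 0.5, 0.8, 1.0]  # invented null window scores
    pvals = empirical_pvalues(scores, null)
    print(benjamini_hochberg(pvals, alpha=0.05))
```

Empirical p-values need a reasonably large null sample: with n null windows the smallest attainable p-value is 1/(n + 1), which limits how many windows can survive the FDR cut.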



©1999 Department of Statistics
Last Update: Feb 3, 2003