Title: "Extreme Value Distribution in Gene Prediction"
Speaker: Dr. Jing Wu; Department of Statistics, Purdue University
Place: Stanley Coulter (SC) 239; Tuesday, November 16, 2004; 4:30pm


We describe a novel approach to gene prediction that uses a binding site database. We assign a score to each gene according to the maximum score obtained from a positional weight matrix. For unknown binding sites, we propose to use positional weight matrices from a database of known binding sites and to select the relevant matrices based on training data. We also propose a classifier that uses all relevant matrices to identify genes with the same function as the training data. Our method can be applied to single sequences and extends easily to aligned sequences, which increases its predictive power. For a given response element, we test the null hypothesis that the gene is not a target gene against the alternative that it is a target gene. We verified our method on several types of data sets. First, we applied it to sets of genes of known function. Using conserved promoter regions of human and mouse sequences, we identified 84% of HNF1 target genes with an estimated false positive rate of 20%, using the HNF1 positional weight matrices in the TRANSFAC public database 6.0. Using the E2F and NF-Y positional weight matrices, we identified 82% of E2F target genes with an estimated false positive rate of 20%. Second, we applied our method to sets of S. cerevisiae genes that were clustered by their expression patterns in Beer and Tavazoie (2004). Our method gives results similar to or better than those of the Bayesian network proposed by Beer and Tavazoie (2004).
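
As a rough illustration of the scoring step described in the abstract, the sketch below slides a positional weight matrix along a promoter sequence and keeps the maximum window score as the score for the gene. It is only a minimal sketch under assumptions: the matrix values, the example sequences, and the function names window_score and max_pwm_score are hypothetical and are not taken from TRANSFAC or from the speaker's implementation.

```python
# Hypothetical 3-position weight matrix: one dict of per-base scores per
# position. The values are illustrative only, not taken from TRANSFAC.
PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]


def window_score(pwm, window):
    """Sum the per-position scores of one window of the same width as pwm."""
    return sum(column[base] for column, base in zip(pwm, window))


def max_pwm_score(pwm, sequence):
    """Return the highest window score over all positions in sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the weight matrix")
    return max(
        window_score(pwm, sequence[start:start + width])
        for start in range(len(sequence) - width + 1)
    )


# Example: a sequence containing the best-scoring site "ACG" and one without.
with_site = max_pwm_score(PWM, "TTACGTT")
without_site = max_pwm_score(PWM, "TTTT")
```

Because the gene score is a maximum over many windows, its behavior under the null hypothesis is naturally described by an extreme value distribution, which is presumably the connection to the title of the talk.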