Title: "Enhanced Automated Function Prediction for Proteomics Analysis"
Speaker: Daisuke Kihara, Department of Computer Science, Purdue University
Place: Mechanical Engineering (ME) 161; Tuesday, 4:30pm

Abstract

The growing number of genome sequences in public databases has become essential data for twenty-first-century biology. When a genome is newly sequenced, functions are assigned to its Open Reading Frames (ORFs) using sequence database search algorithms such as BLAST or FASTA. A problem with these algorithms is their low annotation coverage: typically half or more of the genes in a genome remain functionally unknown. A function assignment/prediction algorithm with broader coverage is especially needed for omics-type experiments, which include microarray gene expression analysis, proteomics, and transcriptomics. These experiments identify large numbers of genes that share a common biological property. Function annotation is crucial for analyzing such data, because no biological conclusion can be drawn from genes that lack annotation. What gives biological insight is not the set of genes itself, but the set of their functional annotations.

To overcome the limitations of conventional homology-based function annotation methods, we have designed two approaches. The first is a sequence-based approach named PFP (Protein Function Prediction), which extends a typical PSI-BLAST search by applying data mining techniques. PFP was ranked best in the protein function prediction competition held at the Automated Function Prediction Special Interest Group (AFP-SIG) meeting of the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) in June 2005. PFP is publicly available as a web server (http://dragon.bio.purdue.edu/pfp/).

The second is a protein surface shape-based approach: characteristic local surface shapes of a query protein are detected and compared against known active sites that have been pre-computed and stored in a database. The need to predict protein function from tertiary structure has emerged with the structural genomics projects, which are solving an increasing number of structures of proteins whose function is unknown.

The rapid development of new omics experimental techniques has created a pressing demand for renewed bioinformatics tools, and these two methods are part of our response to that need.
