Title: "A Sequence-Clustering Approach to Motif Discovery for Proteins"
Speaker: Sun Kim, School of Informatics/Center for Genomics and Bioinformatics, Indiana University-Bloomington
Place: Mechanical Engineering (ME) 161; Tuesday, 4:30pm

Abstract

We have been developing search frameworks for motif discovery from a set of sequences using our own sequence clustering algorithm, BAG, and existing motif discovery programs. Most motif site prediction methods adopt word enumeration, position-specific scoring matrix (PSSM) updating, or both. The final predicted sites are those with the highest statistical scores, which are based on the ratio between the probability under the motif model M_m and the probability under the background model M_b. The main challenge is that the locations of the motif sites, and hence M_m, are not known in advance. Motif prediction methods therefore rely on an iterative procedure in which M_m and M_b are refined by selecting different candidate motif sites. The motivation for our research in developing the motif search frameworks is as follows. There has been extensive research on motif search algorithms, but the effect of the input sequences is not well studied. M_m is constructed from candidate motif sites, and M_b consists of character frequencies combined with some prior knowledge. The input sequence set therefore has a direct impact on the search space, and in turn on the final motif prediction. Indeed, biologists carefully compile the input sequences before submitting them to a motif discovery algorithm. Our search frameworks are designed to guide the selection of the input sequences using our sequence clustering algorithm BAG and pattern selection methods. We demonstrated that existing motif discovery algorithms, MEME and Gibbs, can be improved by guiding their search with sequence clustering, pattern refinement, or both.
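As a rough illustration of the scoring idea described in the abstract, the following Python sketch builds a toy M_m (a PSSM from candidate sites) and a toy M_b (character frequencies from the input sequences), then scores each window of a sequence by the log of their probability ratio. It is not the BAG, MEME, or Gibbs implementation; the function names, the pseudocount of 1.0, and the example sequences are all assumptions made for illustration.

```python
import math

# The 20 standard amino acids (one-letter codes).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(sites, pseudocount=1.0):
    """Toy motif model M_m: per-column residue probabilities
    estimated from equal-length candidate sites (hypothetical helper)."""
    width = len(sites[0])
    pssm = []
    for col in range(width):
        counts = {a: pseudocount for a in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({a: counts[a] / total for a in ALPHABET})
    return pssm

def background_freqs(sequences, pseudocount=1.0):
    """Toy background model M_b: residue frequencies over all input
    sequences. Changing the input set changes M_b and hence every score."""
    counts = {a: pseudocount for a in ALPHABET}
    for seq in sequences:
        for ch in seq:
            counts[ch] += 1
    total = sum(counts.values())
    return {a: counts[a] / total for a in ALPHABET}

def log_odds(window, pssm, background):
    """Log of P(window | M_m) / P(window | M_b)."""
    return sum(math.log(pssm[i][ch] / background[ch])
               for i, ch in enumerate(window))

def best_site(seq, pssm, background):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pssm)
    return max((log_odds(seq[i:i + width], pssm, background), i)
               for i in range(len(seq) - width + 1))

# Made-up example data, not taken from the talk.
sequences = ["AAHELKK", "GGHEVAA", "HELPPPP"]
candidate_sites = ["HEL", "HEV", "HEL"]

m_m = build_pssm(candidate_sites)
m_b = background_freqs(sequences)
score, start = best_site(sequences[0], m_m, m_b)
print(start, round(score, 3))
```

An iterative method would repeat this loop: pick the best-scoring sites, rebuild M_m from them, and rescore. Because both models are estimated from the input sequences, adding or removing sequences shifts the scores, which is the dependence on the input set that the abstract highlights.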
