STAT 598K Fall 2001
Statistical Methods for Computational Biology
Instructor: Professor Jun Xie
Phone: 494-6032
Email: junxie@stat.purdue.edu
TIME: MWF 3:30pm
PLACE: CS G066
Exception: the first lecture, on Aug 20, 2001, will be in Room LAEBB291.
www.stat.purdue.edu/~junxie/#TEACHING
Figure: Alignment of two helix-turn-helix motifs
Today's genome projects generate immense biological data sets, including DNA and protein sequence data and, more recently, DNA microarray data. There is strong motivation for developing computational methods that can infer biological information from these data sets. This course covers the many problems in computational biology that are essentially statistical. We will discuss statistical methods in biological sequence analysis, introduce DNA microarray data sets, and cover selected topics in their analysis. The course is interdisciplinary: students are expected to spend a substantial amount of time reading research articles ranging from statistics to biology.
References/Textbook
R. Durbin, S. Eddy, A. Krogh and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
A. D. Baxevanis and B. F. F. Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins
Research articles will also be assigned in class.
Homework will be assigned every week or every other week. There will be one written midterm project and a final that consists of a written project assigned previously plus a take-home exam. Grades are based on the accumulation of all the course work.
MATH 518, Phone 46032, Email junxie@stat.purdue.edu
1. Accessing public biological databases through the internet.
The first lecture will briefly introduce the NCBI databases and other links, including sequence databases, structure databases, and sequence alignment tools.
www.ncbi.nlm.nih.gov/Structure/
2. Pairwise sequence alignment: dynamic programming and other algorithms.
Based on Chapter 2 of the first reference book, lectures of week 1 and week 2.
3. Markov chains and hidden Markov models.
Chapters 3 and 4 of the first reference book, lectures of week 3 and week 4.
4. Multiple sequence alignment using hidden Markov models.
Chapters 5 and 6 of the first reference book, lectures of week 5 and week 6.
5. Sequence alignment tools and database searching.
BLAST and other sequence alignment tools available on the public internet, weeks 6-7.
The first written project is due.
6. Phylogenetic trees.
Chapter 7 of the first reference book, lectures of weeks 7-8.
7. Block motif model for aligning local functional segments of proteins.
Reference papers will be assigned before these lectures. Weeks 9-10.
8. Algorithms for predicting protein function.
Reference papers will be assigned. Weeks 10-11.
9. cDNA microarrays and oligonucleotide arrays.
Reference papers will be assigned. Week 11.
10. Cluster analysis of the array data.
A reference paper will be assigned. Week 12.
11. Dimension reduction for array data: principal component analysis, singular value decomposition, and factor analysis. Weeks 13-14.
The final exam, including the written project, is due in the last week of the semester.