STAT 598K  Fall 2001

Statistical Methods for Computational Biology

Instructor: Professor Jun Xie

Phone: 494-6032

Email: junxie@stat.purdue.edu

TIME: MWF 3:30pm

PLACE: CS G066

Exception: the first lecture, on Aug 20, 2001, will be in Room LAEBB291.

www.stat.purdue.edu/~junxie/#TEACHING

 


Alignment of two helix-turn-helix motifs




Course outline

Today’s genome projects generate immense biological data sets. Examples include DNA and protein sequence data and, more recently, DNA microarray data. There is strong motivation for developing computational methods that can infer biological information from these data sets. This course covers the many problems in computational biology that are essentially statistical. We will discuss statistical methods for biological sequence analysis, then introduce DNA microarray data sets and selected topics in their analysis. The course is interdisciplinary: students are expected to spend a substantial amount of time reading research articles ranging from statistics to biology.

 

References/Textbook

R. Durbin, S. Eddy, A. Krogh and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.

A. D. Baxevanis and B. F. F. Ouellette. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins.

Research articles will be assigned in class.

 

Course prerequisite

STAT 503 or 511. Knowledge of calculus, elementary statistics, basic probability theory, and the basic principles of molecular genetics will be very helpful.

 

Homework and Grades

Homework will be assigned every week or every other week. There will be one written midterm project and a final consisting of a previously assigned written project plus a take-home exam. Grades are based on all course work combined.

 

Instructor office hours

WF after lecture, and by appointment

MATH 518, Phone 46032, Email junxie@stat.purdue.edu

 

List of topics and course schedules


1.      Accessing public biological databases through the Internet.

The first lecture will briefly introduce the NCBI databases and other resources, including sequence databases, structure databases, and sequence alignment tools.

www.ncbi.nlm.nih.gov

www.ncbi.nlm.nih.gov/Structure/

www.rcsb.org/pdb

www.ncbi.nlm.nih.gov/BLAST

2.      Pairwise sequence alignment: dynamic programming and other algorithms.

Based on Chapter 2 of the first reference book; lectures of weeks 1 and 2.

3.      Markov chains and hidden Markov models.

Chapters 3 and 4 of the first reference book; lectures of weeks 3 and 4.

4.      Multiple sequence alignment using hidden Markov models.

Chapters 5 and 6 of the first reference book; lectures of weeks 5 and 6.

5.      Sequence alignment tools and database searching.

BLAST and other sequence alignment tools publicly available on the Internet. Weeks 6-7.

The first written project is due.

 

6.      Phylogenetic trees.

Chapter 7 of the first reference book; lectures of weeks 7-8.

7.      Block motif model for aligning local functional segments of proteins.

Reference papers will be assigned before these lectures. Weeks 9-10.

8.      Algorithms for predicting protein function.

Reference papers will be assigned. Weeks 10-11.

9.      cDNA microarrays and oligonucleotide arrays.

Reference papers will be assigned. Week 11.

10.  Cluster analysis of the array data.

A reference paper will be assigned. Week 12.

11.  Dimension reduction for array data: principal component analysis, singular value decomposition, and factor analysis. Weeks 13-14.

 

The final exam, including the written project, is due in the last week of the semester.