Statistical Methods For Bioinformatics and Computational Biology

Fall 2011

TTh 10:30 - 11:45am, REC 307

Instructor: Olga Vitek

Office: HAAS 120

Phone: (765) 496-9544


Office hours: Tue 11:45am-12:45pm, Wed 9:30am-10:30am or by appointment

Syllabus: here

Grades: Blackboard

Useful links: R CRAN, reference and search. Software consulting. RStudio. Bioconductor.

Textbooks: Draghici 2011, Hahne et al 2008, Gentleman et al 2005


Bio: Overview and scientific questions.

Stat: Computing in bioinformatics.

Tuesday,   Aug 23: Lecture notes.

Thursday,  Aug 25: R notes. SNP dataset. Sweave test and output files.

                                Homework 1 out. Sweave help. Latex help.

Required:    Draghici Ch.2 and Ch.6.

Suggested: Organism as a system. Systems biology 101. Bioconductor.

Gene expression microarrays

Bio: Measurement technology. Typical applications.

Stat: Signal processing. Normalization.

Tuesday,   Aug 30: Lecture notes.

Thursday,  Sep 1:  Project 1 guidelines. Lecture notes. R code: S4 class and ExpressionSet.

                              Homework 1 due. Solutions 1, 2 and 3.

Tuesday,   Sep 6: Lecture notes. R code. Homework 2 out.

                             Data for problem 4: abundance, peak info, sample info, eset.

Required:   Draghici Ch.3 and Ch.4. BioC case studies Ch.2, 3 and 5.  Getting organized.

Suggested: cDNA and Affy. Analysis workflow. Reproducibility. Background correction.

                   Spike-in-based evaluation 1 and 2. General validity.

Next generation sequencing: RNA-seq

Bio: Measurement technology. Typical applications.

Stat: Signal processing. Normalization.

Thursday,  Sep 8:

Tuesday,   Sep 13:

Thursday,  Sep 15: Lecture notes. Homework 2 due. Homework 3 out.

                               Literature review project groups due.

Tuesday,   Sep 20: No class.

Thursday,  Sep 22: Tour of the Purdue Genomic facility by Dr. Phillip San Miguel. WSLR S039.

                               Literature review project proposals due.

Suggested: RNA-seq. Technology 1 and 2. Analysis workflow. Computation primer, 1, 2, 3 and 4.

                   Reproducibility. Addressing challenges. Outlook.

Finding differential abundance: gene expression microarrays

Bio: Testing with continuous measurements.

Stat: Classical and Empirical Bayes models.

Monday,    Sep 26: Make-up class. 11:30am-12:30pm, UNIV201.

Tuesday,   Sep 27: Lecture notes.

Thursday,  Sep 29: R code. Homework 3 due. Homework 4 out. Dataset for homework 4.

Required:   Draghici Ch. 8-14; Hahne et al 2008, Ch. 5, 6 and 7; Gentleman et al 2005, Ch. 23.

Suggested: Microarrays: Limma 1 and 2. Hierarchical models primer.

Tuesday,   Oct 4: Guest lecture, Theodore Alexandrov, U. Bremen. MS-based imaging.

Suggested: Reviews 1 and 2. Proteomics: primer, reviews 1, 2, 3 and 4.

Finding differential abundance: RNA-seq

Bio: Testing with categorical measurements.

Stat: Classical and Empirical Bayes models.

Thursday,  Oct 6:

Tuesday,   Oct 11: No class, no office hours. October break.

Thursday,  Oct 13: Homework 4 due. Lecture notes. R code. Literature review projects due.

Suggested: RNA-seq: Review. EdgeR 1, 2 and 3. DESeq 1 and 2.

                   Biological variation. Importance of statistics.

                   Proteomics: spectral count 1.

Multiple comparisons

Bio: Multiplicity of testing and multivariate error rates.

Stat: P-value filtering. Resampling-based methods. Two-group models.

Tuesday,   Oct 18: Project 2 guidelines. Homework 5 out: problems, reference, dataset.

Thursday,  Oct 20: Project 2 groups due.

Thursday,  Oct 27: Lecture notes.

Required:   Draghici Ch. 16; Hahne et al 2008, Ch. 7.

Suggested: Primer 1 and 2. Review. FDR 1.

                   Non-specific filtering and power.

Tuesday,   Oct 25: Guest lecture, Alexey Nesvizhskii, U. Michigan.

“Analysis of protein interaction networks and protein complexes using AP/MS technology”.

Suggested: Determination of protein-protein interactions, reviews 1 and 2.

Planning new experiments

Bio: Reproducible research.

Stat: Allocation of experimental resources. Sample size calculation.

Tuesday,   Nov 1: Lecture notes. Homework 5 due.

Required:   Draghici Ch.15.

Suggested: Bias. Forensic bioinformatics. Cautionary tales.

                   Principles in proteomics 1 and RNAseq 1 and 2. Sample size guidelines.

Unsupervised data exploration

Bio: Data visualization.

Stat: Principal component analysis and clustering.

Thursday,  Nov 3: Homework 6 out.

Tuesday,   Nov 8: R code.

Required:   Draghici Ch. 17.9, Ch. 18.

Suggested: Data exploration 1 and 2. PCA primer, reviews 1 and 2. Clustering primer and review.

Supervised classification

Bio: Discovery of biomarkers of disease.

Stat: Supervised classification. Regularized estimation. Measures of performance.

Thursday,  Nov 10: Homework 6 due. Data analysis project proposals due.

Tuesday,   Nov 15:

Thursday,  Nov 17: Lecture notes. R code. Homework 7 out.

Tuesday,   Nov 22:

Thursday,  Nov 24: No class. Thanksgiving break.

Required:   Draghici Ch. 29

Suggested: Wehrens ‘Chemometrics with R’.

                   SVM 1 and 2. Penalized methods 1. PLS 1, 2 and 3.

                   Evaluation: ROC curve 1 and 2. Added predictive value 1. Over-optimism 1 and 2.

                   Critical review of published microarray experiments 1. MAQC II.

Gene-set analysis

Bio: Gene ontology. Functional annotations.

Stat: Gene-set enrichment analysis.

Tuesday,   Nov 29: Lecture notes.

Thursday,  Dec 1: Data analysis projects due. R code. Homework 8 out. Homework 7 due.

Tuesday,   Dec 6:

Required:   Draghici Ch. 22-25. Hahne et al 2008, Ch. 8, 13 and 14.

Suggested: Gene ontology 1. Annotation tools 1 and 2. Semantic similarity 1.

                    Gene-set methods review 1 and 2. GSEA.

Network-based approaches

Bio: Types and properties of biological networks.

Stat: Use of networks in statistical modeling.

Thursday,  Dec 8: Project evaluations due. Lecture notes. Homework 8 due.

Required:   Draghici Ch. 28.

Suggested: First systems biology study. Pathway construction primer 1 and validation.

                   Integration of networks and gene expression in Cytoscape 1.

                   Network medicine: and use 1, 2, 3, 4 and 5.

                   Network-based function prediction 1, 2, 3, 4 and 5.

Tentative schedule and handouts