STAT 598C

Statistical Methods For Bioinformatics and Computational Biology

Fall 2011

TTh 10:30 - 11:45am, REC 307

Instructor: Olga Vitek

Office: HAAS 120

Phone: (765) 496-9544

Email: ovitek@stat.purdue.edu

Office hours: Tue 11:45am-12:45pm, Wed 9:30am-10:30am or by appointment

Syllabus: here

Grades: Blackboard

Useful links: R CRAN, reference and search. Software consulting. RStudio. Bioconductor.

Textbooks: Draghici 2011, Hahne et al 2008, Gentleman et al 2005

Introduction

Bio: Overview and scientific questions.

Stat: Computing in bioinformatics.

Tuesday, Aug 23: Lecture notes.

Thursday, Aug 25: R notes. SNP dataset. Sweave test and output files.

Homework 1 out. Sweave help. LaTeX help.

Required: Draghici Ch.2 and Ch.6.

Suggested: Organism as a system. Systems biology 101. Bioconductor.

Gene expression microarrays

Bio: Measurement technology. Typical applications.

Stat: Signal processing. Normalization.

Tuesday, Aug 30: Lecture notes.

Thursday, Sep 1: Project 1 guidelines. Lecture notes. R code: S4 class and ExpressionSet.

Homework 1 due. Solutions 1, 2 and 3.

Tuesday, Sep 6: Lecture notes. R code. Homework 2 out.

Data for problem 4: abundance, peak info, sample info, eset.

Required: Draghici Ch.3 and Ch.4. BioC case studies Ch.2, 3 and 5. Getting organized.

Suggested: cDNA and Affy. Analysis workflow. Reproducibility. Background correction.

Spike-in-based evaluation 1 and 2. General validity.

Next generation sequencing: RNA-seq

Bio: Measurement technology. Typical applications.

Stat: Signal processing. Normalization.

Thursday, Sep 8:

Tuesday, Sep 13:

Thursday, Sep 15: Lecture notes. Homework 2 due. Homework 3 out.

Literature review project groups due.

Tuesday, Sep 20: No class.

Thursday, Sep 22: Tour of the Purdue Genomic facility by Dr. Phillip San Miguel. WSLR S039.

Literature review project proposals due.

Suggested: RNA-seq. Technology 1 and 2. Analysis workflow. Computation primer, 1, 2, 3 and 4.

Reproducibility. Addressing challenges. Outlook.

Finding differential abundance: gene expression microarrays

Bio: Testing with continuous measurements.

Stat: Classical and Empirical Bayes models.

Monday, Sep 26: Make-up class. 11:30am-12:30pm, UNIV201.

Tuesday, Sep 27: Lecture notes.

Thursday, Sep 29: R code. Homework 3 due. Homework 4 out. Dataset for homework 4.

Required: Draghici Ch. 8-14; Hahne et al 2008, Ch. 5, 6 and 7; Gentleman et al 2005, Ch. 23.

Suggested: Microarrays: Limma 1 and 2. Hierarchical models primer.

Tuesday, Oct 4: Guest lecture, Theodore Alexandrov, U. Bremen. MS-based imaging.

Suggested: Reviews 1 and 2. Proteomics: primer, reviews 1, 2, 3 and 4.

Finding differential abundance: RNA-seq

Bio: Testing with categorical measurements.

Stat: Classical and Empirical Bayes models.

Thursday, Oct 6:

Tuesday, Oct 11: No class, no office hours. October break.

Thursday, Oct 13: Homework 4 due. Lecture notes. R code. Literature review projects due.

Suggested: RNA-seq: Review. edgeR 1, 2 and 3. DESeq 1 and 2.

Biological variation. Importance of statistics.

Proteomics: spectral count 1.

Multiple comparisons

Bio: Multiplicity of testing and multivariate error rates.

Stat: P-value filtering. Resampling-based methods. Two-group models.

Tuesday, Oct 18: Project 2 guidelines. Homework 5 out: problems, reference, dataset.

Thursday, Oct 20: Project 2 groups due.

Thursday, Oct 27: Lecture notes.

Required: Draghici Ch. 16; Hahne et al 2008, Ch. 7.

Suggested: Primer 1 and 2. Review. FDR 1.

Non-specific filtering and power.

Tuesday, Oct 25: Guest lecture, Alexey Nesvizhskii, U. Michigan.

“Analysis of protein interaction networks and protein complexes using AP/MS technology”.

Suggested: Determination of protein-protein interactions, reviews 1 and 2.

Planning new experiments

Bio: Reproducible research.

Stat: Allocation of experimental resources. Sample size calculation.

Tuesday, Nov 1: Lecture notes. Homework 5 due.

Required: Draghici Ch.15.

Suggested: Bias. Forensic bioinformatics. Cautionary tales.

Principles in proteomics 1 and RNA-seq 1 and 2. Sample size guidelines.

Unsupervised data exploration

Bio: Data visualization.

Stat: Principal component analysis and clustering.

Thursday, Nov 3: Homework 6 out.

Tuesday, Nov 8: R code.

Required: Draghici Ch. 17.9, Ch. 18.

Suggested: Data exploration 1 and 2. PCA primer, reviews 1 and 2. Clustering primer and review.

Supervised classification

Bio: Discovery of biomarkers of disease.

Stat: Supervised classification. Regularized estimation. Measures of performance.

Thursday, Nov 10: Homework 6 due. Data analysis project proposals due.

Tuesday, Nov 15:

Thursday, Nov 17: Lecture notes. R code. Homework 7 out.

Tuesday, Nov 22:

Thursday, Nov 24: No class. Thanksgiving break.

Required: Draghici Ch. 29

Suggested: Wehrens ‘Chemometrics with R’

SVM 1 and 2. Penalized methods 1. PLS 1, 2 and 3.

Evaluation: ROC curve 1 and 2. Added predictive value 1. Over-optimism 1 and 2.

Critical review of published microarray experiments 1. MAQC II.

Gene-set analysis

Bio: Gene ontology. Functional annotations.

Stat: Gene-set enrichment analysis.

Tuesday, Nov 29: Lecture notes.

Thursday, Dec 1: Data analysis projects due. R code. Homework 8 out. Homework 7 due.

Tuesday, Dec 6:

Required: Draghici Ch. 22-25. Hahne et al 2008, Ch. 8, 13 and 14.

Suggested: Gene ontology 1. Annotation tools 1 and 2. Semantic similarity 1.

Gene-set methods review 1 and 2. GSEA.

Network-based approaches

Bio: Types and properties of biological networks.

Stat: Use of networks in statistical modeling.

Thursday, Dec 8: Project evaluations due. Lecture notes. Homework 8 due.

Required: Draghici Ch. 28.

Suggested: First systems biology study. Pathway construction primer 1 and validation.

Integration of networks and gene expression in Cytoscape 1.

Tentative schedule and handouts