Statistical Methods For Bioinformatics and Computational Biology

Fall 2013

TTh 10:30 - 11:45am, REC 307

Instructor: Olga Vitek

Office: HAAS 120

Phone: (765) 496-9544


Office hours: Tue 11:45am-12:45pm, Wed 9:00am-10:00am or by appointment

Admin: Syllabus, Piazza, Blackboard.

R: CRAN, reference, search. RStudio. Bioconductor. Purdue software consulting.

Books: BioC case studies. Stats with microarrays. BioC book. Stats with R. Elements of learning.


Bio: Terms, concepts and examples.

Stat: Reproducible computational research.

Tuesday,   Aug 20: Lecture notes.

Thursday,  Aug 22: R examples: .R, .Rmd, .Rnw, .csv. Hw1 out.

Reading: Organizing projects, reproducible research, Bioconductor, R intro.

Gene expression & protein abundance: continuous data

Bio: Measurement technology. Typical applications.

Stat: Signal processing. Normalization.

Tuesday,   Aug 27: Lecture notes.

Thursday,  Aug 29: Lecture notes. R examples: signal processing, expressionSet, data structures

                               Hw1 due: solutions, .Rnw. Hw2 out: problems 1-3,

                               Problem 4: NMR expressionSet, .csv files 1, 2 and 3.

Reading: BioC case studies Ch2, 3, 5; BioC book Ch3. Spike-in evaluation 1 and 2.

Gene expression: count data

Bio: Measurement technology. Typical applications.

Stat: Signal processing. Normalization.

Tuesday,   Sep 3: Lecture notes. Extra slides.

Thursday,  Sep 5: Guest lecture: Nadia Atallah, PhD candidate, Botany and Plant Pathology. Slides.

                              Project 1 groups due. Hw2 due. Hw3 out.

Reading: Sequencing: technology 1, 2, methods 1, 2, 3, filtering. WikiBook. Community.

               Transcriptome quantification: overview, methods 1, 2, protocol, biological variation.

Of interest: 1, 2, 3

Finding differential abundance: continuous measurements

Bio: Testing with continuous measurements.

Stat: Classical and Empirical Bayes models.

Tuesday,   Sep 10: Lecture notes. Project 1 guidelines.

Thursday,  Sep 12: R examples. Hw3 due. Hw4 out: problems, dataset.

Reading: Draghici Ch8-14, 21; BioC case studies Ch5, 6 and 7; BioC book Ch23. Hierarchical models.

Of interest: Limma 1, 2. SAM paper and url.

Finding differential abundance: RNA-seq

Bio: Testing with categorical measurements.

Stat: Classical and Empirical Bayes models.

Tuesday,   Sep 17:

Thursday,  Sep 19: Project 1 proposals due. Lecture notes. Hw4 due.

Tuesday,   Sep 24: R examples. Hw5 out: problems, dataset.

Reading: Overview, datasets description and link.

Of interest: edgeR 1, 2, 3, 4, 5, DESeq 1, 2, sSeq, SAMseq.

Multiple comparisons

Bio: Multiplicity of testing and multivariate error rates.

Stat: P-value filtering. Resampling-based methods. Two-group models.

Thursday,  Sep 26:

Tuesday,   Oct 1:

Thursday,  Oct 3: Lecture notes. R examples. Hw5 due.

Reading: Draghici Ch16, BioC case studies Ch7. Primers 1, 2.

Of interest: Non-specific filtering. FDR 1, 2, 3, 4, 5. Resampling.

Planning new experiments

Bio: More on reproducible research.

Stat: Allocation of experimental resources. Sample size calculation.

Tuesday,   Oct 8: No class, no office hours. October break.

Thursday,  Oct 10: Project 1 due. Lecture notes.

Tuesday,   Oct 15: Hw6 out. Project 2 guidelines.

Thursday,  Oct 17:

Reading: Draghici Ch15.

Of interest: Reproducibility: Bias. Reproducibility challenges general 1, 2, 3; microarrays, proteomics.

                Statistical experimental design: microarrays; RNA-seq 1, 2; proteomics. Sample size: 1, 2.

Unsupervised data exploration

Bio: Data visualization.

Stat: Principal component analysis, clustering and biclustering. Feature selection.

Tuesday,   Oct 22: Lecture notes.

Thursday,  Oct 24: Project 2 groups due. R examples. Hw6 due. Hw7 out.

Tuesday,   Oct 29:

Reading: Draghici Ch17.9, 18. BioC book Ch12. Data visualization 1, 2. Primers 1, 2. Reviews 1, 2, 3.

Of interest: Number of clusters 1, 2. Discover associations. Select features. Account for noise.

                Module maps in cancer, microarrays and imaging, mass spectrometry and imaging.

Supervised classification

Bio: Discovery of biomarkers of disease.

Stat: Supervised classification. Feature selection. Measures of performance.

Thursday,  Oct 31: Lecture notes. Hw7 due.

Tuesday,   Nov 5: Project 2 proposals due.

Thursday,  Nov 7: Hw8 out. Case studies.

Tuesday,   Nov 12: Lecture notes. R examples.

Reading: Draghici Ch29. Primer 1, reviews 1, 2, 3.

Of interest: Regularized methods 1, 2, 3, 4.

                MAQC II. Methods guidelines 1, 2, 3. Example studies 1, 2, 3, 4, 5.

Biomolecular networks: exploration and inference

Bio: Pathway analysis vs network inference.

Stat: Statistical modeling.

Thursday,  Nov 14:

Tuesday,   Nov 19: Lecture notes. Hw8 due.

Thursday,  Nov 21: Lecture notes. R examples 1 and 2, and datasets 1, 2, 3 and 4.

                  Hw9 out: group A, group B. Datasets 1, 2 and 3.

Tuesday,   Nov 26: Guest lecture: Robert Ness, PhD candidate, Statistics.

Thursday,  Nov 28: No class, no office hours. Thanksgiving break.

Of interest: Importance of biological networks 1, 2, 3, 4, 5. Practical steps of data integration: 1, 2, 3, 4.

                   Pathways analysis: GO, gene-set enrichment 1, 2.

                   Network inference: Reviews 1, 2, 3. Example studies 1, 2.

                   Graphical models 1, 2, 3. Regularized methods 1.


Topics in R-based statistical computing

R data structures. Data manipulation. Debugging.

Monday,  Dec 2: Project 2 due.

Tuesday,  Dec 3: Guest lecture: Robert Ness, PhD candidate, Statistics.

Thursday,  Dec 5: Guest lecture: Jan Vitek, Professor, Computer Science. Lecture notes.

                            Hw9 due: group A, group B.

Monday, Dec 6: Project 2 evaluations due.

Tentative schedule and handouts