STAT 598C

Statistical Methods For Bioinformatics and Computational Biology

Fall 2013

TTh 10:30 - 11:45am, REC 307

Instructor: Olga Vitek

Office: HAAS 120

Phone: (765) 496-9544

Email: ovitek@stat.purdue.edu

Office hours: Tue 11:45am-12:45pm, Wed 9:00am-10:00am or by appointment

Admin: Syllabus, Piazza, Blackboard.

R: CRAN, reference, search. RStudio. Bioconductor. Purdue software consulting.

Books: BioC case studies. Stats with microarrays. BioC book. Stats with R. Elements of learning.

Introduction

Bio: Terms, concepts and examples.

Stat: Reproducible computational research.

Tuesday, Aug 20: Lecture notes.

Thursday, Aug 22: R examples: .R, .Rmd, .Rnw, .csv. Hw1 out.

Reading: Organizing projects, reproducible research, Bioconductor, R intro.

Gene expression & protein abundance: continuous data

Bio: Measurement technology. Typical applications.

Stat: Signal processing. Normalization.

Tuesday, Aug 27: Lecture notes.

Thursday, Aug 29: Lecture notes. R examples: signal processing, expressionSet, data structures.

Hw1 due: solutions, .Rnw. Hw2 out: problems 1-3; problem 4: NMR expressionSet, .csv files 1, 2 and 3.

Reading: BioC case studies Ch2, 3, 5; BioC book Ch3. Spike-in evaluation 1 and 2.

Gene expression: count data

Bio: Measurement technology. Typical applications.

Stat: Signal processing. Normalization.

Tuesday, Sep 3: Lecture notes. Extra slides.

Thursday, Sep 5: Guest lecture: Nadia Atallah, PhD candidate, Botany and Plant Pathology. Slides.

Project 1 groups due. Hw2 due. Hw3 out.

Reading: Sequencing: technology 1, 2, methods 1, 2, 3, filtering. WikiBook. Community.

Transcriptome quantification: overview, methods 1, 2, protocol, biological variation.

Finding differential abundance: continuous measurements

Bio: Testing with continuous measurements.

Stat: Classical and Empirical Bayes models.

Tuesday, Sep 10: Lecture notes. Project 1 guidelines.

Thursday, Sep 12: R examples. Hw3 due. Hw4 out: problems, dataset.

Reading: Draghici Ch8-14, 21; BioC case studies Ch5, 6 and 7; BioC book Ch23. Hierarchical models.

Of interest: Limma 1, 2. SAM paper and url.

Finding differential abundance: RNA-seq

Bio: Testing with categorical measurements.

Stat: Classical and Empirical Bayes models.

Tuesday, Sep 17:

Thursday, Sep 19: Project 1 proposals due. Lecture notes. Hw4 due.

Tuesday, Sep 24: R examples. Hw5 out: problems, dataset.

Reading: Overview, datasets description and link.

Of interest: edgeR 1, 2, 3, 4, 5, DESeq 1, 2, sSeq, SAMseq.

Multiple comparisons

Bio: Multiplicity of testing and multivariate error rates.

Stat: P-value filtering. Resampling-based methods. Two-group models.

Thursday, Sep 26:

Tuesday, Oct 1:

Thursday, Oct 3: Lecture notes. R examples. Hw5 due.

Reading: Draghici Ch16, BioC case studies Ch7. Primers 1, 2.

Of interest: Non-specific filtering. FDR 1, 2, 3, 4, 5. Resampling.

Planning new experiments

Bio: More on reproducible research.

Stat: Allocation of experimental resources. Sample size calculation.

Tuesday, Oct 8: No class, no office hours. October break.

Thursday, Oct 10: Project 1 due. Lecture notes.

Tuesday, Oct 15: Hw6 out. Project 2 guidelines.

Thursday, Oct 17:

Reading: Draghici Ch15.

Of interest: Reproducibility: Bias. Reproducibility challenges: general 1, 2, 3; microarrays; proteomics.

Statistical experimental design: microarrays; RNA-seq 1, 2; proteomics. Sample size: 1, 2.

Unsupervised data exploration

Bio: Data visualization.

Stat: Principal component analysis, clustering and biclustering. Feature selection.

Tuesday, Oct 22: Lecture notes.

Thursday, Oct 24: Project 2 groups due. R examples. Hw6 due. Hw7 out.

Tuesday, Oct 29:

Reading: Draghici Ch17.9, 18. BioC book Ch12. Data visualization 1, 2. Primers 1, 2. Reviews 1, 2, 3.

Of interest: Number of clusters 1, 2. Discover associations. Select features. Account for noise.

Module maps in cancer, microarrays and imaging, mass spectrometry and imaging.

Supervised classification

Bio: Discovery of biomarkers of disease.

Stat: Supervised classification. Feature selection. Measures of performance.

Thursday, Oct 31: Lecture notes. Hw7 due.

Tuesday, Nov 5: Project 2 proposals due.

Thursday, Nov 7: Hw8 out. Case studies.

Tuesday, Nov 12: Lecture notes. R examples.

Reading: Draghici Ch29. Primer 1, reviews 1, 2, 3.

Of interest: Regularized methods 1, 2, 3, 4.

MAQC II. Methods guidelines 1, 2, 3. Example studies 1, 2, 3, 4, 5.

Biomolecular networks: exploration and inference

Bio: Pathway analysis vs network inference.

Stat: Statistical modeling.

Thursday, Nov 14:

Tuesday, Nov 19: Lecture notes. Hw8 due.

Thursday, Nov 21: Lecture notes. R examples 1 and 2, and datasets 1, 2, 3 and 4.

Hw9 out: group A, group B. Datasets 1, 2 and 3.

Tuesday, Nov 26: Guest lecture: Robert Ness, PhD candidate, Statistics.

Thursday, Nov 28: No class, no office hours. Thanksgiving break.

Of interest: Importance of biological networks 1, 2, 3, 4, 5. Practical steps of data integration: 1, 2, 3, 4.

Pathway analysis: GO, gene-set enrichment 1, 2.

Network inference: Reviews 1, 2, 3. Example studies 1, 2.

Graphical models 1, 2, 3. Regularized methods 1.

Topics in R-based statistical computing

R data structures. Data manipulation. Debugging.

Monday, Dec 2: Project 2 due.

Tuesday, Dec 3: Guest lecture: Robert Ness, PhD candidate, Statistics.

Thursday, Dec 5: Guest lecture: Jan Vitek, Professor, Computer Science. Lecture notes.

Monday, Dec 6: Project 2 evaluations due.

Tentative schedule and handouts