Title: "Sparse method for unsupervised learning in high dimensions, with applications to omics data"
Speaker: Daniela Witten, Department of Biostatistics, University of Washington, Seattle, WA
Place and time: HORT 117; Tuesday, September 27, 2011, 4:30 pm

Abstract

In recent years, very large data sets have become commonplace in a variety of fields, from finance to marketing to genomics. Such "high-dimensional" data sets are often characterized by having many more variables (e.g. SNPs or gene expression measurements) than observations (e.g. tissue samples). Given a high-dimensional data set, one may wish to perform a supervised analysis (in which we seek to predict some type of outcome, such as survival time or tumor subtype) or an unsupervised analysis (in which we want to identify signal in the data, but there is no outcome for us to predict). Unsupervised analyses are particularly challenging, and are of growing importance as the goal of many experiments moves towards hypothesis generation rather than hypothesis testing. In this talk, I will discuss some recent developments in statistical methods for unsupervised analyses of high-dimensional data sets. I will discuss methods for clustering large data sets, approaches for performing integrative analyses of data sets consisting of a single set of patient samples for which multiple omic data types have been collected, as well as techniques for estimating networks based on very large data sets. These approaches will be illustrated on DNA copy number and gene expression data sets. This is joint work with Robert Tibshirani, Jerry Friedman, Pei Wang, and others.

Associated Reading
Witten DM and R Tibshirani (2010) A framework for feature selection in clustering. Journal of the American Statistical Association 105(490): 713-726.

Friedman J, Hastie T, and R Tibshirani (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3): 432-441.

A full schedule of Bioinformatics Seminars, past and present, is available at www.stat.purdue.edu/~doerge/BIOINFORM.D/FALL11/sem.html.