Title: "Sparse Integrative Clustering of Multiple 'Omic' Data Sets"
Speaker: Sijian Wang; Department of Biostatistics, University of Wisconsin, Madison.
Place: LILLY G126; November 2, 2010, Tuesday, 4:30pm


High-resolution microarray and next-generation sequencing platforms are powerful tools for investigating genome-wide alterations in gene expression, DNA copy number, DNA methylation, and other genomic events associated with a disease. An integrated genomic profiling approach that measures these "omic" data types simultaneously in the same set of samples can reveal disease mechanisms that would otherwise not be detectable with a single data type. In this talk, I will present a joint data analysis approach for discovering disease subtypes and their associated biomarkers. Building upon the connection between principal component analysis (PCA), K-means clustering, and Gaussian mixture models, we formulated a novel integrative clustering method for "omic" data sets. Sparse solutions are derived using regularization methods that exploit genomic data structures to yield "grouping" effects for highly correlated features and to impose smoothness along chromosomal positions. I will discuss results from analyzing data from The Cancer Genome Atlas (TCGA) project.

Associated Reading:

Shen R, Olshen AB, Ladanyi M. Integrative Clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 2009; 25(22): 2906-12.
