Title: Null Model Methods for Cluster Analysis of Gene Expression Data
Speaker: Dr. Brian Munneke
Place: LAEB 2280; October 30, 2001, Tuesday, 4:30pm


Recent microarray innovations in molecular biology allow biologists to quantify gene activity resulting from various experimental conditions. Microarray experiments can monitor thousands of genes simultaneously across several treatment conditions. The sheer volume of data, coupled with the inherent variation of microarray technology, gives statisticians many opportunities to contribute to both the analysis and the interpretation of these data. A common approach to exploratory analysis of these data is cluster analysis, through which biologists hope to uncover regulatory relationships between genes. However, the experimental variation in microarray measurements concerns researchers and calls into question both the stability of clusters derived from gene expression profiles and the co-regulatory relationships inferred from them. Motivated by the need to investigate the statistical implications of this variation, this talk presents a methodology for statistically validating cluster structure. The methodology includes a new dissimilarity measure for use in clustering algorithms, based on a penalized cosine of the angle between expression profiles, and two approaches that independently produce null space representations of the gene expression profiles. Incorporating these null model methods into cluster analysis of gene expression data assesses how close any proposed cluster structure is to what would be expected from random association.
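The general idea of comparing cluster structure against a null model can be illustrated with a small sketch. The abstract does not specify the penalty term in the new dissimilarity measure or how the two null-space approaches are constructed, so this is not the speaker's method. It assumes the plain, unpenalized dissimilarity 1 - cos(angle) and a single, commonly used permutation null (each gene's values are shuffled independently across conditions, which destroys co-expression between genes). All function names here (`cosine_dissimilarity`, `mean_within_dissimilarity`, `permutation_p_value`) are hypothetical.

```python
import math
import random


def cosine_dissimilarity(x, y):
    """Unpenalized 1 - cos(angle) between two expression profiles.

    The talk's measure adds a penalty that the abstract does not specify;
    it is deliberately omitted here.
    """
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        raise ValueError("cosine is undefined for an all-zero profile")
    return 1.0 - dot / (nx * ny)


def mean_within_dissimilarity(profiles, labels):
    """Average dissimilarity over all gene pairs assigned to the same cluster."""
    total, count = 0.0, 0
    n = len(profiles)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                total += cosine_dissimilarity(profiles[i], profiles[j])
                count += 1
    return total / count if count else 0.0


def permutation_p_value(profiles, labels, n_perm=999, seed=0):
    """Compare observed within-cluster dissimilarity with a permutation null.

    Assumed null model: shuffle each gene's values across conditions
    independently, so any clustering of the shuffled data reflects only
    random association. Returns (observed statistic, one-sided p-value).
    """
    rng = random.Random(seed)
    observed = mean_within_dissimilarity(profiles, labels)
    as_tight = 0  # null datasets whose clusters are at least as tight
    for _ in range(n_perm):
        shuffled = [rng.sample(p, len(p)) for p in profiles]
        if mean_within_dissimilarity(shuffled, labels) <= observed:
            as_tight += 1
    return observed, (as_tight + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: two groups of genes with opposite trends over four conditions.
    profiles = [
        [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.2, 3.9],
        [4.0, 3.0, 2.0, 1.0], [4.1, 2.9, 2.1, 0.8], [3.8, 3.1, 1.9, 1.1],
    ]
    labels = [0, 0, 0, 1, 1, 1]
    obs, p = permutation_p_value(profiles, labels)
    print(f"within-cluster dissimilarity = {obs:.4f}, permutation p = {p:.3f}")
```

A small p-value indicates that the proposed clusters are tighter than the assumed null model would typically produce; a large one suggests the grouping is consistent with random association. The choice of null model matters a great deal, which is presumably why the talk considers two independent constructions.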