Title: Statistical Analysis and Experimental Design for Gene Expression Microarrays

Abstract

Spotted cDNA microarrays are emerging as a powerful and cost-effective technology for large-scale analysis of gene expression. Using this technology, it is possible to measure the relative expression levels of thousands of genes in two or more tissue samples. When analyzing the results of microarray experiments, a number of sources of variation must be accounted for. These include slide-to-slide variation, differences in sample preparation, and, at the lowest level, measurement error. In addition, there are constraints on the design of array experiments. Current technology relies on a two-dye signaling system, which limits each array to two samples. Because substantial variation can occur between arrays, arrays must be treated as a blocking factor, and the resulting designs have an incomplete block structure. We have been investigating the properties of experimental designs that allow scientists to carry out classical analysis of variance on microarray data. These designed experiments offer quality control, normalization, and a method of analysis that takes multiple sources of variation into account. The end results are estimates of changes in gene expression levels with error bars. In addition, we are studying how these estimates and their confidence intervals can be used with a clustering algorithm to incorporate rigorous statistical inference into such higher-order analyses.