Title: IMPROVED STATISTICAL INFERENCE FROM DNA MICROARRAY DATA USING ANALYSIS OF VARIANCE AND A BAYESIAN STATISTICAL FRAMEWORK

Abstract

Most current approaches to studying gene expression using DNA microarrays do not employ statistical tests to determine the reliability of observed changes in gene expression. We describe statistical tests, based on the t-test, that can be conveniently applied to high-density array data to test for statistically significant differences between treatments. These t-tests employ either the observed variance among replicates within treatments or a Bayesian estimate of that variance, based on a prior obtained from a local estimate of the standard deviation. The Bayesian prior allows statistical inference to be made from microarray data even when experiments are replicated only at nominal levels. We apply these new statistical tests to a data set that examined changes in gene expression in IHF+ and IHF- E. coli cells and identify a more biologically reasonable set of candidate genes than those identified using only fold-change or statistical tests that do not incorporate a Bayesian prior. We also show that statistical tests based on analysis of variance and a Bayesian prior identify genes that are up- or down-regulated following an experimental manipulation more reliably than approaches based only on fold-change. All of the described tests are implemented in a simple-to-use Web interface called Cyber-T, located at http://www.genomics.uci.edu/software.html.
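
The idea of a t-test that uses a prior-regularized variance can be illustrated with a minimal Python sketch. This is not the Cyber-T implementation: the function names, the `prior_df` weight, and the degrees-of-freedom-weighted average used to combine the observed and prior variances are assumptions made for illustration, and the exact weighting used by Cyber-T may differ. The prior variance is assumed to be supplied by the caller, for example from genes with similar expression levels.

```python
import math
from statistics import mean, variance


def regularized_variance(sample_var, n, prior_var, prior_df):
    """Blend the observed variance with a prior variance estimate.

    Assumption: a simple weighted average in which the prior counts as
    `prior_df` pseudo-observations and the data as n - 1 degrees of freedom.
    With prior_df = 0 this reduces to the ordinary sample variance.
    """
    return (prior_df * prior_var + (n - 1) * sample_var) / (prior_df + n - 1)


def regularized_t(control, treatment, prior_var_c, prior_var_t, prior_df=10):
    """Two-sample t statistic computed with regularized variances.

    `control` and `treatment` are replicate measurements for one gene
    (at least two values each); `prior_var_c` and `prior_var_t` are
    hypothetical background variance estimates for each condition.
    """
    n_c, n_t = len(control), len(treatment)
    var_c = regularized_variance(variance(control), n_c, prior_var_c, prior_df)
    var_t = regularized_variance(variance(treatment), n_t, prior_var_t, prior_df)
    standard_error = math.sqrt(var_c / n_c + var_t / n_t)
    return (mean(treatment) - mean(control)) / standard_error
```

With few replicates the observed variance is noisy, so a larger `prior_df` pulls the estimate toward the background value and stabilizes the statistic; as the number of replicates grows, the observed variance dominates.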