Title: "A Weighted FDR Procedure for Discrete Data"
Speaker: Xiongzhi (Chee) Chen, Princeton University, Lewis-Sigler Institute

Place: MSEE B012 (new location)
Date: Feb 3, 2015; Tuesday
Time: 4:30pm


With next-generation sequencing (NGS) technologies, measurements on the biological entities under study, such as gene expression levels, are recorded as counts on a discrete scale rather than as relative light intensities on a continuous scale. The key feature of multiple testing (MT) based on discrete data is that the p-value distributions under the null hypothesis are discrete and differ from one another. This is in sharp contrast to MT based on continuous data, where the null p-value distributions are identical and uniform. It is widely acknowledged that popular multiple testing procedures lose power when applied to discrete data, since these procedures were originally designed for MT based on continuous measurements. To resolve this issue, we propose a weighted false discovery rate (FDR) procedure that directly adjusts for the discreteness of the p-value distributions, groups statistical evidence of similar strength, weights the associated p-values accordingly, and then conducts multiple testing. Through simulation studies, we show that the new procedure is much more powerful than the procedures of Benjamini and Hochberg and of John D. Storey. The new procedure is applied to two RNA-Seq count data sets for Arabidopsis thaliana, to assess differential expression of genes and differential methylation of cytosines, respectively.
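
As an illustration of the discreteness issue described in the abstract (not the speaker's procedure), the following minimal Python sketch computes the null distribution of a one-sided exact binomial p-value. The function names, the choice of a binomial test, and the sample sizes 5 and 20 are assumptions made only for this example. It shows that the null probability of p <= t depends on the sample size and can fall well below t, unlike the uniform case.

```python
from math import comb

def null_pvalue_distribution(n, theta=0.5):
    """Support points and probabilities of the one-sided exact binomial
    p-value P(X >= x) when X ~ Binomial(n, theta) under the null."""
    pmf = [comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1)]
    # p-value attached to each possible observed count x
    pvals = [sum(pmf[x:]) for x in range(n + 1)]
    return pvals, pmf

def null_cdf(n, t, theta=0.5):
    """P(p-value <= t) under the null; equals t for a uniform p-value."""
    pvals, pmf = null_pvalue_distribution(n, theta)
    return sum(q for p, q in zip(pvals, pmf) if p <= t)

# Two hypothetical tests with different sample sizes: their null p-value
# distributions have different supports, and neither is uniform.
for n in (5, 20):
    print(n, null_cdf(n, 0.05))
```

For n = 5 the smallest attainable p-value is 1/32, so the null probability of p <= 0.05 is about 0.031 rather than 0.05; this conservativeness is the usual explanation for the power loss mentioned above.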

Associated reading:
[1] Xiongzhi Chen and R.W. Doerge (2014): Generalized estimators for multiple testing: proportion of true nulls and false discovery rate. Under review at the Journal of the Royal Statistical Society, Series B; available at http://arxiv.org/abs/1410.4274

[2] Xiongzhi Chen and R.W. Doerge (2014): A weighted FDR procedure under discrete and heterogeneous null distributions. Manuscript.

A full schedule of Bioinformatics Seminars, past and present, is available at www.stat.purdue.edu/~doerge/BIOINFORM.D/SPRING15/sem.html