Wednesday, October 21, 2009
03:30 PM in REC 315
Professor Jun Xie
Department of Statistics, Purdue University

Statistical Challenges in Analysis of Large Scale SNP Data and Gene Expression Data

Abstract

Despite progress in the statistical analysis of genomic data, specifically SNP and gene expression data, many statistical challenges in these data sets remain unsolved. SNPs (single nucleotide polymorphisms) are single-base differences in DNA sequence among individuals. The data type is categorical, with three possible genotypes at a single SNP. But when we consider a block of 10 SNPs, there are 3^10 = 59,049 possible genotype combinations, or about 60,000 categories, far more than a typical sample size. Besides this difficulty, multiple testing is always an issue. I will present some preliminary analyses of large-scale SNP data and introduce a new concept of hypothesis testing motivated by Dempster-Shafer theory for inference. I will also discuss a second data set, of gene expression, where the challenging goal is to classify patients' responses to a drug.

This is joint work with Professor Chuanhai Liu in the Department of Statistics.