Wednesday, November 18, 2009
03:30 PM in REC 315
Professor Lingsong Zhang
Department of Statistics, Purdue University

Sparse Linear Discriminant Analysis Method for Genetic Pathways

Abstract

A growing challenge in the analysis of genomic data is how to interpret profiles of thousands of genes and gain biological insight from them. There is increasing interest in analyzing genomic data by incorporating prior biological knowledge through gene sets and genomic pathways, which consist of groups of biologically related genes. Such approaches allow one to study the joint effects of a group of genes. Existing methods include over-representation analysis, gene set enrichment analysis, principal component analysis, the global test, and kernel machine methods. However, these pathway analysis methods do not select the important genes within a pathway, and the analysis can be dominated by noise from noninformative genes. We propose sparse linear discriminant analysis (SLDA) for genetic pathway data, which allows us to study the joint effects of the genes within a pathway while selecting the important genes that drive the differences. We provide an efficient path algorithm to obtain the solution. We illustrate the method by applying it to a type II diabetes data set and a metal fume exposure data set.

Related Paper:

Wu, M. C., Zhang, L., Wang, Z., Christiani, D. C. and Lin, X. (2009), "Sparse linear discriminant analysis for simultaneous testing for the significance of a gene set/pathway and gene selection", Bioinformatics, 25(9), pp. 1145-1151.