Title: "Biological network and disease risk factor discovery by regularized analysis of genome-wide data"
Speaker: Jason Mezey, Department of Biological Statistics and Computational Biology, Cornell University and Department of Genetic Medicine, Weill Cornell Medical College
Place: HORT 117; October 18, 2011, Tuesday, 4:30pm


Genome-wide genotype data and cellular gene expression measurements from population samples contain information that can be mined to identify genetic risk factors for complex diseases and leveraged to discover novel regulatory network relationships. The challenge in analyzing genome-wide data is identifying the relevant factors when the sample size is small compared to the number of measured genomic features, which with next-generation sequencing technology can be in the tens of thousands for gene expression traits and in the millions for genetic markers. To meet this challenge, we have developed algorithms that combine two components: probabilistic graphical models, which assess relationships among genotype, gene expression, and disease phenotypes, and a variety of regularization approaches, spanning both commonly used forms such as the lasso and under-used forms such as MCP and mixture types. Together these are used to optimally identify the small number of relevant genotype-disease associations and putative regulatory relationships among genes when analyzing large genomic data sets. As one example, we have implemented a highly scalable algorithm that can simultaneously analyze tens of thousands of genetic markers in genome-wide association studies (GWAS) in a few hours on a standard desktop, and we have used this method to identify novel associations for Crohn’s disease, type 2 diabetes, and bipolar disorder by re-analyzing existing GWAS data. As another example, we have implemented a sub-local constraint algorithm for directed network analysis and a variational Bayes algorithm for undirected network analysis, and we have used these methods to identify putative regulatory relationships important for gene expression levels in the human lung and, in a separate study, relationships with effects on obesity and weight-related phenotypes.
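
For readers unfamiliar with the penalties named in the abstract, the following is a minimal illustrative sketch, not the speaker's algorithm. It contrasts the lasso with MCP (the minimax concave penalty) in the simplest setting: a single coefficient with a standardized predictor, where `z` is the unpenalized estimate. The function names and the parameters `lam` (penalty strength) and `gamma` (MCP concavity, assumed greater than 1) are chosen here for illustration only.

```python
def soft_threshold(z, lam):
    """Lasso solution for one coefficient: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_penalty(t, lam, gamma):
    """MCP value: starts like the lasso penalty, then flattens at gamma*lam."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def mcp_threshold(z, lam, gamma):
    """MCP solution for one coefficient (requires gamma > 1)."""
    if gamma <= 1:
        raise ValueError("gamma must be greater than 1")
    if abs(z) <= gamma * lam:
        # Small signals: soft-threshold, then rescale to reduce shrinkage.
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    # Large signals are left unshrunk, unlike under the lasso.
    return z


if __name__ == "__main__":
    for z in (0.5, 2.0, 5.0):
        print(z, soft_threshold(z, 1.0), mcp_threshold(z, 1.0, 3.0))
```

Both rules set small estimates exactly to zero, which is what makes them useful when features far outnumber samples. The lasso shrinks every surviving estimate by `lam`, whereas MCP relaxes the shrinkage as the estimate grows and leaves sufficiently large estimates unchanged.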

Associated Reading:
Benjamin A. Logsdon and Jason Mezey. 2010. Gene Expression Network Reconstruction by Convex Feature Selection when Incorporating Genetic Perturbations. PLoS Computational Biology.

A full schedule of Bioinformatics Seminars, past and present, is available at www.stat.purdue.edu/~doerge/BIOINFORM.D/FALL11/sem.html.