Session 15 (Part 1) - Department of Statistics - Purdue University

Interactions Between Omics and Statistics: Analyzing High Dimensional Data

Speaker(s)

  • Dennis Cook (University of Minnesota)
  • Michael Kosorok (University of North Carolina)
  • Marten Wegkamp (Cornell University)
  • Ping Ma (University of Illinois, Urbana-Champaign)

Description

With recent developments in high-throughput technologies, more and more high-dimensional data are being generated in animal, plant, and human studies. These data present new challenges to statisticians and call for novel, computationally efficient approaches. The session is organized as an interactive forum for leading researchers in statistics and other disciplines, with the aim of gaining insight into the issues associated with high-dimensional data analysis in a variety of real-world applications.

Schedule

Sat, June 23 - Location: STEW 310

Time | Speaker | Title
1:30 - 2:15PM Dennis Cook Dimension Reduction in Abundant High-dimensional Regressions
Abstract: We will discuss the asymptotic behavior of a new class of methods for dimension reduction in high-dimensional regressions, as the sample size and number of predictors grow in various alignments. These methods, which are based on inverse regressions of the predictors on vector-valued functions of the response, give consistent predictor reductions in a variety of settings, particularly in abundant regressions where most predictors contribute some information on the response. Oracle rates are possible. A potential application in a systems biology approach to optimizing tissue growth in vitro will be mentioned, and an example will be given to illustrate the theoretical conclusions.
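For readers who want a concrete picture of what an inverse-regression reduction looks like, here is a minimal sketch in Python with NumPy. It assumes the simplest isotropic principal fitted components (PFC) model, X = mu + Gamma * beta * f(y) + sigma * eps: each predictor is regressed on a user-chosen basis f(y) of the response, and the reduction is spanned by the leading eigenvectors of the covariance of the fitted values. The function name `pfc_isotropic`, the quadratic basis, and the simulated data are illustrative assumptions; this is not the speaker's estimator and does not reproduce the asymptotic results of the talk.

```python
import numpy as np


def pfc_isotropic(X, y, basis, d):
    """Estimate a d-dimensional reduction under an isotropic PFC model (sketch).

    X: (n, p) predictors; y: (n,) response;
    basis: function mapping y to an (n, r) matrix f(y);
    d: number of reduced predictors to keep.
    Returns a (p, d) matrix Gamma with orthonormal columns; the reduced
    predictors are (X - X.mean(axis=0)) @ Gamma.
    """
    Xc = X - X.mean(axis=0)
    F = basis(y)
    Fc = F - F.mean(axis=0)
    # Inverse regression: least-squares fit of every predictor on f(y).
    coef, *_ = np.linalg.lstsq(Fc, Xc, rcond=None)
    fitted = Fc @ coef
    sigma_fit = fitted.T @ fitted / X.shape[0]
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(sigma_fit)
    return eigvecs[:, ::-1][:, :d]


def quadratic_basis(t):
    # Hypothetical choice of f(y): linear and quadratic terms in y.
    return np.column_stack([t, t**2])


# Simulated data in which X depends on y only through one direction gamma0.
rng = np.random.default_rng(0)
n, p = 500, 20
y = rng.standard_normal(n)
gamma0 = rng.standard_normal(p)
gamma0 /= np.linalg.norm(gamma0)
signal = y + 0.5 * (y**2 - 1)
X = np.outer(signal, gamma0) + 0.5 * rng.standard_normal((n, p))

Gamma = pfc_isotropic(X, y, quadratic_basis, d=1)
print("alignment with true direction:", abs(Gamma[:, 0] @ gamma0))
```

The absolute inner product is used because an eigenvector is only defined up to sign; a value near 1 means the estimated reduction recovers the direction that carries the information about y.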
2:15 - 3:00PM Michael Kosorok Personalized Medicine and Statistical Learning
Abstract: Personalized medicine is an important and active area of clinical research involving high-dimensional data. In this talk, we describe some recent design and methodological developments in clinical trials for the discovery and evaluation of personalized medicine. Statistical learning tools from artificial intelligence, including machine learning, reinforcement learning and several newer learning methods, are beginning to play increasingly important roles in these areas. We present illustrative examples in the treatment of depression and cancer. The new approaches have significant potential to improve health and well-being.
3:00 - 3:30PM Break
3:30 - 4:15PM Marten Wegkamp Joint variable and rank selection for parsimonious estimation of high dimensional matrices
Abstract: This talk is devoted to optimal dimension reduction methods for sparse, high-dimensional multivariate response regression models. Both the number of responses and the number of predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower-dimensional approximations of the parameter matrix in such models. We show that important gains in prediction accuracy can be obtained by considering them jointly. For this, we first motivate a new class of sparse multivariate regression models, in which the coefficient matrix has both low rank and zero rows or can be well approximated by such a matrix. Then, we introduce estimators that are based on penalized least squares, with novel penalties that impose simultaneous row and rank restrictions on the coefficient matrix. We prove that these estimators indeed adapt to the unknown matrix sparsity and have fast rates of convergence. Our theoretical results are supported by a simulation study. This is joint work with Florentina Bunea and Yiyuan She.
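The rank-reduction half of this idea can be illustrated in a few lines. The sketch below, in Python with NumPy, minimizes ||Y - XB||_F^2 + mu * rank(B). For a fixed rank k, the smallest attainable residual is the least-squares residual plus the sum of the discarded squared singular values d_j^2 of the least-squares fitted matrix, so component j is worth keeping exactly when d_j^2 > mu. The sketch does not perform the row (predictor) selection that the talk combines with rank reduction; the function name `rank_penalized_ls`, the penalty level `mu`, and the simulated data are illustrative assumptions, not the authors' estimator or tuning.

```python
import numpy as np


def rank_penalized_ls(X, Y, mu):
    """Minimize ||Y - X B||_F^2 + mu * rank(B) (sketch; no row selection).

    Returns the estimate B_hat of shape (p, q) and the selected rank k.
    """
    B_ols = np.linalg.pinv(X) @ Y          # least-squares coefficients
    fitted = X @ B_ols                     # projection of Y onto col(X)
    _, d, Vt = np.linalg.svd(fitted, full_matrices=False)
    # Keeping component j lowers the residual by d_j**2 and costs mu.
    k = int(np.sum(d**2 > mu))
    V_k = Vt[:k, :].T                      # leading right singular vectors
    B_hat = B_ols @ V_k @ V_k.T            # X @ B_hat is the rank-k truncation
    return B_hat, k


# Simulated data with a rank-2 coefficient matrix (illustrative only).
rng = np.random.default_rng(1)
n, p, q, r = 100, 10, 8, 2
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

B_hat, k = rank_penalized_ls(X, Y, mu=4.0)
print("selected rank:", k)
```

Projecting the least-squares coefficients onto the leading right singular vectors makes X @ B_hat equal to the best rank-k approximation of the fitted values, which is why no iterative optimization is needed for the rank penalty alone.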
4:15 - 5:00PM Ping Ma Nonparametric modeling of regulatory network
Abstract: Discovering which regulatory proteins, especially transcription factors (TFs), are active under certain experimental conditions, and identifying the corresponding binding motifs, is essential for understanding the regulatory circuits that control cellular programs. The experimental methods used for this purpose are laborious. Computational methods have proven extremely effective in identifying TF-binding motifs and constructing regulatory networks.

In this talk, I will present a nonparametric method for elucidating the regulatory network. The problem is formulated as a special case of penalized likelihood conditional density estimation on a generic domain. The modeling tools include Bayesian confidence intervals for odds ratios among margins of tables and mixed-effect models for correlated gene expression. I will also discuss cross-validation for smoothing parameter selection and "hypothesis testing" via Kullback-Leibler projection. A real-data example will be presented to demonstrate the performance of the proposed method.
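The idea of "hypothesis testing" via Kullback-Leibler projection can be illustrated on the simplest possible case, a two-way contingency table. In the sketch below (plain Python; the helper names `kl` and `kl_projection_ratio` are hypothetical), the "full" fit is just the table of observed proportions, the reduced model is independence of the two margins, and the KL projection of the full fit onto that model is the product of its margins. The ratio KL(full || projection) / KL(full || uniform) measures how much of the departure from a constant fit is lost by dropping the interaction; a small ratio suggests the interaction is negligible. This is a toy under stated assumptions: the talk works with penalized likelihood conditional density estimates rather than raw proportions, and no cutoff for "small" is implied here.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as flat lists (0 log 0 = 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_projection_ratio(table):
    """Share of KL(full || uniform) lost by projecting onto independence."""
    total = sum(sum(row) for row in table)
    p = [[v / total for v in row] for row in table]
    row_marg = [sum(row) for row in p]
    col_marg = [sum(col) for col in zip(*p)]
    # The KL projection of p onto the independence model is the product
    # of its margins; both lists are flattened row by row.
    full = [v for row in p for v in row]
    indep = [r * c for r in row_marg for c in col_marg]
    uniform = [1.0 / len(full)] * len(full)
    denom = kl(full, uniform)
    return 0.0 if denom == 0 else kl(full, indep) / denom


for counts in ([[30, 20], [30, 20]], [[50, 10], [20, 20]]):
    print(counts, round(kl_projection_ratio(counts), 3))
```

Because the uniform distribution itself belongs to the independence model, KL(full || uniform) splits into KL(full || projection) + KL(projection || uniform), so the ratio always lies between 0 (the interaction carries nothing) and 1 (the margins carry nothing).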

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558

© 2023 Purdue University | An equal access/equal opportunity university
