Scalable Bayesian Methods for Large and Complex Data - Department of Statistics - Purdue University

Scalable Bayesian Methods for Large and Complex Data

Organizer and Chair: Anindya Bhadra, Assistant Professor of Statistics, Department of Statistics, Purdue University

Speakers

  • Jeffrey Morris, Professor, Department of Biostatistics, The University of Texas MD Anderson Cancer Center
  • Naveen N. Narisetty, Assistant Professor of Statistics, Department of Statistics, University of Illinois at Urbana-Champaign
  • Veronika Rockova, Assistant Professor in Econometrics and Statistics, James S. Kemper Foundation Faculty Scholar, University of Chicago Booth School of Business

Schedule

Thursday, June 7, 1:30-3:30 p.m. in STEW 214 CD

Time Speaker Title
1:30-2:00 p.m. Jeffrey Morris Bayesian Semiparametric Functional Mixed Models for Serially Correlated Functional Data, with Application to Glaucoma Data

Abstract: Glaucoma, a leading cause of blindness, is characterized by optic nerve damage related to intraocular pressure (IOP), but its full etiology is unknown. Researchers at UAB have devised a custom device to measure scleral strain continuously around the eye under fixed levels of IOP, which here is used to assess how strain varies around the posterior pole, with IOP, and across glaucoma risk factors such as age. The hypothesis is that scleral strain decreases with age, which could alter the biomechanics of the optic nerve head and cause damage that could eventually lead to glaucoma. To evaluate this hypothesis, we adapted Bayesian Functional Mixed Models to these complex data, which consist of correlated functions on the spherical scleral surface, with nonparametric age effects allowed to vary in magnitude and smoothness across the scleral surface, multi-level random effect functions to capture within-subject correlation, and functional growth curve terms to capture serial correlation across IOPs that can vary around the scleral surface. Our method yields fully Bayesian inference on the scleral surface, or on any aggregation or transformation thereof, and reveals interesting insights into the biomechanical etiology of glaucoma. The general modeling framework is very flexible and applicable to many types of complex, high-dimensional functional data, and a general Matlab/R package is currently being assembled to fit a broad array of Bayesian Functional Mixed Models that have appeared in the literature. This work is to appear in JASA-ACS.
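As generic background for the modeling strategy above (this is not the speakers' implementation, which is far more general), functional mixed models are typically fit after projecting each observed curve onto a basis, with the mixed-model machinery then applied to the basis coefficients. A minimal sketch of that projection step, with all simulated quantities and the Fourier basis chosen purely for illustration:

```python
import numpy as np

# Simulate 30 curves observed on a common grid: y_i(t) = sin(2*pi*t) + noise
t = np.linspace(0, 1, 101)
rng = np.random.default_rng(2)
curves = np.sin(2 * np.pi * t) + rng.normal(0, 0.2, (30, t.size))

# Project each curve onto a small Fourier basis (a hypothetical choice;
# wavelet or spline bases are equally common in practice)
K = 5
basis = np.stack([np.ones_like(t)] +
                 [np.sin(2 * np.pi * k * t) for k in range(1, K)] +
                 [np.cos(2 * np.pi * k * t) for k in range(1, K)])

# Least-squares basis coefficients, one column per curve
coef, *_ = np.linalg.lstsq(basis.T, curves.T, rcond=None)

# Mixed-model terms would now be placed on `coef`; back-project to check fit
recon = coef.T @ basis
```

The key design point is that fixed effects, random effect functions, and serial-correlation terms all become ordinary mixed-model components on the coefficient scale, which is what makes posterior computation tractable for high-dimensional curves.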

2:00-2:30 p.m. Naveen Narisetty Scalable Bayesian approaches for quantile regression under censoring

Abstract: Quantile regression provides a more comprehensive description of the relationship between a response and covariates of interest than mean regression, and it is especially advantageous for censored data because it substantially generalizes classical survival models such as the AFT model. In this talk, I will first discuss a new Bayesian approach for censored quantile regression that can handle high-dimensional covariates. Our approach uses continuous spike and slab priors with sample-size-dependent parameters to induce adaptive shrinkage and sparsity. A scalable Gibbs sampling algorithm for posterior computation, which has desirable theoretical properties, will be presented. I will also briefly describe a new data augmentation method for estimation in censored quantile regression that can handle arbitrary types of censoring.
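As background for the talk's setup, quantile regression at level tau minimizes the check (pinball) loss rather than squared error. A minimal sketch of uncensored median regression by subgradient descent on that loss (a generic illustration only; the talk's methods are Bayesian and handle censoring):

```python
import numpy as np

def check_loss(u, tau):
    """Check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def fit_quantile(x, y, tau, lr=0.02, iters=20000):
    """Fit y ~ a + b*x at quantile tau by subgradient descent on the check loss."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        u = y - (a + b * x)
        # Subgradient of rho_tau with respect to the fitted value
        g = -(tau - (u < 0))
        a -= lr * g.mean()
        b -= lr * (g * x).mean()
    return a, b

# Hypothetical simulated data: true line 1 + 2x with small noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 500)
a, b = fit_quantile(x, y, tau=0.5)
```

Varying `tau` traces out the whole conditional distribution, which is the sense in which quantile regression is "more comprehensive" than a single mean fit.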

2:30-3:00 p.m. Veronika Rockova Posterior concentration for Bayesian regression trees and their ensembles

Abstract: Since their inception in the 1980s, regression trees have been one of the more widely used nonparametric prediction methods. Tree-structured methods yield a histogram reconstruction of the regression surface, where the bins correspond to terminal nodes of recursive partitioning. Trees are powerful, yet susceptible to overfitting. Strategies against overfitting have traditionally relied on pruning greedily grown trees. The Bayesian framework offers an alternative remedy through priors: roughly speaking, a good prior charges smaller trees, where overfitting does not occur. While the consistency of random histograms, trees, and their ensembles has been studied quite extensively, a theoretical understanding of their Bayesian counterparts has been missing. In this paper, we take a step towards understanding why and when Bayesian trees and their ensembles do not overfit. To address this question, we study the speed at which the posterior concentrates around the true smooth regression function. We propose a spike-and-tree variant of the popular Bayesian CART prior and establish new theoretical results showing that regression trees (and their ensembles) (a) are capable of recovering smooth regression surfaces, achieving optimal rates up to a log factor, (b) can adapt to the unknown level of smoothness, and (c) can perform effective dimension reduction when p > n. These results provide a piece of the missing theoretical evidence explaining why Bayesian trees (and additive variants thereof) have worked so well in practice. (Joint work with Stephanie van der Pas.)
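The "histogram reconstruction" view in the abstract can be made concrete with a minimal greedy recursive-partitioning fit: each terminal node predicts the mean response of its bin, so the fitted surface is piecewise constant over the partition. This is an illustrative sketch only, unrelated to the spike-and-tree prior itself:

```python
import numpy as np

def fit_tree(x, y, depth, min_leaf=10):
    """Greedy recursive partitioning in one dimension.

    Returns a piecewise-constant predictor whose bins are the terminal
    nodes; splits minimize the within-node sum of squared errors.
    """
    if depth == 0 or len(y) < 2 * min_leaf:
        mu = y.mean()
        return lambda t: np.full_like(t, mu, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best, split = None, None
    for i in range(min_leaf, len(ys) - min_leaf):
        sse = ys[:i].var() * i + ys[i:].var() * (len(ys) - i)
        if best is None or sse < best:
            best, split = sse, xs[i]
    left = fit_tree(xs[xs < split], ys[xs < split], depth - 1, min_leaf)
    right = fit_tree(xs[xs >= split], ys[xs >= split], depth - 1, min_leaf)
    return lambda t: np.where(t < split, left(t), right(t))

# Hypothetical smooth target: the tree approximates it by a histogram
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 400)
y = np.sin(3 * x) + rng.normal(0, 0.1, 400)
tree = fit_tree(x, y, depth=4)
pred = tree(x)
```

Growing `depth` without control is exactly the overfitting risk the abstract describes; the Bayesian remedy replaces greedy growth and pruning with a prior that favors smaller trees.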

3:00-3:30 p.m. Questions and Discussion

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558
