Session 019 - Department of Statistics - Purdue University

Modern Developments in Bayesian Nonparametrics

Organizer: Vinayak Rao, Associate Professor of Statistics, Associate Professor of Computer Science (Courtesy), Purdue University

Speakers

  • Surya Tokdar, Professor of Statistical Science, Duke University
  • Sanvesh Srivastava, Associate Professor, Director of Undergraduate Studies, Data Science and Statistics, University of Iowa
  • Inyoung Kim, Associate Professor, Department of Statistics, Virginia Tech
  • Francesco Gaffi, PhD Candidate, Department of Decision Sciences, Bocconi University

Surya Tokdar: Bayes in the Extremes

Abstract: Statistical analyses of heavy-tailed data raise a unique set of questions. Often the scientific focus shifts to the tails of the distribution, e.g., to forecasting the 100-year daily precipitation or to identifying predictors that influence extreme low birthweight. Parametric models, whose fit is largely dictated by the central bulk of the data, may fail to capture tail structures. At the same time, purely nonparametric approaches may prove futile in effectively smoothing information from sparse observations across an elongated tail. Toward more effective statistical analyses of heavy-tailed data, I will introduce a class of semiparametric Bayesian methods for density estimation and quantile regression. With a carefully chosen nonparametric prior distribution, the density estimation method will be shown to simultaneously guarantee accurate estimation of the density function and its tail index, both at near-optimal minimax rate. The related quantile regression methodology will be shown to offer a powerful yet interpretable generalization of standard linear regression to what one might call a Quantile Linear Model (QLM). The QLM, complete with an identification of residual noise, gives a model-based idealization of quantile regression that retains the ability to quantify differential predictor influence on the tails while simultaneously adjusting for noise correlation. I will discuss how the QLM leads to a comprehensive inferential framework with the added qualities of model fit assessment and model selection.
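Tail-index estimation is central to the density-estimation guarantee above. As a toy illustration only (this is the classical Hill estimator, not the talk's semiparametric Bayesian method; all names and parameter choices below are illustrative), the tail index of a Pareto sample can be recovered from its largest order statistics:

```python
import numpy as np

def hill_tail_index(x, k):
    """Hill estimator of the tail index from the k largest observations."""
    x = np.sort(np.asarray(x, dtype=float))
    top = x[-k:]                 # k largest order statistics
    threshold = x[-k - 1]        # (k+1)-th largest, used as the threshold
    gamma_hat = np.mean(np.log(top / threshold))  # estimates 1 / alpha
    return 1.0 / gamma_hat

rng = np.random.default_rng(0)
alpha_true = 2.0
# Classical Pareto(alpha=2, x_m=1) sample: numpy's pareto() is Lomax, so add 1.
sample = rng.pareto(alpha_true, size=20000) + 1.0
alpha_hat = hill_tail_index(sample, k=500)
print(round(alpha_hat, 2))
```

The choice of k trades bias against variance; for an exact Pareto sample the estimator is unbiased and its spread shrinks like alpha / sqrt(k).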

Sanvesh Srivastava: Distributed Bayesian Varying Coefficient Modeling Using a Gaussian Process Prior

Abstract: We use the divide-and-conquer technique to address inefficient inference in varying coefficient models (VCMs) based on Gaussian process (GP) priors. Our proposal has three steps. The first step creates many data subsets with much smaller sample sizes by sampling without replacement from the full data. The second step formulates the VCM as a linear mixed-effects model and develops a data augmentation (DA)-type algorithm for obtaining MCMC draws of the parameters and predictions on all the subsets in parallel. The DA-type algorithm appropriately modifies the likelihood so that every subset posterior distribution accurately approximates the corresponding true posterior distribution. The third step develops a combination algorithm for aggregating MCMC-based estimates of the subset posterior distributions into a single posterior distribution, called the Aggregated Monte Carlo (AMC) posterior. The AMC posterior has minimax-optimal posterior convergence rates in estimating the varying coefficients and the mean regression function. Joint work with Rajarshi Guhaniyogi, Cheng Li, and Terrance Savitsky.
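The three-step recipe can be sketched on a toy conjugate model where every posterior is available in closed form. This is only an illustration of the divide-and-conquer logic, not the authors' GP-based VCM or their MCMC aggregation: the likelihood modification here raises each subset's contribution to the K-th power so that each subset posterior mimics the full-data posterior, and step three is reduced to a plain average.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: scalar normal mean model with conjugate prior N(0, tau^2).
mu_true, sigma, tau = 1.5, 2.0, 10.0
n, K = 10000, 10                       # full sample size, number of subsets
y = rng.normal(mu_true, sigma, size=n)

def posterior_mean(data, power=1.0):
    """Posterior mean of mu when each likelihood term is raised to `power`
    (stochastic approximation: a subset pretends to be the full data)."""
    prec = power * len(data) / sigma**2 + 1.0 / tau**2
    return (power * data.sum() / sigma**2) / prec

# Step 1: partition the shuffled data into K equal-size subsets.
subsets = np.array_split(rng.permutation(y), K)
# Step 2: subset posteriors with the likelihood raised to the K-th power.
subset_means = [posterior_mean(s, power=K) for s in subsets]
# Step 3: aggregate (a simple average here; the AMC posterior combines
# full MCMC draws rather than point summaries).
amc_mean = float(np.mean(subset_means))
full_mean = posterior_mean(y)
print(round(amc_mean, 3), round(full_mean, 3))
```

With equal subset sizes the averaged subset posterior means coincide exactly with the full-data posterior mean in this linear-Gaussian toy, which is why the power-K likelihood correction matters: without it each subset posterior would be far too diffuse.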

Inyoung Kim: Nonparametric Bayesian Function Clustering Using a Weighted Dirichlet Process Mixture

Abstract: To study the differences between two types of triple-negative breast cancer cell lines at the molecular level, we performed function cluster analyses and vibrational peak point selection on massive nonlinear curves of signal versus wavenumber. In this talk, we propose a nonparametric Bayesian function clustering and peak point selection method via weighted Dirichlet process mixture (WDPM) modeling that automatically clusters and provides accurate estimates, together with a conditional Laplace prior, which is a conjugate variable selection prior. The proposed method, named WDPM-VS for short, greatly outperforms its comparison methods in root mean squared error. Based on this proposed method, we identified essential wavenumbers that can explain the racial disparities.
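A bare-bones sketch of the Dirichlet process mixture machinery underlying such clustering: a collapsed Gibbs sampler for scalar observations with known variance, rather than the weighted DPM over spectral curves used in the talk. All settings (alpha, sigma, tau) are illustrative.

```python
import math
import numpy as np

rng = np.random.default_rng(2)

sigma, tau, alpha = 0.5, 3.0, 1.0   # noise sd, prior sd on means, DP concentration

def norm_logpdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def gibbs_dpm(y, n_iter=50):
    """Collapsed Gibbs sampling for a DP mixture of N(mu_k, sigma^2),
    with conjugate prior mu_k ~ N(0, tau^2)."""
    z = np.zeros(len(y), dtype=int)          # start with one cluster
    for _ in range(n_iter):
        for i in range(len(y)):
            z[i] = -1                        # remove point i from its cluster
            labels = [k for k in set(z) if k >= 0]
            logp = []
            for k in labels:                 # existing-cluster predictive
                members = y[z == k]
                prec = len(members) / sigma**2 + 1 / tau**2
                post_mean = (members.sum() / sigma**2) / prec
                logp.append(math.log(len(members)) +
                            norm_logpdf(y[i], post_mean, sigma**2 + 1 / prec))
            # new-cluster option (prior predictive)
            logp.append(math.log(alpha) +
                        norm_logpdf(y[i], 0.0, sigma**2 + tau**2))
            p = np.exp(np.array(logp) - max(logp))
            choice = rng.choice(len(p), p=p / p.sum())
            z[i] = labels[choice] if choice < len(labels) else max(labels, default=-1) + 1
        z = np.unique(z, return_inverse=True)[1]   # relabel clusters 0..K-1
    return z

# Two well-separated groups should be recovered as (roughly) two clusters.
y = np.concatenate([rng.normal(-4, sigma, 40), rng.normal(4, sigma, 40)])
z = gibbs_dpm(y)
n_clusters = len(set(z))
print(n_clusters)
```

The number of clusters is inferred rather than fixed in advance, which is what makes DPM-style priors attractive for curve clustering; the WDPM of the talk additionally weights the mixing distribution and handles functional data.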
Francesco Gaffi: Partially Exchangeable Stochastic Block Models for Multilayer Networks

Abstract: There is an increasing availability of multilayer network data that either encode information on multiple types of edges among the same set of nodes (edge-colored networks) or characterize a single notion of connectivity between nodes belonging to different pre-specified layers (node-colored networks). In analyzing such data, an overarching focus of current research has been on extending classical models for a single-layered network to account for multilayer information, with a particular interest in inferring block structures among groups of nodes having similar connectivity patterns. While recent advancements along these directions have been made in the context of edge-colored networks, there is still a lack of state-of-the-art stochastic block models for node-colored multilayer networks that can flexibly account for both within- and across-layer block-connectivity structures while incorporating layer information in a principled probabilistic manner. This hinders methodological and theoretical progress in the field and prevents the development of rigorous inference and prediction strategies. In this work, we fill this gap by proposing a new class of partially exchangeable stochastic block models that relies on a hierarchical random partition prior for the group allocations of nodes, driven by the urn scheme of a hierarchical normalized completely random measure. The partial exchangeability assumption among nodes according to layer partitions allows us to infer both within- and across-layer blocks, while crucially preserving probabilistic coherence, principled uncertainty quantification, and formal inclusion of prior information from the layer division. The mathematical tractability and projectivity of the proposed construction further allow us to analytically derive predictive within- and across-layer co-clustering probabilities, thus facilitating prior elicitation, understanding of theoretical properties, and development of rigorous predictive strategies for both the connections and allocations of future incoming nodes. Posterior inference proceeds via a tractable collapsed Gibbs sampler. The practical performance of this novel class is illustrated in simulation studies and in a real-world criminal network application, where the proposed model displays clear gains, relative to alternative solutions, in estimation, uncertainty quantification, and prediction.

Joint work with Daniele Durante, Antonio Lijoi and Igor Prünster.

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558
