Nonparametric Bayes: Big Models for Big Data

Organizer and Chair: Vinayak Rao, Assistant Professor of Statistics, Department of Statistics, Purdue University

Speakers

  • Peter Mueller, Professor, Department of Mathematics and Department of Statistics and Data Sciences; Chair (ad interim), Department of Statistics and Data Sciences, University of Texas at Austin
  • XuanLong (Long) Nguyen, Associate Professor and Director of Master's Programs, Department of Statistics, University of Michigan
  • Sinead Williamson, Assistant Professor, Department of Information, Risk and Operations Management, and Department of Statistics and Data Sciences, University of Texas at Austin
  • Steven MacEachern, Professor of Statistics and Department Chair, Department of Statistics, The Ohio State University
Schedule

Thursday, June 7, 10:00 a.m.-12:00 p.m. in STEW 214 CD

Time Speaker Title
10:00-10:30 a.m. Peter Mueller Scalable Bayesian Nonparametric Clustering and Classification for EHR data
Abstract: We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is embarrassingly parallel and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets.
We apply the approach to inference under a product partition model with regression on covariates. We show results for inference with a large set of Chinese electronic health records (EHR). We find interesting clusters and favorable classification performance relative to other widely used competing classifiers.
10:30-11:00 a.m. Long Nguyen Streaming dynamic and distributed inference of latent geometric structures
Abstract: We develop new models and algorithms for learning the temporal dynamics of the topic polytopes and related geometric objects that arise in topic-model-based inference. Our model is nonparametric Bayesian, and the corresponding inference algorithm is able to discover new topics as time progresses. By exploiting the connection between the modeling of topic polytope evolution, the Beta-Bernoulli process, and the Hungarian matching algorithm, our method is shown to be several orders of magnitude faster than existing topic modeling approaches, as demonstrated by experiments that process several million documents in a dozen minutes.
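As a rough illustration of the matching step mentioned in this abstract, the sketch below pairs topics from one time point with topics from the next by minimizing total Euclidean distance. It is not the speaker's algorithm: the function name `match_topics`, the toy topic vectors, and the distance-based cost are all assumptions made for illustration, and a brute-force search over permutations stands in for the Hungarian algorithm, which solves the same assignment problem in polynomial time.

```python
from itertools import permutations
from math import dist


def match_topics(old_topics, new_topics):
    """Pair each old topic with a distinct new topic at minimum total distance.

    Hypothetical helper: brute force over all permutations, so only suitable
    for a handful of topics. The Hungarian algorithm finds the same optimum
    far more efficiently.
    """
    n = len(old_topics)
    if n != len(new_topics):
        raise ValueError("this sketch assumes equal numbers of topics")

    # cost[i][j] = distance between old topic i and new topic j
    cost = [[dist(o, t) for t in new_topics] for o in old_topics]

    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


# Made-up topic vectors (word proportions over a three-word vocabulary).
old = [(0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8)]
new = [(0.1, 0.2, 0.7), (0.7, 0.2, 0.1), (0.2, 0.7, 0.1)]

assignment, total = match_topics(old, new)
print(assignment)  # assignment[i] is the index of the new topic matched to old topic i
```

For realistic problem sizes, `scipy.optimize.linear_sum_assignment` solves the same minimum-cost assignment problem directly from the cost matrix.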
11:00-11:30 a.m. Sinead Williamson Nonparametric models for structured sparse graphs
Abstract: There has been recent interest in the Bayesian community in models for sparse graphs with an unbounded number of vertices. Such models are appropriate for modeling large social or interaction networks, where the number of vertices scales approximately linearly with the number of interactions. However, sparsity is only one aspect of the structure of such networks, and naive sparse models tend to ignore the presence of locally dense sub-graphs and latent communities. We propose models appropriate for binary and integer-valued graphs that are globally sparse, but which contain locally dense sub-graphs, and show how these models can be used to infer latent communities from social network data.
11:30 a.m.-12:00 p.m. Steve MacEachern Aggregated Pairwise Classification of Shapes

Abstract: The classification of shapes is of interest in areas ranging from medical imaging to computer vision and beyond. While many statistical frameworks have been developed for the classification problem, most are strongly tied to early formulations of the problem, with an object to be classified described as a vector in a relatively low-dimensional Euclidean space. Shape data have two main properties that suggest the need for a novel approach: shapes are inherently infinite dimensional with strong dependence among the positions of nearby points, and shape space is not Euclidean, but is curved. We first investigate a standard method whereby the shapes are projected from shape space to Euclidean space, dimension is reduced through principal components, and standard methods are applied. From this baseline, we investigate the role of the projection in classification. We demonstrate improved performance by creating effective pairwise classifiers and then aggregating the results for a full classification. The method is illustrated on a well-known data set consisting of shapes of leaves.
This is joint work with Min Ho Cho and Sebastian Kurtek.
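The general idea of building pairwise classifiers and aggregating their results can be sketched with a simple voting scheme. This is a minimal illustration, not the method presented in the talk: it assumes the shapes have already been reduced to short Euclidean feature vectors, uses a nearest-centroid rule for each pair of classes, and combines the pairwise decisions by majority vote. The class labels, feature values, and function names are invented for the example, and the projection from shape space is not modeled at all.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_pairwise(train):
    """Build one two-class nearest-centroid rule per pair of labels.

    train maps each label to a list of feature tuples (hypothetical format).
    """
    centroids = {label: centroid(pts) for label, pts in train.items()}
    return {
        (a, b): (centroids[a], centroids[b])
        for a, b in combinations(sorted(train), 2)
    }


def classify(pairwise, x):
    """Let every pairwise rule vote, then return the label with most votes.

    Ties are broken by the order in which labels first received a vote.
    """
    votes = Counter()
    for (a, b), (ca, cb) in pairwise.items():
        winner = a if sq_dist(x, ca) <= sq_dist(x, cb) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Made-up two-dimensional features for three leaf classes.
train = {
    "maple": [(0.0, 0.0), (0.2, 0.1)],
    "oak": [(1.0, 0.0), (0.9, 0.2)],
    "birch": [(0.0, 1.0), (0.1, 0.9)],
}

pairwise = fit_pairwise(train)
label = classify(pairwise, (0.95, 0.05))
print(label)
```

One reason to decompose the problem this way is that each two-class rule can be tuned separately for the pair it distinguishes, which a single multi-class rule cannot do.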

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558
