Session 7 - Department of Statistics - Purdue University

(i) Recent Advances in Semiparametric Theory and Methods; (ii) High-Dimensional Data Modeling

Speaker(s)

  • Ingrid Van Keilegom (Université Catholique de Louvain)
  • Bin Nan (University of Michigan)
  • Zuofeng Shang (University of Notre Dame)
  • Xiaotong Shen (University of Minnesota)
  • Cun-hui Zhang (Rutgers University)
  • Hao Helen Zhang (North Carolina State University)

Description

Rapid advances in modern computers and technologies have created an unprecedented data explosion in today's scientific research. Data abundance is both a blessing and a curse. On one hand, various types of massive or high-dimensional data contain greatly enriched information for knowledge discovery, and therefore provide us with enormous opportunities to explore data at a much deeper level. On the other hand, the extreme magnitude and scale of modern data pose serious challenges in extracting useful information from raw data and doing translational science. More than ever before, sophisticated statistical models and theories are needed to analyze and understand data. Our two sessions are proposed to meet these urgent needs. The first session focuses on recent theoretical and methodological developments in general semiparametric models for exploring massive and complex data. The second session focuses on joint variable screening and structure selection in ultra-high dimensional data.

Schedule

Fri, June 22 - Location: STEW 314

Time, Speaker, Title
1:30 - 2:00PM Ingrid Van Keilegom Boundary estimation in the presence of measurement error with unknown variance

Abstract: Boundary estimation appears naturally in economics in the context of productivity analysis. The performance of a firm is measured by the distance between its achieved output level (quantity of goods produced) and an optimal production frontier which is the locus of the maximal achievable output given the level of the inputs (labor, energy, capital, etc.). Frontier estimation becomes difficult if the outputs are measured with noise and most approaches rely on restrictive parametric assumptions. This paper contributes to the direction of nonparametric approaches.

We consider a general setup with unknown frontier and unknown variance of a normally distributed error term, and we propose a nonparametric method that identifies and estimates both quantities simultaneously. The asymptotic consistency and the rate of convergence of our estimators are established, and simulations are carried out to verify the performance of the estimators for small samples. We also apply our method to a dataset concerning the production output of American electricity utility companies. (This is joint work with Alois Kneip and Léopold Simar.)

2:00 - 2:30 Bin Nan Semiparametric Models with Bundled Parameters

Abstract: In many semiparametric models that are parameterized by two types of parameters (a Euclidean parameter of interest and an infinite dimensional nuisance parameter), the two parameters are bundled together, i.e., the nuisance parameter is an unknown function that contains the parameter of interest as part of its argument. For example, in a linear regression model with censored survival data, the unspecified error distribution function involves the regression coefficients. Motivated by the need for an efficient estimation method for the regression parameters, we consider sieve maximum likelihood estimation and propose a general M-theorem for such bundled parameters. The numerical implementation of the proposed estimation method can be achieved through conventional gradient-based search algorithms such as the Newton-Raphson algorithm. We show that the proposed estimator for the linear regression model with censored survival data is consistent, asymptotically normal, and achieves the semiparametric efficiency bound. Finite sample performance is evaluated by simulations.

This is joint work with Ying Ding.

2:30 - 3:00 Zuofeng Shang Joint Asymptotics and Inferences for Semi-Nonparametric Models

Abstract: In this talk, we consider joint asymptotics and inferences for semi-nonparametric models where the Euclidean parameter and an infinite dimensional parameter are both of interest. Within the general partly linear framework, we derive the joint limit distribution for the two parameters, which are rescaled according to different convergence rates. The marginal limit distribution for the Euclidean estimate coincides with that derived in the semiparametric literature. To construct the joint confidence region, we propose a likelihood ratio testing approach that avoids estimating the asymptotic covariance. The regularization tool employed is the smoothing spline. Undersmoothing of the smoothing spline estimate is required to obtain valid joint inferences. The key technical tool is a concentration inequality. A by-product is the marginal asymptotics for the infinite dimensional parameter, which are new even in the nonparametric literature.

This is joint work with Guang Cheng from Purdue University.

3:00-3:30PM Break
3:30 - 4:00PM Xiaotong Shen Simultaneous supervised clustering and feature selection

Abstract: In network analysis, genes are known to work in groups according to their biological functionality, where distinct groups reveal different gene functionalities. In such a situation, identifying grouping structures as well as informative genes becomes critical in understanding the progression of a disease. Motivated by gene network analysis, we investigate, in a regression context, simultaneous supervised clustering and feature selection over an arbitrary undirected graph, where each predictor corresponds to one node in the graph and the existence of a connecting path between two nodes indicates possible grouping between the two predictors.

In this talk, I will review recent developments and discuss computational methods for simultaneous supervised clustering and feature selection over a graph. Numerical examples will be given, in addition to some theoretical aspects of supervised clustering and feature selection. This is joint work with Hsin-Cheng Huang and Wei Pan.

4:00 - 4:30 Cun-hui Zhang A General Theory of Concave Regularization for High Dimensional Sparse Estimation Problems

Abstract: Concave regularization methods provide natural procedures for sparse recovery. However, they are difficult to analyze in the high dimensional setting. Only recently have a few sparse recovery results been established for some specific local solutions obtained via specialized numerical procedures. Still, the fundamental relationships among these solutions, such as whether they are identical and how they relate to the global minimizer of the underlying nonconvex formulation, remain unknown. The current paper fills this conceptual gap by presenting a general theoretical framework showing that under appropriate conditions, the global solution of nonconvex regularization leads to desirable recovery performance; moreover, under suitable conditions, the global solution corresponds to the unique sparse local solution, which can be obtained via different numerical procedures. Under this unified framework, we present an overview of existing results and discuss their connections. The unified view of this work leads to a more satisfactory treatment of concave high dimensional sparse estimation procedures, and serves as a guideline for developing further numerical procedures for concave regularization. This is joint work with Tong Zhang.

4:30 - 5:00 Hao Helen Zhang Selection of Interaction Effects for Ultra High-Dimensional Data

Abstract: For ultra-high dimensional data, the identification of important interaction effects is an extremely challenging task in terms of computation, practical implementation, and theoretical investigation. We propose new methods and computational algorithms to tackle these issues. The new methods feature efficient computation, desirable theoretical properties, and promising numerical results. Various examples are presented to illustrate the new proposals.

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558
