Session 11 - Department of Statistics - Purdue University

Variable Selection and Sufficient Dimension Reduction

Speaker(s)

  • Peng Zeng (Auburn University)
  • Wenxuan Zhong (University of Illinois, Urbana-Champaign)
  • Bing Li (Pennsylvania State University)
  • Zongming Ma (University of Pennsylvania)
  • Mary Meyer (Colorado State University)
  • Dingfeng Jiang (Abbott Laboratories)

Description

Statistical methods for variable selection and dimension reduction have attracted much attention recently, driven by the demand for analyzing high-dimensional massive data. This session will bring together researchers working on the development of new variable selection and dimension reduction methods. The topics of this session span a wide range of application areas, including bioinformatics, machine learning, and many other disciplines with high-dimensional data.

Schedule

Sat, June 23 - Location: STEW 214

Time | Speaker | Title
8:30-8:55AM | Peng Zeng | Variable Selection and Estimation for High-dimensional Data using Multiple-Index Models
Abstract: Multiple-index models are a natural extension of linear regression models when a linear relationship between the response and predictors does not hold. In this talk, we propose a penalized local linear smoothing method for variable selection and estimation in multiple-index models. This method incorporates an L1 penalty on the derivative of the link function into the loss function of local linear smoothing. It can be considered an extension of the usual lasso to multiple-index models. An efficient algorithm is derived to calculate the estimates and solution paths. The properties of the estimates and solution paths are investigated. The excellent performance of the method is demonstrated using simulation and real-data examples.
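To make the lasso-flavored construction above concrete, here is a minimal sketch (not the speaker's code) of a penalized local linear fit at a single evaluation point: the local slope plays the role of the link-function derivative, and the L1 penalty shrinks slope entries of irrelevant predictors to zero. The kernel, bandwidth, and penalty level are illustrative choices, and the snippet assumes a scikit-learn version whose Lasso.fit accepts sample_weight (0.23 or later).

```python
import numpy as np
from sklearn.linear_model import Lasso

def penalized_local_slope(X, y, x0, bandwidth=1.0, lam=0.1):
    """Weighted lasso fit of y on (X - x0): an L1-penalized local linear smoother at x0."""
    diffs = X - x0                                                   # local coordinates around x0
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * bandwidth ** 2))   # Gaussian kernel weights
    model = Lasso(alpha=lam, fit_intercept=True)
    model.fit(diffs, y, sample_weight=w)                             # requires scikit-learn >= 0.23
    return model.intercept_, model.coef_                             # local level and sparse local slope

# toy data: the response depends only on the first two predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0] + X[:, 1]) + 0.1 * rng.normal(size=200)
a_hat, b_hat = penalized_local_slope(X, y, x0=np.zeros(5), bandwidth=1.0, lam=0.05)
print(b_hat.round(3))   # slope entries for the irrelevant predictors shrink to (near) zero
```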
9:00-9:25AM | Wenxuan Zhong | Trace-based variable selection method
Abstract: In this talk, I will present a trace-based variable selection and screening method developed under the sufficient dimension reduction framework. With the trace-based method, we do not need to impose a special form of relationship between the response variable and the predictor variables. Various asymptotic properties of the trace-based procedure will be discussed, in particular its variable selection performance when the number of predictors diverges with the sample size.
9:30-9:55AM | Bing Li | A general theory of nonlinear sufficient dimension reduction: formulation and estimation
Abstract: In this talk we give a general formulation of nonlinear sufficient dimension reduction, and explore its ramifications and scope. This formulation subsumes recent work employing reproducing kernel Hilbert spaces, and reveals many parallels between linear and nonlinear sufficient dimension reduction. Using these parallels we analyze the population-level properties of existing methods and develop new ones. We begin at the general level of sigma-fields, and proceed to that of measurable and generating classes of functions. This leads to the notions of sufficient, complete and sufficient, and minimal sufficient dimension reduction classes. We show that, when it exists, the complete and sufficient class coincides with the minimal sufficient class, and can be unbiasedly and exhaustively estimated by a generalized slice inverse regression estimator (GSIR). When completeness does not hold, this estimator captures only part of the central class (i.e. remains unbiased but is no longer exhaustive). However, we show that a generalized sliced average variance estimator (GSAVE) can capture a larger portion of the class. Both estimators require no numerical optimization, because they can be computed by spectral decomposition of linear operators. Finally, we compare our estimators with existing methods by simulation and on actual data sets.
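As background for readers less familiar with the linear precursor of GSIR, the following is a minimal sketch of classical sliced inverse regression, which already illustrates the point that the directions require no numerical optimization: they come from a spectral decomposition. It is not the nonlinear, RKHS-based GSIR or GSAVE estimator from the talk; the function name and slicing scheme are illustrative.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Classical sliced inverse regression via eigendecomposition (linear special case of GSIR)."""
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T     # Sigma^{-1/2}
    Z = (X - X.mean(axis=0)) @ Sigma_inv_half                     # standardized predictors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):           # slice on the response
        m = Z[idx].mean(axis=0)                                   # within-slice mean of Z
        M += (len(idx) / n) * np.outer(m, m)                      # between-slice covariance
    _, V = np.linalg.eigh(M)
    B = Sigma_inv_half @ V[:, ::-1][:, :n_dirs]                   # leading directions, original scale
    return B / np.linalg.norm(B, axis=0)

# toy example: the response depends on the single index x1 + 0.5*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = np.exp(X[:, 0] + 0.5 * X[:, 1]) + 0.1 * rng.normal(size=500)
print(sir_directions(X, y).ravel().round(2))   # loads mainly on the first two coordinates
```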
10:00-10:30AM | Break
10:30-10:55AM | Zongming Ma | Sparse Singular Value Decomposition in High Dimensions
Abstract: Singular value decomposition is a widely used tool for dimension reduction in multivariate analysis. However, when used for statistical estimation in high-dimensional low-rank matrix models, singular vectors of the noise-corrupted matrix are inconsistent for their counterparts of the true mean matrix. In this talk, we suppose the true singular vectors have sparse representations in a certain basis. We propose an iterative thresholding algorithm that can optimally estimate the subspaces spanned by the leading left and right singular vectors, and also the true mean matrix, under the Gaussian assumption. We further turn the algorithm into a practical methodology that is fast, data-driven, and robust to heavy-tailed noise. Simulations and a real data example further show its competitive performance.

This is joint work with Andreas Buja and Dan Yang.
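A rough illustration of the iterative-thresholding idea, for the rank-one case only: alternate power iterations with soft thresholding of the left and right singular vectors. The threshold levels and stopping rule below are simplified assumptions and do not reproduce the data-driven, heavy-tail-robust methodology described in the talk.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_rank1_svd(X, t_u=0.05, t_v=0.05, n_iter=50):
    """Alternating power iterations with soft thresholding for a sparse rank-one SVD."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]                              # initialize at the ordinary singular vectors
    for _ in range(n_iter):
        u = soft_threshold(X @ v, t_u)                 # update and sparsify the left vector
        u /= np.linalg.norm(u) + 1e-12
        v = soft_threshold(X.T @ u, t_v)               # update and sparsify the right vector
        v /= np.linalg.norm(v) + 1e-12
    return u, u @ X @ v, v                             # sparse vectors and singular value estimate

# toy example: a sparse rank-one signal buried in noise
rng = np.random.default_rng(1)
u0 = np.zeros(100); u0[:5] = 1 / np.sqrt(5)
v0 = np.zeros(80);  v0[:4] = 1 / np.sqrt(4)
X = 5 * np.outer(u0, v0) + 0.5 * rng.normal(size=(100, 80))
u_hat, d_hat, v_hat = sparse_rank1_svd(X)
print(np.abs(u_hat[:8]).round(2))                      # mass concentrates on the first five entries
```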
11:00-11:25AM | Mary Meyer | Testing and Variable Selection in the Additive Isotonic Model
Abstract: A new algorithm for partial linear additive isotonic regression formulates the solution as a single projection onto a polyhedral convex cone. This is more computationally efficient than a cyclical pooled-adjacent-violators algorithm and, in addition, provides a platform for inference, including hypothesis testing and variable selection procedures. Simulations show that the procedures can outperform standard parametric methods, even when the parametric assumptions are true.
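For context, the sketch below shows the cyclic backfitting baseline that the single cone-projection algorithm improves upon: each additive component is updated in turn by a pooled-adjacent-violators fit (scikit-learn's IsotonicRegression). The function name and number of sweeps are illustrative; this is not the speaker's algorithm.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def additive_isotonic_backfit(X, y, n_sweeps=20):
    """Cyclic PAVA backfitting for an additive isotonic model y ~ mean(y) + sum_j f_j(x_j)."""
    n, p = X.shape
    f = np.zeros((n, p))                                       # fitted component functions
    for _ in range(n_sweeps):
        for j in range(p):
            partial = y - y.mean() - f.sum(axis=1) + f[:, j]   # partial residual for component j
            f[:, j] = IsotonicRegression(increasing=True).fit_transform(X[:, j], partial)
            f[:, j] -= f[:, j].mean()                          # center each component
    return y.mean() + f.sum(axis=1), f                         # fitted values and components

# toy example: two monotone components, one irrelevant predictor
rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 3))
y = X[:, 0] ** 2 + np.log1p(X[:, 1]) + 0.1 * rng.normal(size=200)
fitted, components = additive_isotonic_backfit(X, y)
print(np.ptp(components, axis=0).round(2))                     # the third component stays nearly flat
```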
11:30-11:55AM | Dingfeng Jiang | Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models
Abstract: Recent studies have demonstrated the theoretical attractiveness of a class of concave penalties in variable selection, including the smoothly clipped absolute deviation (SCAD) and minimax concave penalties (MCP). The computation of concave penalized solutions in high-dimensional models, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm for computing the concave penalized solutions in generalized linear models. In contrast to existing algorithms that use a local quadratic or local linear approximation for the penalty function, the MMCD seeks to majorize the negative log-likelihood by a quadratic loss, but does not use any approximation to the penalty. This strategy makes it possible to avoid the computation of a scaling factor in each update of the solutions, which improves the efficiency of coordinate descent. Under certain regularity conditions, we establish the theoretical convergence property of the MMCD. We implement this algorithm for a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD works sufficiently fast for penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size.
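The following is a stripped-down sketch of the MMCD idea for MCP-penalized logistic regression: the logistic loss is majorized by a quadratic using the bound p(1-p) <= 1/4, so every coordinate update uses the same fixed curvature (no per-update scaling factor) and a closed-form MCP threshold. Standardization, tuning constants, and the update order below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def mcp_update(z, v, lam, gamma):
    """Minimizer of (v/2)(b - z)^2 + MCP(b; lam, gamma); requires gamma > 1/v."""
    if abs(z) > gamma * lam:
        return z                                            # beyond the MCP knot: no shrinkage
    s = np.sign(z) * max(abs(v * z) - lam, 0.0)             # soft-threshold v*z at level lam
    return s / (v - 1.0 / gamma)

def mmcd_logistic_mcp(X, y, lam=0.05, gamma=8.0, n_sweeps=100):
    """MM-by-coordinate-descent sketch for MCP-penalized logistic regression."""
    n, p = X.shape
    X = (X - X.mean(axis=0)) / X.std(axis=0)                # standardize: mean 0, variance 1 columns
    beta0, beta = 0.0, np.zeros(p)
    v = 0.25                                                # fixed curvature from p(1-p) <= 1/4
    for _ in range(n_sweeps):
        prob = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))
        beta0 += (y - prob).mean() / v                      # unpenalized intercept step
        for j in range(p):
            prob = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))
            u = X[:, j] @ (y - prob) / n                    # coordinate-j gradient of the log-likelihood
            beta[j] = mcp_update(beta[j] + u / v, v, lam, gamma)
    return beta0, beta                                      # coefficients on the standardized scale

# toy example: only the first three coefficients are nonzero
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 20))
b_true = np.zeros(20); b_true[:3] = [1.5, -1.0, 0.8]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ b_true)))
b0_hat, b_hat = mmcd_logistic_mcp(X, y)
print(np.round(b_hat, 2))                                   # most null coefficients are exactly zero
```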

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558
