When Statistics Embraces A.I.

Purdue Department of Statistics
Home ยป Projects



Dictionary Learning and Pruning

Dictionary learning is a cutting-edge area in imaging processing, that has recently led to state-of-the-art results in signal processing tasks, such as image denoising, texture synthesis, and audio processing. The idea is to conduct a linear decomposition of a signal using a few atoms of a learned and usually over-completed dictionary in stead a set of pre-defined basis. While in the big data scheme, determining a proper size for the to-be-learned dictionary is crucial for both precision and efficiency of the process, most of the existing dictionary learning algorithms choose the size quite arbitrarily. In this project, we propose a novel regularization method that could simultaneously estimate sparse atoms and select the number of atoms for the learned dictionary. Our method is applied to image denoising and the experimental results achieve state-of-the-art performances compared with the leading alternative denoising methods.


Estimation of Heterogeneity For Multinomial Probit Models

Empirical studies suggest that utility functions are often irregularly shaped, and individuals often deviate widely from each other. In this paper, we introduce a multinomial probit model involving both parametric covariates and nonparametric covariates. To combine heterogeneity across individuals with flexibility, we have two different strategies for the parametric component and the nonparametric component. For the parametric component, heterogeneity is incorporated by the inclusion of random effects. As for the nonparametric component, each individual has a unique individual-level function. However, all those nonparametric components share the same basis. Because the basis is unknown, we use dictionary learning to learn the basis. Additionally, an EM algorithm serves to estimate the multinomial probit models.


Local Region Image-on-Scalar Regression

In this paper, we propose a new regularization technique to estimate the coefficient image in a image-on-scalar regression model. We have developed a locally sparse estimator when the value of the coefficient image is zero within certain sub-regions. At the same time, the estimator has the ability to explicitly account for the piecewise smooth nature of most images. The ADMM is used to estimate the unknown coefficient images. Simulation and real data analysis have shown a superior performance of our method against many existing approaches. The distributed algorithm is also developed to handle big data. Graphs on the left show an example of the estimators we recovered from the simulation (true on the left, estimator on the right).


Click Prediction Using Deep Learning

We all have experienced real time bidding advertising. When we load a webpage with advertisements, they might be related to your google search history, or maybe your last amazon purchase. These advertisements are distributed by demand side platform companies, that try to determine if an advertisement is worth placing and how much they are willing to pay for it. One of these DSPs is iPinYou, who released a dataset of their advertisement transactions. The challenge is to determine based on the user information iPinYou receives, whether or not the user is going to click on a certain advertisement. This prediction process model is what I want to build with deep neural networks, and compare the results with traditional regression methods for this prediction problem.


Machine Learning Introduction With MNIST Dataset

Machine learning is a powerful tool used in a wide breadth of applications including quality control, image classification, and self-driving cars. We will give a broad summary of many fundamental machine learning models, so that the reader may gain an intuition behind their formulation. In this paper, we use the MNIST data-set to demonstrate the mechanisms of machine learning models. Three learning paradigms will be discussed in detail: supervised, unsupervised, and representational learning. Each of the learning paradigms will be separated into individual methods, and each of the methods will be separated into the three sections: the problem formulation, a brief overview of the algorithm, and the result on the MNIST data-set. After each of the models and results are presented, a summary of the models will be provided.