Paul Kidwell
Graduate Student
Address:
Department of Statistics
Purdue University
150 N. University Street
West Lafayette, IN 47907-2068
Office: MATH 539
Office Phone: 765-494-0026
Fax: 765-494-0558
Advisors:
Prof. William Cleveland and Prof. Guy Lebanon
Current Research, Activities, and Projects
Rank Data Analysis
Ranking occurs in many forms in our daily lives. Any situation that requires you to express an opinion over a field of options creates a ranking. Ranking can be as straightforward as answering simple questions:
- "Who do you like in the presidential primary?"
- "What are your 3 favorite television shows?"
- "What foods will you not eat?"
Ranking data appear in many practical applications such as marketing, collaborative filtering (recommendation systems, e.g. web page retrieval), and decision analysis. Real-world data are often incomplete (not every object has been rated by every rater) or partial (a rater expresses no preference between two or more objects). For example, consider Netflix movie rankings: a typical user will view only a small fraction of the 2000 titles available (incomplete) and will likely have no preference among certain films (partial).
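To make the two definitions concrete, here is a minimal sketch of one possible way to represent such data. The catalogue, the titles, and the helper names `is_incomplete` and `is_partial` are hypothetical and chosen only for illustration; they are not taken from any real data set or from our methods.

```python
# Hypothetical catalogue of objects that could be ranked.
catalogue = {"Alien", "Brazil", "Casablanca", "Dune", "Eraserhead"}

# One rater's ranking, best first. Each set is a group of tied objects:
# Casablanca is preferred to both Alien and Dune, with no preference
# expressed between Alien and Dune.
ranking = [{"Casablanca"}, {"Alien", "Dune"}]


def is_incomplete(ranking, catalogue):
    """True if at least one object in the catalogue was never ranked."""
    ranked = set().union(*ranking)
    return ranked < catalogue  # proper subset


def is_partial(ranking):
    """True if some group contains two or more tied objects."""
    return any(len(group) > 1 for group in ranking)


print(is_incomplete(ranking, catalogue))  # True: Brazil and Eraserhead are unranked
print(is_partial(ranking))                # True: Alien and Dune are tied
```

In this representation a single ranking can be both incomplete and partial, which is the typical situation described above.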
A challenge in the analysis of rank data is developing methods for the ranking types that occur in practice (incomplete and partial). Methods are needed for probabilistic modeling, inclusion of covariates (on both objects and raters), and visualization. Incomplete data introduce factorial growth in the number of complete rankings consistent with the observed data, and probabilistic modeling of incomplete data must account for all of them. Successful methods must therefore assign probabilities in a consistent manner over the sets of complete rankings compatible with a particular incomplete or partial ranking. We are working to justify the use of approximations to overcome the obstacle imposed by large sets of compatible rankings. We are also approaching visualization by using appropriate metrics on the incomplete and partial rankings and then projecting these rankings into a 2-dimensional space.
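The factorial growth mentioned above can be illustrated with a small brute-force sketch. It assumes the simplest case: a rater has ordered only k of n objects, and a complete ranking is "compatible" if it preserves that observed order. The function names and the toy objects are hypothetical; this is an illustration of the counting problem, not the approximation method we are developing.

```python
from itertools import permutations
from math import factorial


def compatible_rankings(items, observed):
    """Yield every complete ranking of `items` that preserves the relative
    order of the objects in `observed` (an incomplete ranking, best first)."""
    for perm in permutations(items):
        position = {item: i for i, item in enumerate(perm)}
        # Checking consecutive pairs is enough because order is transitive.
        if all(position[a] < position[b] for a, b in zip(observed, observed[1:])):
            yield perm


def count_compatible(n_items, n_observed):
    """Closed form: n! / k! complete rankings preserve one fixed order of k objects."""
    return factorial(n_items) // factorial(n_observed)


items = ["A", "B", "C", "D", "E"]
observed = ["C", "A"]  # the rater compared only C and A, and preferred C

print(len(list(compatible_rankings(items, observed))))  # 60, i.e. 5! / 2!

# The count explodes as the number of objects grows while k stays fixed.
for n in (5, 10, 20):
    print(n, count_compatible(n, 2))
```

Enumeration is feasible only for a handful of objects, which is why a model that sums probabilities over the compatible set needs either structure that allows the sum to be computed in closed form or a justified approximation.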
Network Intrusion Detection
Network intrusion detection has tremendous value as a practical problem and is statistically interesting because of the large amount of data involved. Our first step in intrusion detection has focused on identifying suspect connections. Many connections can be identified as botnets or other automated intruders, but there is no method for classifying "the big fish" - human intruders.
The network intrusion algorithm has required the development of empirical, rule-based procedures for identifying keystrokes. These procedures rely heavily on visualization to locate characteristic patterns.
Posters:
CERIAS Security Symposium - Intrusion Detection via Keystroke Presence, March 2007 (.pdf)
Links of Interest
Research:
Purdue:
References:
Some Useful Papers: Papers
Some Notes on Various Topics: Notes
Statistical Consulting Service
Purdue Statistical Consulting
Education
BS Industrial Engineering, Purdue '00
MS Applied Math, DePaul '04
MS Mathematical Statistics, Purdue '07
PhD Statistics, Purdue '09 (expected)