Session 10 - Department of Statistics - Purdue University

Statistics for Communications & Information Engineering

Speaker(s)

  • Curtis B Storlie (LANL)
  • Vaidyanathan Ramaswami (AT&T)
  • Luo Si (Purdue University)
  • Carlos Scheidegger (AT&T)
  • Bowei Xi (Purdue University)

Description

Techniques for handling very large data sets, data mining, machine learning, and probability modeling find numerous applications in the communications and information industries, and problems arising in these industries in turn help drive progress in statistical and algorithmic research. These applications underscore an interdisciplinary approach and close collaboration between academia and industry. The goal of this session is to bring together leading researchers in these fields to provide an understanding of current progress and challenges.

Schedule

Sat, June 23 - Location: STEW 310

Time Speaker Title
8:30 - 9:15AM Curtis B Storlie Computer Network Hacker Detection via Locally Anomalous Subgraphs
Abstract: Identifying anomalies in computer networks is a challenging and complex problem. Often, the anomalies of interest occur in extremely local areas of the network. Locality is complex in this setting, since there is an underlying graph structure. To identify local anomalies, we introduce a novel path-based scan statistic for data extracted from the edges of a graph over time. Paths are motivated by hacker behaviors observed in real network attacks. Paths of length 3 are enumerated over the entire graph, over a set of sliding time windows, and local statistics in each window are compared with their expected behavior under a statistical model. Data speeds on larger networks require online detection to be nimble. We have therefore designed an anomaly detection system that achieves far better than real-time analysis speed, as applied to Los Alamos National Laboratory's entire internal network (∼20,000 hosts). We are able to identify communications between computers during previous intrusions in LANL's unclassified network. Finally, we discuss an extension of this work that uses paths as building blocks to more accurately represent the caterpillar-type shapes of anomalies typically seen during intrusions.
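The enumeration step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's system: it assumes "length 3" means three directed edges over four distinct hosts, that each event is a (time, source, destination) record, and that windows are half-open intervals of fixed width. The names three_paths and windowed_paths are hypothetical, and the per-path scan statistic and its statistical model are not shown.

```python
from collections import defaultdict


def three_paths(edges):
    """Enumerate directed paths a -> b -> c -> d over four distinct nodes.

    Assumption: a "path of length 3" means three edges with no repeated node.
    """
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)

    paths = []
    for a in sorted(adj):
        for b in sorted(adj[a]):
            if b == a:
                continue
            # adj.get avoids creating empty entries in the defaultdict
            for c in sorted(adj.get(b, ())):
                if c in (a, b):
                    continue
                for d in sorted(adj.get(c, ())):
                    if d in (a, b, c):
                        continue
                    paths.append((a, b, c, d))
    return paths


def windowed_paths(events, width, step):
    """Yield (window_start, paths) for sliding windows [start, start + width).

    `events` is an iterable of (time, src, dst) tuples (hypothetical format).
    """
    events = sorted(events)
    if not events:
        return
    start, last = events[0][0], events[-1][0]
    while start <= last:
        edges = {(s, d) for t, s, d in events if start <= t < start + width}
        yield start, three_paths(edges)
        start += step
```

In a full detector, each enumerated path would then be scored against its expected behavior in that window; the brute-force enumeration here is for clarity and would not reach the speeds the abstract reports.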
9:15 - 10:00AM Vaidyanathan Ramaswami Applied Probability & Statistics for Communications: Some Challenges and Opportunities
Abstract: The last couple of decades have seen major strides in statistical methodologies. Thanks to increased computing power and its exploitation, many problems that were previously set aside because of too many parameters, too few samples relative to the number of parameters, identifiability issues, and so on, and that limited the use of statistics, are now handled routinely.

While many fields like mathematical finance, biostatistics, genetics, and speech technologies have benefited from these advances, traditional areas like reliability and queueing have yet to incorporate them into their research. The questions here are harder for statistics too, as they involve time dynamics in subtle ways.

In this talk, I will review a set of examples from a variety of areas (modeling WiFi traffic, heavy-tailed repair times, and internet performance) to demonstrate the opportunities that exist for both research and practice at the interface of statistics and applied probability.
10:00 - 10:30AM Break
10:30 - 11:00AM Luo Si A learning approach for expertise search
Abstract: Working in the information age, the most important thing may not be what you know, but who you know. In many large academic, commercial, and government organizations, it is often a critical task to identify experts in specific topics. In real-world applications of expert search, it is important to identify experts from heterogeneous types of sources, which contain documents with multiple types of associations with expert candidates. Previous research used generative probabilistic models with heuristic solutions to combine two factors: document evidence (i.e., document evidence for the user query) and document-candidate association. Furthermore, previous research combined expertise information obtained from heterogeneous sources equally or with fixed weights, which is suboptimal. Si and his colleagues advanced the state of the art of expert search by designing formal expert search models and building a real-world expert search system. This talk will describe some recent research work: 1) a unified model that integrates document evidence and document-candidate associations for expert search; 2) a mixture discriminative model approach for adaptively ranking experts in heterogeneous information sources for different types of expert candidates and different types of user queries; and 3) a joint classification model for faculty homepage finding in academic portals.
11:00 - 11:30AM Carlos Scheidegger EDA, visualization and collaboration on the web
Abstract: This talk will present ongoing work on an R-based exploratory data analysis (EDA) environment that runs in a modern web browser. As organizations and businesses become more data-driven, EDA becomes increasingly important. In large organizations, sharing derived data, experiments, graphics and visualizations is still a hard problem, and is addressed mainly by stand-alone version-control systems. In contrast, we envision a system where users collaborate with each other fluidly, moving from a full-fledged R command prompt to integrated code search among all users to a simplified "pick your analysis, visualization and data" view. In this environment, sharing a new dataset, analysis or visualization should be as easy as creating it. We will show the current state of the tool via live demos.

This is joint work with Simon Urbanek.
11:30AM - 12:00PM Bowei Xi Multifractal and Gaussian Fractional-Sum-Difference Models for Internet Traffic
Abstract: A multifractal fractional sum-difference model (MFSD) is a monotone transformation of a Gaussian fractional sum-difference model (GFSD), the Gaussian image of the MFSD. The GFSD is a mixture of two components: a moving two-sum of discrete fractional Brownian motion (fBm), and white noise. Internet packet traffic inter-arrival times are very well modeled by an MFSD; this is validated by extensive model checking for 715,665,213 measured arrival times on 3 Internet links. Mathematical investigations of many traffic statistics, enabled by the mathematical tractability of the model, provide new insight for traffic phenomena; this includes fundamental explanations of a number of phenomena based on how the relative weights of the fBm and white noise components change with changing factors such as the traffic rate and time aggregation. The MFSD can be used to generate synthetic traffic for network simulation.

This is joint work with David Anderson and William S. Cleveland.
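The two-component structure of the GFSD described in this abstract can be illustrated with a small simulation. This is a rough sketch under several assumptions that the abstract does not state: the long-range-dependent component is taken to be fractional Gaussian noise (the increments of fBm) with Hurst parameter hurst, the moving two-sum is rescaled to unit variance, and the two components are combined with weights sqrt(1 - theta) and sqrt(theta). The function names and the parameter theta are hypothetical, and the monotone transformation that yields the MFSD is not shown.

```python
import math
import random


def fgn_autocov(lag, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at a given lag."""
    k = abs(lag)
    h2 = 2 * hurst
    return 0.5 * ((k + 1) ** h2 - 2 * k ** h2 + abs(k - 1) ** h2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_gfsd(n, hurst, theta, seed=None):
    """Simulate n values: sqrt(1 - theta) * two-sum of fGn + sqrt(theta) * white noise.

    Assumption: this mixing form and the use of fGn are illustrative choices,
    not details taken from the abstract.
    """
    rng = random.Random(seed)
    m = n + 1  # one extra fGn value so the two-sum has n terms
    cov = [[fgn_autocov(i - j, hurst) for j in range(m)] for i in range(m)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    fgn = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
    # Var(g_u + g_{u-1}) = 2 * gamma(0) + 2 * gamma(1); rescale to unit variance
    scale = math.sqrt(2.0 + 2.0 * fgn_autocov(1, hurst))
    two_sum = [(fgn[u] + fgn[u - 1]) / scale for u in range(1, m)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [
        math.sqrt(1.0 - theta) * s + math.sqrt(theta) * w
        for s, w in zip(two_sum, noise)
    ]
```

Varying theta between 0 and 1 shifts the relative weight from the long-range-dependent component to white noise. The exact Cholesky construction costs O(n^3) time, so it is only practical for short series.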

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558
