Bowei Xi

Written by: Allison Cummins, M.S. candidate in Statistics


The Internet is an exponentially expanding, decentralized, and largely unregulated network of private and public networks that carries large amounts of information, resources, and services. Because the Internet is not tightly controlled, the performance data needed to control and manage networks are difficult to obtain, and collecting them consumes additional bandwidth, which adds to network congestion. Internet Tomography addresses a variety of problems such as mapping network connections as functions of space and time, estimating performance characteristics, and detecting malicious behavior. The analysis involves estimating large numbers of spatially distributed parameters such as packet loss rates, latency, and available bandwidth. These are large-scale statistical inverse problems.

When data is transferred over the Internet, for example in an email, it is broken down into packets that are routed along an appropriate path to reach their destination. This flow of information creates what is known as a data stream. Statisticians use the stream data to study packet arrivals, either by examining the inter-arrival times (the time between successive packet arrivals) or by fixing a time window and counting the number of arrivals within it.
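The two views of a packet stream described above can be sketched in a few lines of Python. This is only an illustration: the timestamps are made up, and the function names and the choice of windows starting at time 0 are assumptions, not part of Xi's work.

```python
from typing import List


def inter_arrival_times(arrivals: List[float]) -> List[float]:
    """Gaps between consecutive packet arrival timestamps (in seconds)."""
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


def counts_per_window(arrivals: List[float], window: float) -> List[int]:
    """Number of arrivals in consecutive fixed-length windows starting at time 0."""
    if not arrivals:
        return []
    n_windows = int(max(arrivals) // window) + 1
    counts = [0] * n_windows
    for t in arrivals:
        counts[int(t // window)] += 1
    return counts


# Hypothetical arrival timestamps in seconds (made up for illustration).
timestamps = [0.5, 1.0, 1.25, 3.0, 3.5, 5.75]

gaps = inter_arrival_times(timestamps)              # time between arrivals
counts = counts_per_window(timestamps, window=2.0)  # arrivals per 2-second window
```

Both summaries describe the same stream: the gaps keep the fine timing detail, while the window counts are coarser but easier to compare across time scales.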

Professor Bowei Xi began her work on active Internet Tomography as a Ph.D. student at the University of Michigan and has continued her work in this area since arriving at the Department of Statistics at Purdue University in 2004. Xi used a network traffic simulator, ns-2, to test statistical models and found that all link-level parameters were identifiable when a class of flexible experiments was used to actively probe the network. She found that least-squares-type estimators under a log-linear model could be computed efficiently for the link parameters and could easily be updated as new information arrived.
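To see why a log-linear model leads naturally to least squares, consider a simplified setting (an assumption made here for illustration, not a description of Xi's actual estimator): if a probe succeeds on a path only when it survives every link on that path, and links behave independently, then the path success rate is the product of the link success rates, and its logarithm is a sum over links. The sketch below uses a hypothetical routing matrix and made-up link rates, and solves the resulting normal equations in plain Python.

```python
import math
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def estimate_link_success(routing: List[List[int]], path_success: List[float]) -> List[float]:
    """Least-squares estimate of per-link success rates from path-level success rates."""
    y = [math.log(p) for p in path_success]  # log turns products into sums
    n_links = len(routing[0])
    # Normal equations (A^T A) x = A^T y, where x holds the log link success rates.
    ata = [[sum(row[i] * row[j] for row in routing) for j in range(n_links)]
           for i in range(n_links)]
    aty = [sum(row[i] * yi for row, yi in zip(routing, y)) for i in range(n_links)]
    return [math.exp(v) for v in solve_linear(ata, aty)]


# Hypothetical probing design: each row is a probed path, each column a link;
# a 1 means the path uses that link.
routing = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
true_link_success = [0.99, 0.95, 0.90]  # made-up values, noise-free for clarity
path_success = [math.prod(s for s, used in zip(true_link_success, row) if used)
                for row in routing]

estimates = estimate_link_success(routing, path_success)
```

With noise-free path rates and a routing matrix of full column rank, the estimates recover the link rates up to rounding error. With measured rates, the same computation gives a least-squares fit, and the link parameters are identifiable only if the probing design makes the matrix full rank.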

Although simulations offer simpler scenarios in which to test statistical methods, unforeseen problems and complications can arise when using actual data. Xi and her colleagues at Purdue University are currently working on two projects that model real Internet traffic. One project applies the theory of stochastic processes to general Internet traffic, in which packets from various applications are superimposed to create a packet stream. The other project focuses on Voice over IP (VoIP), a very time-sensitive application that requires extremely high-quality links.

Because VoIP traffic is generated by a unique mechanism, it requires its own model, based on the following statistical properties:

  1. call arrival process

  2. call duration distribution

  3. aggregate packet process

  4. silence suppression.

Because of silence suppression, every VoIP call trace can be characterized as a succession of active periods (sending packets, ON) alternating with inactive periods (silence, OFF), referred to as an ON-OFF process. VoIP traffic is created by superimposing these ON-OFF processes. Xi's work on Internet traffic is supported by the National Science Foundation.
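The superposition of ON-OFF processes described above can be sketched with a small simulation. Everything specific here is an assumption chosen for illustration and does not come from Xi's model: the exponential ON and OFF durations, the fixed 20 ms packet spacing, the mean period lengths, and the number of calls.

```python
import random
from typing import List


def on_off_packet_times(rng: random.Random, duration: float, mean_on: float,
                        mean_off: float, packet_interval: float) -> List[float]:
    """Packet send times for one call: evenly spaced packets during ON, none during OFF."""
    times = []
    t = 0.0
    while t < duration:
        # ON period: assumed exponentially distributed, truncated at the call end.
        on_end = min(t + rng.expovariate(1.0 / mean_on), duration)
        send = t
        while send < on_end:
            times.append(send)
            send += packet_interval
        # OFF period (silence): assumed exponentially distributed, no packets sent.
        t = on_end + rng.expovariate(1.0 / mean_off)
    return times


def superimpose(calls: List[List[float]]) -> List[float]:
    """Merge the packet times of all calls into one aggregate packet stream."""
    return sorted(t for call in calls for t in call)


rng = random.Random(2009)  # fixed seed so the sketch is reproducible
calls = [on_off_packet_times(rng, duration=10.0, mean_on=1.0, mean_off=1.5,
                             packet_interval=0.02)
         for _ in range(5)]
aggregate = superimpose(calls)
```

Each call contributes bursts of packets separated by silent gaps, and the aggregate stream is simply the time-ordered union of the packets from all calls.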

Along with monitoring Internet traffic, Xi is also interested in adversarial classification, including spam filtering and intrusion detection, using data mining and machine learning. With current classification systems, problems arise when the adversary adapts to avoid detection. Because of this mutation, future datasets and current datasets come from different populations, so current classification techniques are no longer optimal. To model such a problem, Xi and her colleagues propose a two-player Stackelberg game, in which the players move sequentially, with the first player maximizing profit and the second player minimizing loss. A classifier's performance at an equilibrium state of this game will more accurately predict its effectiveness in detecting mutating adversaries. In 2007, Xi obtained a US patent titled "Methods and apparatus for automatic system parameter configuration for performance improvement" for her work in machine learning and data mining.
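The sequential structure of a Stackelberg game can be illustrated with a toy example over finite action sets. The action names, the payoff and loss tables, and the choice of the adversary as the first mover are all hypothetical, invented only to show the mechanics; they are not taken from Xi's model.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def stackelberg_equilibrium(leader_actions: List[str], follower_actions: List[str],
                            leader_payoff: Dict[Pair, float],
                            follower_loss: Dict[Pair, float]) -> Tuple[str, str, float]:
    """Leader moves first and maximizes payoff, anticipating the follower's best response."""
    best = None
    for a in leader_actions:
        # The follower observes the leader's action and minimizes its own loss.
        b = min(follower_actions, key=lambda f: follower_loss[(a, f)])
        value = leader_payoff[(a, b)]
        if best is None or value > best[2]:
            best = (a, b, value)
    return best


# Hypothetical setup: the adversary (leader) chooses how heavily to disguise spam,
# then the classifier (follower) chooses a filtering threshold.
adversary_actions = ["none", "mild", "heavy"]
classifier_actions = ["strict", "lenient"]
adversary_payoff = {("none", "strict"): 1, ("none", "lenient"): 4,
                    ("mild", "strict"): 3, ("mild", "lenient"): 5,
                    ("heavy", "strict"): 2, ("heavy", "lenient"): 2}
classifier_loss = {("none", "strict"): 1, ("none", "lenient"): 3,
                   ("mild", "strict"): 2, ("mild", "lenient"): 4,
                   ("heavy", "strict"): 5, ("heavy", "lenient"): 3}

equilibrium = stackelberg_equilibrium(adversary_actions, classifier_actions,
                                      adversary_payoff, classifier_loss)
```

In this made-up table the equilibrium is mild disguise against a strict filter, with a payoff of 3 for the adversary. Evaluating a classifier at such an equilibrium, rather than on current data alone, is the idea the paragraph above describes.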

Beyond computer networks, Xi has also applied statistical techniques to metabolomics, the study of the chemical fingerprints of various cellular processes. The two most commonly used analytical tools in metabolomics are nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS). Taking advantage of both NMR and MS techniques can provide a reliable method of detecting deviations of metabolites in biofluid samples to achieve a more accurate classification of cancer cells. In 2007, one of Xi's publications, "Monitoring Diet Effects via Biofluids and Their Implications for Metabolomics Studies," was the 13th most cited article in Analytical Chemistry. Xi's work in metabolomics is supported by the National Institutes of Health (NIH).

Xi will be teaching Applied Multivariate Analysis (STAT 524) in the fall. She enjoys traveling to large cities and this summer will be spending some time in San Francisco and New York. For more information about Professor Xi, please visit her website.

July 2009