Session 2 - Department of Statistics - Purdue University

Statistical Issues with Modeling of Networks

Speaker(s)

  • Tanya Y. Berger-Wolf (University of Illinois, Chicago)
  • Krista J. Gile (University of Massachusetts)
  • Jure Leskovec (Stanford University)
  • Deepayan Chakrabarti (Facebook)

Description

A growing number of domains produce data that is relational in nature, and with it comes interest in understanding the properties of the underlying relations, represented as graphs. However, learning models for large graphs poses unique challenges: the set of possible networks, while finite, is huge, and the number of training instances is typically very small, in some cases a single observed network. This unusual setting leads to an array of difficulties in model estimation, including but not limited to:

  • Issues in parameter estimation (e.g., intractability, numerical instability).
  • Degeneracy or near degeneracy of estimated models (e.g., significant mass placed on unrealistic graphs or virtually no mass placed on observed graphs).
  • Lack of variability in graphs likely under the estimated model.
  • Graphs likely under the learned models not reproducing the properties of the graphs in the training set.

The aim of this session is to bring together researchers from across the computational and theoretical spectrum to share their insights about possible causes and possible solutions for the above problems. 

Schedule

Fri, June 22 - Location: STEW 314

Time | Speaker | Title
8:30 - 9:15AM | Tanya Y. Berger-Wolf | Analysis of Dynamic Networks: from Data Collection to Meaningful Insight
Abstract: From emails and Twitter™ follows to high school friendships and zebras grazing together, large, noisy, and highly dynamic networks of interactions are everywhere. Collecting data at the level of detail necessary to answer many application questions often results in intractable computational formulations. I will present a network sampling framework that yields representative samples for many network problems. I will also present a framework for inferring the temporal scale of interaction streams, which has implications for the sampling rate and for the representation of those streams as dynamic networks. I will demonstrate analysis techniques for dynamic networks and their implications for understanding the application domain.
9:15 - 10:00AM | Krista J. Gile | New Methods for Inference from Respondent-Driven Sampling Data
Abstract: Respondent-Driven Sampling is a type of link-tracing network sampling used to study hard-to-reach populations. Beginning with a convenience sample, each person sampled is given 2-3 uniquely identified coupons to distribute to other members of the target population, making them eligible for enrollment in the study. This is effective at collecting large, diverse samples from many populations.

Current estimation relies on sampling weights estimated by treating the sampling process as a random walk on the underlying network of social relations. These estimates are based on strong assumptions allowing the data to be treated as a probability sample. In particular, existing estimators assume a with-replacement sample with an ideal initial sample. We introduce two new estimators, the first based on a without-replacement approximation to the sampling process, and the second based on fitting a social network model (ERGM), and demonstrate their ability to correct for biases due to the finite population and initial convenience sample. Our estimators are based on a model-assisted design-based approach, using standard errors based on a parametric bootstrap. We conclude with an application to data collected among injecting drug users, including extension to observable features of the sampling process.

This talk includes joint work with Mark S. Handcock.
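
As a rough illustration of the random-walk weighting that the abstract describes as current practice (not the new estimators introduced in the talk), the sketch below assumes that a respondent's inclusion probability is proportional to their reported network degree, so each respondent is weighted by the inverse of that degree. The function name and the sample data are hypothetical.

```python
from typing import Sequence


def degree_weighted_mean(values: Sequence[float], degrees: Sequence[int]) -> float:
    """Inverse-degree weighted mean of a trait in a link-tracing sample.

    Assumption: under the random-walk approximation, inclusion probability
    is proportional to degree, so each respondent gets weight 1 / degree.
    """
    if len(values) != len(degrees):
        raise ValueError("values and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every sampled respondent needs a positive degree")
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical sample: 1 = has the trait, 0 = does not.
# The well-connected respondents (degree 10) all have the trait.
sample_values = [1, 1, 0, 0]
sample_degrees = [10, 10, 2, 2]

naive = sum(sample_values) / len(sample_values)
weighted = degree_weighted_mean(sample_values, sample_degrees)
print(f"naive mean: {naive:.3f}, degree-weighted mean: {weighted:.3f}")
```

High-degree respondents are over-represented in a random walk, so the weighted estimate falls below the naive sample mean here. The with-replacement and ideal-initial-sample assumptions behind this weighting are the ones the abstract says the new estimators relax.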
10:00 - 10:30AM | Break
10:30 - 11:15AM | Jure Leskovec | Affiliation network models of clusters in networks
Abstract: Networks are a general language for describing social, technological and biological systems. Nodes in such networks organize into densely linked and overlapping clusters that correspond to communities in social networks, functionally related proteins in biological networks, or topically related webpages in information networks. Identifying such clusters is crucial to the understanding of the structural and functional roles of networks.

Our work stems from an intuitive observation that the probability of an edge between a pair of nodes increases with the number of shared cluster affiliations, which means that cluster overlaps are more densely connected than their non-overlapping parts. We discuss a model-based network community detection method that builds on bipartite node-community affiliation networks and can detect dense cluster overlaps. The approach allows for modeling overlapping, non-overlapping, and hierarchically nested clusters. We develop a set of model inference techniques and accurately identify clusters in networks ranging from biological protein-protein interaction networks to social, collaboration, and information networks. The results imply that while networks organize into overlapping communities, globally they also exhibit a nested core-periphery structure, which arises as a consequence of overlapping parts of communities being more densely connected.
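
The observation that edge probability grows with the number of shared affiliations can be made concrete with a small sketch. It assumes one common formalization, which the abstract itself does not spell out: each shared community independently creates an edge with its own probability, so the pair is connected unless every shared community fails to connect them. The function name, community labels, and probabilities are hypothetical.

```python
from typing import Dict, Set


def edge_probability(
    communities_u: Set[str],
    communities_v: Set[str],
    link_prob: Dict[str, float],
    background: float = 0.0,
) -> float:
    """Probability of an edge between two nodes given their affiliations.

    Assumption: every shared community c independently generates the edge
    with probability link_prob[c]; pairs sharing no community fall back to
    a small background probability.
    """
    shared = communities_u & communities_v
    if not shared:
        return background
    no_edge = 1.0
    for c in shared:
        no_edge *= 1.0 - link_prob[c]  # community c fails to create the edge
    return 1.0 - no_edge


link_prob = {"A": 0.3, "B": 0.5}  # hypothetical per-community probabilities

p_none = edge_probability({"A"}, {"B"}, link_prob)
p_one = edge_probability({"A"}, {"A", "B"}, link_prob)
p_two = edge_probability({"A", "B"}, {"A", "B"}, link_prob)
print(p_none, p_one, p_two)
```

Under this assumption, nodes in the overlap of A and B connect with higher probability than nodes that share only one community, which matches the abstract's claim that overlaps are denser than the non-overlapping parts.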
11:15AM - 12:00PM | Deepayan Chakrabarti | Nonparametric Link Prediction in Dynamic Networks
Abstract: We discuss a non-parametric link prediction algorithm for a sequence of graph snapshots over time. The model predicts each link based on the features of its endpoints, as well as those of the local neighborhood around the endpoints. This allows for different types of neighborhoods in a graph, each with its own dynamics (e.g., growing or shrinking communities). We prove the consistency of our estimator, and give a fast implementation based on locality-sensitive hashing. Experiments with simulated as well as five real-world dynamic graphs show that we outperform the state of the art, especially when sharp fluctuations or non-linearities are present.
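
To give a feel for how locality-sensitive hashing can speed up a non-parametric predictor, here is a deliberately simplified sketch. It is not the estimator from the talk: it assumes each node pair is summarized by a hypothetical numeric feature vector, hashes that vector with random hyperplanes, and predicts the empirical link-formation rate among previously observed pairs that fell into the same bucket. All class, function, and parameter names are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random hyperplanes (Gaussian normal vectors)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(features: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, features)) >= 0.0 else 0
        for plane in planes
    )


class BucketLinkPredictor:
    """Empirical link rate among past pairs that share an LSH bucket."""

    def __init__(self, dim: int, n_bits: int = 4, seed: int = 0) -> None:
        self.planes = make_hyperplanes(dim, n_bits, seed)
        # bucket key -> [links formed, pairs seen]
        self.counts: Dict[Tuple[int, ...], List[int]] = {}

    def observe(self, features: Sequence[float], linked: bool) -> None:
        bucket = self.counts.setdefault(lsh_key(features, self.planes), [0, 0])
        bucket[0] += int(linked)
        bucket[1] += 1

    def predict(self, features: Sequence[float], default: float = 0.0) -> float:
        bucket = self.counts.get(lsh_key(features, self.planes))
        if bucket is None:
            return default  # no similar pair has been seen yet
        return bucket[0] / bucket[1]


predictor = BucketLinkPredictor(dim=2)
# Hypothetical history from earlier snapshots: pair features and outcomes.
for linked in (True, True, False):
    predictor.observe([1.0, 2.0], linked)
predictor.observe([-1.0, -2.0], False)

p_same = predictor.predict([2.0, 4.0])        # same direction as [1.0, 2.0]
p_opposite = predictor.predict([-2.0, -4.0])  # same direction as [-1.0, -2.0]
print(p_same, p_opposite)
```

Random-hyperplane hashing groups vectors by direction, so a lookup touches one bucket instead of scanning all past pairs. A fuller version would also need features for the local neighborhood and some smoothing across nearby buckets, neither of which is attempted here.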

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558
