By A.Goldenberg, A.X.Zheng, S.E.Fienberg and E.M.Airoldi

Presented by Wutao Wei

- Networks have been used to analyze interpersonal social relationships, communication networks, academic paper coauthorships and citations, protein interaction patterns, and much more.
- In this work, we survey selective aspects of the literature on statistical modeling and analysis of networks in social sciences, computer science, physics, and biology
- Focus on the statistical modelling of networks

Graphs, Nodes, and Edges

- Probability theory associated with random graph models
- Efficient computation on networks
- Use of the network as a tool for sampling
- Neural networks
- Networks and economic theory
- Relational networks
- Bi-partite graphs
- Agent-based modeling

- Social scientists: are often interested in questions of interpretation such as the meanings of edges in a social network
- Physicists: tend to be interested in understanding parsimonious mechanisms for network formation
- Computational biologists: study protein-protein interaction networks and genetic interaction networks
- Machine learning researchers: predict missing information and perform functional clustering

- Sampson's "Monastery" Study
- The Enron Email Corpus
- The Protein Interaction Network in Budding Yeast
- The Add Health Adolescent Relationship and HIV Transmission Study
- The Framingham "Obesity" Study
- The NIPS Paper Co-Authorship Dataset

- Sampson spent several months in a monastery in New England, where a number of novices were preparing to join a monastic order.
- There are four factions among the novices: the loyal opposition (whose members joined the monastery first), the young turks (whose members joined later on), the outcasts (who were not accepted in either of the two main factions), and the waverers (who did not take sides).
- About a year after leaving the monastery, Sampson surveyed all of the novices, and asked them to rank the other novices in terms of four sociometric relations: like/dislike, esteem, personal influence, and alignment with the monastic credo, retrospectively, at four different epochs spanning his stay at the monastery.

Network derived from "whom do you like" sociometric relations collected by Sampson.

- Enron Corp was an energy and trading company specializing in the marketing of electricity and gas. It was the 7th largest company in the US in 2000, but filed for bankruptcy at the end of 2001.
- The email logs of most of Enron's employees were released by the Federal Energy Regulatory Commission (FERC)
- The original FERC dataset contains 619,446 email messages (about 92% of Enron staff emails), and the cleaned-up CALO dataset contains 200,399 messages from 158 users. Another version of the data consists of the contents of the mail folders of the top 151 executives, containing about 225,000 messages covering a period from 1997 to 2004.

E-mail exchange data among 151 Enron executives, using a threshold of a minimum of 5 messages for each link.

E-mail exchange data among 151 Enron executives, using a threshold of a minimum of 30 messages for each link.

- The budding yeast is a unicellular organism that has become a de-facto model organism for the study of molecular and cellular biology
- Currently, there are four main sources of interactions between pairs of proteins that target proteins localized in different cellular compartments with variable degrees of success: (i) literature curated interactions, (ii) yeast two-hybrid (Y2H) interaction assays, (iii) protein fragment complementation (PCA) interaction assays, and (iv) tandem affinity purification (TAP) interaction assays

A popular image of the protein interaction network in Saccharomyces cerevisiae, also known as the budding yeast.

- The National Longitudinal Study of Adolescent Health (Add Health) is a study of adolescents in the United States drawn from a representative sample of middle, junior high, and high schools.
- The study focused on patterns of friendship, sexual relationships, as well as disease transmissions. To date, four waves of surveys have been collected over the course of fifteen years.

The Add Health sexual relationships network of US high school adolescents.

- Participants completed a questionnaire and underwent physical examinations (including measurements of height and weight) in three-year periods beginning 1973, 1981, 1985, 1989, 1992, 1997, 1999.
- Christakis and Fowler derive body mass index information on a total of 12,067 individuals who appeared in any of the Framingham Heart cohorts (one "close friend" for each cohort member).
- In particular they claim to have examined whether the data conformed to "small-world," "scale-free," and "hierarchical" types of random graph network models.

Obesity network from Framingham offspring cohort data. Each node represents one person in the dataset (a total of 2200 in this picture). Circles with red borders denote women, and circles with blue borders denote men. The size of each circle is proportional to the body-mass index. The color inside the circle denotes obesity status: yellow is obese (body-mass index \(\geq\) 30), green is non-obese. The colors of ties between nodes indicate relationships: purple denotes a friendship or marital tie and orange is a familial tie.

- The NIPS dataset contains information on publications that appeared in the Neural Information Processing Systems (NIPS) conference proceedings, volumes 1 through 12, corresponding to years 1987-1999, the pre-electronic submission era.
- In total, there are 2,037 authors and 1,740 papers with an average of 2.29 authors per paper and 1.96 papers per author.

NIPS paper co-authorship data. Each point represents an author. Two authors are linked by an edge if they have co-authored at least one paper at NIPS. Left: 1991-1994. Right: 1995-1998. Each graph contains all the links for the selected period. Several well known people in the Machine Learning field are highlighted. The size of the circle around each selected individual depends on their number of collaborations. Colors are meant to facilitate visualization.

- A number of basic network models are essentially static in nature
- Origin in the mathematics community: the Erdos-Renyi-Gilbert model, which led to two types of generalizations
- Origin in the statistics and social science communities: social network models

- A graph or network \(G\) is often defined in terms of nodes and edges, \(G \equiv G(\mathcal{N},\mathcal{E})\), where \(\mathcal{N}\) is a set of nodes and \(\mathcal{E}\) a set of edges, and \(N=\lvert\mathcal{N}\rvert\), \(E=\lvert\mathcal{E}\rvert\)
- \(G\) is often defined in terms of the nodes and the corresponding measurements on pairs of nodes, \(G \equiv G(\mathcal{N},\mathcal{Y})\), \(\mathcal{Y}\) is usually represented as a square matrix of size \(N\times N\)
- We will work mostly with graphs defined in terms of their set of \(N\) nodes and their binary adjacency matrix \(Y\), whose entries satisfy \(\sum_{ij} Y_{ij} = E\)

- Describes an undirected graph involving \(N\) nodes and a fixed number of edges \(E\)
- The \(G(N,p)\) model has a binomial likelihood where the probability of E edges is \(l\left(G(N,p) \text{ has }E\text{ edges}\mid p \right) = p^E(1-p)^{{N \choose 2}-E}\)
- in terms of the \(N\times N\) binary adjacency matrix \(Y\), \(l(Y\mid p)=\prod_{i\neq j} p^{Y_{ij}}(1-p)^{1-Y_{ij}}\)

- Define \(\lambda=pN\), where \(p=E/{N \choose 2}\)
- A phase change at \(\lambda=1\)
- If \(\lambda < 1\), then a graph in \(G(N,p)\) will have no connected components of size larger than \(O(\log N)\), a.s. as \(N\rightarrow \infty\)
- If \(\lambda=1\), then a graph in \(G(N,p)\) will have a largest connected component whose size is of \(O(N^{2/3})\), a.s. as \(N\rightarrow \infty\)
- If \(\lambda\) tends to a constant \(c>1\), then a graph in \(G(N,p)\) will have a unique "giant" component containing a positive fraction of the nodes, a.s. as \(N\rightarrow \infty\). No other component will contain more than \(O(\log N)\), a.s. as \(N\rightarrow \infty\).
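
The phase change can be checked empirically. The following is a minimal simulation sketch, assuming \(p = \lambda/N\) as defined above; the function name `largest_component_size` and the union-find bookkeeping are illustrative choices, not part of the survey.

```python
import numpy as np

def largest_component_size(n_nodes, lam, seed=0):
    """Sample G(N, p) with p = lam / N and return the size of its largest component."""
    rng = np.random.default_rng(seed)
    p = lam / n_nodes
    # Draw each of the N-choose-2 possible undirected edges independently.
    rows, cols = np.triu_indices(n_nodes, k=1)
    keep = rng.random(rows.size) < p
    parent = np.arange(n_nodes)

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of the two endpoints of every sampled edge.
    for u, v in zip(rows[keep], cols[keep]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    roots = np.array([find(x) for x in range(n_nodes)])
    return int(np.bincount(roots).max())

for lam in (0.5, 1.0, 2.0):
    print(lam, largest_component_size(1000, lam))
```

For a single finite graph the three regimes are only approximate, but the largest component should be much smaller for \(\lambda = 0.5\) than for \(\lambda = 2\).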

- The exchangeable graph model provides the simplest possible extension of the original random graph model by introducing a weak form of dependence among the probability of sampling edges (i.e., exchangeability) that is due to non-observable node attributes, in the form of node-specific binary strings.
- Consider the following data generating process for an exchangeable graph model, which generates binary observations on pairs of nodes.
- Sample node-specific K-bit binary strings for each node \(n\in \mathcal{N}\), \(\overrightarrow{b_n} \sim\) unif (vertex set of K-hypercube)
- Sample directed edges for all node pairs \(n,m \in \mathcal{N} \times \mathcal{N}\), \(Y_{nm} \sim Bern \left(q(\overrightarrow{b_n},\overrightarrow{b_m})\right)\)
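
The two sampling steps can be sketched as follows. The slides do not specify the function \(q\), so the choice below (the fraction of matching bits) is purely an assumption for illustration, as is the exclusion of self loops.

```python
import numpy as np

def sample_exchangeable_graph(n_nodes, n_bits, seed=0):
    """Sample node bit strings, then directed edges whose probability depends on them."""
    rng = np.random.default_rng(seed)
    # Step 1: one uniform K-bit string per node (a vertex of the K-hypercube).
    bits = rng.integers(0, 2, size=(n_nodes, n_bits))
    # Step 2: edge probability q(b_n, b_m).
    # ASSUMPTION: q is the fraction of positions where the two strings agree.
    q = (bits[:, None, :] == bits[None, :, :]).mean(axis=2)
    y = (rng.random((n_nodes, n_nodes)) < q).astype(int)
    np.fill_diagonal(y, 0)  # ASSUMPTION: self loops are excluded
    return bits, y

bits, y = sample_exchangeable_graph(n_nodes=6, n_bits=4)
print(bits)
print(y)
```

Edges are conditionally independent given the bit strings, which is where the weak dependence among edges comes from.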

- From a statistical perspective, the exchangeable graph model we survey here provides perhaps the simplest step-up in complexity from the random graph model
- A class of random graphs with such a property has been recently rediscovered and further explored in the mathematics literature, where the class of such graphs is referred to as inhomogeneous random graphs
- An alternative and arguably more interesting set of specifications can be obtained by imposing dependence among the bits at each node. This can be accomplished by sampling sets of dependent probabilities from a family of distributions on the unit hypercube.
- Sample node-specific K-bit binary strings for each node \(n\in \mathcal{N}\), \(\overrightarrow{p_n} \sim hypercube(\overrightarrow{\mu},\sigma,\alpha)\), where \(\sigma > (K-1)\cdot \alpha>0\), \(b_{nk}\sim Bern(p_{nk})\), for \(k=1,\dots,K\)
- Sample directed edges for all node pairs \(n,m \in \mathcal{N} \times \mathcal{N}\), \(Y_{nm} \sim Bern \left(q(\overrightarrow{b_n},\overrightarrow{b_m})\right)\), where \(\overrightarrow{\mu},\sigma,\alpha\) control the frequency, variability and correlation of the bits within a string.

- The sparsity of the bit strings is controlled by the parameter \(\alpha>0\). A larger value of \(\alpha\) leads to larger negative correlation among the bits and thereby a sparser network.
- Source of variability: the probability of an edge decreases with the number of bits K, as more complexity reduces the chances of an edge, and the probability of an edge increases with \(1/\alpha\). As in Durrett's analysis, the giant component emerges because a number of smaller components must intersect with high probability.

Left panel. An example adjacency matrix that corresponds to a fully connected component among 100 nodes. Right panel. The clustering coefficient as a function of \(\alpha\) on a sequence of graphs with 100 nodes.

- To compare models A and B via likelihood, we can perform the procedure
- Given a graph G, fit models \(A(\Theta_a)\) and \(B(\Theta_b)\) to obtain an estimate of their parameters \(\Theta_a^{Est}\) and \(\Theta_b^{Est}\) respectively
- Sample M graphs at random from the support of \(A(\Theta_a^{Est})\) and \(B(\Theta_b^{Est})\)
- Compute the distributions of summary statistics based on notions from information theory, such as the information profile and entropy histogram, for the 2M graphs sampled from A and B.
- Compare models in terms of the distribution on the statistics above, such as the complexity of the two models' supports and their similarity to the complexity of G.

- The Parameters:
- \(\theta\): a base rate for edge propagation,
- \(\alpha_i\) (expansiveness): the effect of an outgoing edge from \(i\),
- \(\beta_j\) (popularity): the effect of an incoming edge into \(j\),
- \(\rho_{ij}\) (reciprocation/mutuality): the added effect of reciprocated edges.

The model is directional

- Let \(P_{ij}(a,b)\), \(a,b\in \{0,1\}\), denote the probability that the dyad \((Y_{ij}, Y_{ji})\) takes the value \((a,b)\)
- \(\log P_{ij}(0,0) = \lambda_{ij}\)
- \(\log P_{ij}(1,0) = \lambda_{ij} + \alpha_i + \beta_j + \theta\)
- \(\log P_{ij}(0,1) = \lambda_{ij} + \alpha_j + \beta_i + \theta\)
- \(\log P_{ij}(1,1) = \lambda_{ij} + \alpha_i + \beta_j + \alpha_j + \beta_i + 2\theta + \rho_{ij}\)
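
As a small numerical illustration of these dyad probabilities, the sketch below treats \(\lambda_{ij}\) as the log normalizer that makes the four outcomes sum to one. The function name `p1_dyad_probs` and the parameter values are hypothetical.

```python
import numpy as np

def p1_dyad_probs(theta, alpha_i, beta_i, alpha_j, beta_j, rho_ij):
    """Return a 2x2 array P[a, b] = P(Y_ij = a, Y_ji = b) under the p1 model."""
    log_unnorm = np.array([
        # (0,0): no edge;          (0,1): only the edge j -> i
        [0.0, alpha_j + beta_i + theta],
        # (1,0): only i -> j;      (1,1): both edges, plus reciprocation
        [alpha_i + beta_j + theta,
         alpha_i + beta_j + alpha_j + beta_i + 2 * theta + rho_ij],
    ])
    # lambda_ij is fixed by the requirement that the four probabilities sum to 1.
    lam_ij = -np.log(np.exp(log_unnorm).sum())
    return np.exp(lam_ij + log_unnorm)

probs = p1_dyad_probs(theta=-1.0, alpha_i=0.3, beta_i=-0.2,
                      alpha_j=0.1, beta_j=0.4, rho_ij=2.0)
print(probs, probs.sum())
```

One useful check is that the log odds ratio \(\log\frac{P(1,1)P(0,0)}{P(1,0)P(0,1)}\) equals \(\rho_{ij}\), so the reciprocation parameter measures the dependence between the two directed edges of a dyad.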

- When \(\alpha_i = 0\), \(\beta_j = 0\), and \(\rho_{ij}=0\), this is basically an *Erdos-Renyi-Gilbert model* for directed graphs: each directed edge has the same probability of appearance.
- When \(\rho_{ij}=0\), *no reciprocal effect*. This model effectively focuses solely on the degree distributions into and out of nodes.
- \(\rho_{ij}=\rho\), *constant reciprocation*. This was the version of \(p_1\) studied in depth by Holland and Leinhardt using maximum likelihood estimation.
- \(\rho_{ij}=\rho +\rho_i +\rho_j\), *edge-dependent reciprocation*.
- The likelihood function for the \(p_1\) model is clearly in exponential family form. For the constant reciprocation version, we have \[\log \mathrm{Pr}_{p_1}(y)\propto y_{++}\theta + \sum_{i} y_{i+}\alpha_i + \sum_{j} y_{+j}\beta_j + \sum_{ij} y_{ij}y_{ji}\rho\]

- A major problem with the \(p_1\) and related models, recognized by Holland and Leinhardt, is the lack of standard asymptotics to assist in the development of goodness-of-fit procedures for the model.
- Since the number of \(\{\alpha_i\}\) and \(\{\beta_j\}\) increases directly with the number of nodes, we have no consistency results for the maximum likelihood estimates, and no simple way to test for \(\rho=0\)
- A few ad hoc fixes have been suggested in literature, the most direct of which deals with the problem by setting subsets of the \(\{\alpha_i\}\) and \(\{\beta_j\}\) equal to one another or by considering them as arising from common prior distributions.
- Fienberg suggested the use of tools from algebraic statistics to find Markov basis generators for the model and the conditional distribution of the data given the minimal sufficient statistics (MSSs).

- Fienberg and Wasserman proposed a slightly different dyad-based data representation for the \(p_1\) model, \[x_{ijkl} = \begin{cases} 1 & \text{if } D_{ij} = (y_{ij}=k,\; y_{ji}=l), \\ 0 & \text{otherwise,} \end{cases}\] where k and l take the values of 1 or 0, which converts the dyad \(\{D_{ij} = (y_{ij}, y_{ji})\}\) into a \(2\times 2\) table with exactly one entry of 1 and the rest 0.

Then if we collect the data for the \(n(n - 1)/2\) dyads together, they form an \(n\times n\times 2\times 2\) incomplete contingency table with "structural" zeros down the diagonal of the \(n\times n\) marginal (i.e., no self loops), and "duplicate" data for each dyad above and below the diagonal.
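
The construction of this table can be sketched directly from a directed adjacency matrix. The array name `x` and the helper `dyad_table` are assumptions made for illustration.

```python
import numpy as np

def dyad_table(y):
    """Build x[i, j, k, l] = 1 if (y_ij, y_ji) == (k, l), for all i != j."""
    n = y.shape[0]
    x = np.zeros((n, n, 2, 2), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal keeps its structural zeros (no self loops)
                x[i, j, y[i, j], y[j, i]] = 1
    return x

y = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]])
x = dyad_table(y)
print(x[0, 1])  # 2x2 table for the dyad (0, 1): a single 1 at position (1, 0)
```

The "duplicate" structure shows up as the symmetry `x[i, j, k, l] == x[j, i, l, k]`: each dyad is recorded once above and once below the diagonal.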

Application for Sampson's monk dataset

- Treat \(p_1\) model's expansiveness, \(\{\alpha_i\}\), and/or popularity, \(\{\beta_j\}\) as random effects rather than fixed effects
- Bayesian extension of frequentist approaches which applies MCMC methods to do the analysis

- Under the assumption that two possible edges are dependent only if they share a common node, Frank and Strauss proved the following characterization for the probability distribution of undirected Markov graphs: \[P_{\theta}\{Y=y\} = \exp\left(\sum_{k=1}^{n-1}\theta_k S_k(y) + \tau T(y) - \phi(\theta,\tau)\right), \quad y \in \mathcal{Y}\]
- where \(\theta:=\{\theta_k\}\) and \(\tau\) are parameters, \(\phi(\theta,\tau)\) is the normalizing constant, and the statistics \(S_k\) and \(T\) are counts of specific structures such as edges, triangles, and k-stars

K-stars:

- Frank and Strauss worked mainly with the three-parameter model where \(\theta_3=\dots=\theta_{n-1}=0\). They proposed a pseudo-likelihood parameter estimation method that maximizes \[l(\theta)=\sum_{i<j} \log\left(P_{\theta}\{Y_{ij}=y_{ij}\mid Y_{uv}=y_{uv}\text{ for all } u < v, (u,v)\neq(i,j)\}\right)\]
- Wasserman and Pattison proposed the current formulation of these *Exponential Random Graph Models (ERGM)*, also referred to as \(p^*\) models, as a generalization of the Markov graphs of Frank and Strauss, which leads to likelihood functions of the form \[P_{\theta} \{Y=y\} = \exp \left(\theta^T u(y)-\phi (\theta) \right)\]
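
To make the sufficient statistics concrete, the sketch below computes edge, 2-star, and triangle counts for an undirected graph and evaluates \(\theta^T u(y)\). The function names are illustrative, and the normalizing constant \(\phi(\theta)\) is deliberately left out, since it requires a sum over all graphs.

```python
import numpy as np
from math import comb

def markov_graph_stats(y):
    """Edge, 2-star, and triangle counts of a symmetric binary adjacency matrix."""
    deg = y.sum(axis=1)
    edges = int(y.sum()) // 2                       # each edge appears twice in y
    two_stars = sum(comb(int(d), 2) for d in deg)   # pairs of edges sharing a node
    triangles = int(np.trace(y @ y @ y)) // 6       # each triangle is counted 6 times
    return np.array([edges, two_stars, triangles])

def unnormalized_log_prob(theta, y):
    """theta^T u(y); the normalizing constant phi(theta) is omitted."""
    return float(theta @ markov_graph_stats(y))

triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
print(markov_graph_stats(triangle))
print(unnormalized_log_prob(np.array([-1.0, 0.5, 2.0]), triangle))
```

Differences of this quantity between two graphs that differ in a single edge are what a pseudo-likelihood fit works with, because the intractable constant cancels.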

- Some natural extensions of the Erdos-Renyi-Gilbert model result in varying node degrees.
- Some examples

- Search for *optimal partition*. In the sociometric literature this was known as blockmodeling.
- The basic idea is that nodes that are heavily interconnected should form a block or community.
- The nodes are reordered to display the blocks down the diagonal of the adjacency matrix representing the network
- Moreover, the connections between nodes in different blocks appear in much sparser off-diagonal blocks.

Left: An example graph. Right: The corresponding blockmodel, where red nodes have been collapsed into the red block and similarly for the other colors.

- As a concrete example, consider the mixed membership stochastic blockmodel (MMB); the data generating process for a graph \(G = (\mathcal{N}; \mathcal{Y})\) is the following.
- For each node \(p \in \mathcal{N}\):
- Sample mixed membership \(\overrightarrow{\pi_p} \sim Dirichlet_K ( \overrightarrow{\alpha})\)
- For each pair of nodes \((p,q) \in \mathcal{N} \times \mathcal{N}\):
- Sample membership indicator, \(\overrightarrow{z}_{p \rightarrow q} \sim mult_K(\overrightarrow{\pi_p})\)
- Sample membership indicator, \(\overrightarrow{z}_{p \leftarrow q} \sim mult_K(\overrightarrow{\pi_q})\)
- Sample interaction, \(Y(p,q) \sim Bern(\overrightarrow{z}_{p \rightarrow q}^T B \overrightarrow{z}_{p \leftarrow q})\)
- Note that the group membership of each node is *context dependent*. That is, each node may assume different membership when interacting or being interacted with by different peers.
- Also note that the pairs of group memberships that underlie interactions need not be equal; this fact is useful for characterizing asymmetric interaction networks
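
The generative process above can be sketched as a direct simulation. The function name `sample_mmb`, the example values of \(\overrightarrow{\alpha}\) and \(B\), and the decision to skip self-interactions are assumptions for illustration only.

```python
import numpy as np

def sample_mmb(n_nodes, alpha, block_matrix, seed=0):
    """Simulate a directed graph from the mixed membership stochastic blockmodel."""
    rng = np.random.default_rng(seed)
    k = len(alpha)
    # Mixed membership vector pi_p ~ Dirichlet_K(alpha) for each node.
    pi = rng.dirichlet(alpha, size=n_nodes)
    y = np.zeros((n_nodes, n_nodes), dtype=int)
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # ASSUMPTION: self-interactions are skipped
            z_send = rng.choice(k, p=pi[p])  # group of p when acting on q
            z_recv = rng.choice(k, p=pi[q])  # group of q when acted on by p
            # Interaction probability is the B entry for the two sampled groups.
            y[p, q] = rng.binomial(1, block_matrix[z_send, z_recv])
    return pi, y

alpha = np.array([0.1, 0.1, 0.1])
B = np.array([[0.9, 0.05, 0.05],
              [0.05, 0.9, 0.05],
              [0.05, 0.05, 0.9]])
pi, y = sample_mmb(12, alpha, B)
print(pi.round(2))
print(y)
```

Because fresh indicators are drawn for every ordered pair, a node can act as a member of one group toward one peer and of another group toward a different peer, which is the context dependence noted above.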

- Inference in the blockmodel is challenging, as the integrals that need to be solved to compute the likelihood cannot be evaluated analytically.
- Likelihood is \[l(Y \mid \overrightarrow{\alpha},B) = \int_{\Pi}\int_{Z} P(Y \mid Z,B) P(Z \mid \Pi) P(\Pi \mid \overrightarrow{\alpha}) dZ d\Pi\]
- While the inner integral is easily solvable, the outer integral is not. Exact inference is thus not an option.
- To complicate things, the number of observations scales as the square of the number of nodes, \(O(N^2)\).
- Sampling algorithms such as Markov chain Monte Carlo (MCMC) are typically too slow for real-size problems in the natural, social, and computational sciences.
- Airoldi et al. suggest a nested variational inference strategy to approximate the posterior distribution on the latent variables, \((\Pi;Z)\). Variational methods scale to large problems without losing much in terms of accuracy.

- Bickel and Chen bring new twists to the model-based approach of community discovery
- They use a blockmodel to formalize a given network in terms of its community structure.
- The main result of this work implies that community detection algorithms based on the modularity score of Newman and Girvan are (asymptotically) biased.
- It shows that using modularity scores can lead to the discovery of an incorrect community structure even in the favorable case of large graphs, where communities are substantial in size and composed of many individuals.
- This work also proves that blockmodels and the corresponding likelihood-based algorithms are (asymptotically) unbiased and lead to the discovery of the correct community structure.

- The latent space model was first introduced by Hoff et al. with applications to social network analysis, and has been recently extended in a number of directions to include treatment of transitivity, homophily on node-specific attributes, clustering, and heterogeneity of nodes.
- The conditional probability model for the adjacency matrix \(Y\) is \[P(Y \mid Z,X,\Theta) = \prod_{i \ne j} P(Y(i,j) \mid Z_i, Z_j, X_{ij}, \Theta)\]
- where \(X\) are covariates, \(\Theta\) are parameters, and \(Z\) are the positions of nodes in the low dimensional latent space.
- Each relationship \(Y(i,j)\) is sampled from a Bernoulli distribution whose natural parameter depends on \(Z_i,Z_j,X_{ij}\) and \(\Theta\)
- The log-odds ratio is then: \[\log \frac{P(Y(i,j)=1)}{1-P(Y(i,j)=1)} = \alpha + \beta^{'} X_{ij} - \lvert Z_i - Z_j \rvert \equiv \eta_{ij}\] and the corresponding log likelihood is \[\log P(Y \mid \eta) = \sum_{i \ne j }(\eta_{ij} \cdot Y_{ij} - \log (1+\exp\{\eta_{ij}\}))\]
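
The log likelihood above is straightforward to evaluate numerically. The sketch below assumes Euclidean distance for \(\lvert Z_i - Z_j \rvert\) and an optional covariate array of shape \(N \times N \times P\); the function name and argument layout are illustrative.

```python
import numpy as np

def latent_space_loglik(y, z, alpha, beta=None, x=None):
    """Sum over i != j of eta_ij * y_ij - log(1 + exp(eta_ij))."""
    # Pairwise Euclidean distances between latent positions (an assumed metric).
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    eta = alpha - dist
    if x is not None:
        eta = eta + x @ beta  # x: (N, N, P) covariates, beta: (P,) coefficients
    off_diag = ~np.eye(y.shape[0], dtype=bool)
    # logaddexp(0, eta) is a numerically stable log(1 + exp(eta)).
    terms = eta * y - np.logaddexp(0.0, eta)
    return float(terms[off_diag].sum())

y = np.array([[0, 1], [1, 0]])
near = np.array([[0.0, 0.0], [0.1, 0.0]])
far = np.array([[0.0, 0.0], [3.0, 0.0]])
print(latent_space_loglik(y, near, alpha=1.0))
print(latent_space_loglik(y, far, alpha=1.0))
```

With an edge observed in both directions, placing the two nodes close together yields the higher log likelihood, which is the sense in which distance in the latent space encodes the propensity to connect.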

- One can easily extend the latent space modeling approach to weighted networks.
- the error model \(P(Y_{ij})\), i.e., the model for the observed edge weights with mean \(\mu_{ij} = E(y_{ij})\)
- the linear model \(\eta_{ij}=\eta_{ij}(\beta,Z_i,Z_j)\)
- the link function \(g(\mu_{ij}) = \eta_{ij}\)
- Binary graph example
- General case