A broad class of statistical models for networks has edges that are conditionally independent given latent (unobserved) variables. There are two separate motivations for this model: Aldous and Hoover use the notion of infinitely exchangeable graphs, while Lovász and Szegedy use the notion of graph limits.
For a “motif” \(F\) and an ambient graph \(G\), let \(hom(F,G)\) denote the number of homomorphisms of \(F\) into \(G\) (i.e., edge-preserving maps \(V(F) \rightarrow V(G)\)). Normalize this to get the homomorphism density, \[t(F,G) = \frac{hom(F,G)}{|V(G)|^{|V(F)|}}.\] Equivalently, pick a uniformly random mapping \(V(F) \rightarrow V(G)\); then \(t(F,G)\) is the probability that the random mapping is edge preserving.
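These two descriptions of \(t(F,G)\) can be checked against each other by brute force. A minimal sketch (function names are mine; vertices of \(F\) are \(0, \dots, k-1\)):

```python
from itertools import product

def hom_count(F_edges, k, G_adj):
    """hom(F, G): number of maps V(F) -> V(G) (with V(F) = {0, ..., k-1})
    that send every edge of F to an edge of G."""
    n = len(G_adj)
    return sum(
        all(G_adj[phi[u]][phi[v]] for u, v in F_edges)
        for phi in product(range(n), repeat=k)
    )

def hom_density(F_edges, k, G_adj):
    """t(F, G) = hom(F, G) / n^k: the probability that a uniformly
    random map V(F) -> V(G) is edge preserving."""
    return hom_count(F_edges, k, G_adj) / len(G_adj) ** k

# Edge motif K_2 mapped into the triangle K_3:
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(hom_density([(0, 1)], 2, K3))  # 6 of the 9 maps are edge preserving
```

The enumeration over all \(n^k\) maps is exponential in \(|V(F)|\), so this is only a sanity check for small motifs.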
Let \(G_n\) be a sequence of graphs (not necessarily random!).
Definition Let \(f(F)\) be a real valued function on every motif \(F\); \(f\) could be called a “graph parameter”. A sequence of dense graphs \(G_n\) converges to graph parameter \(f\) if \(t(F,G_n) \rightarrow f(F)\) for every motif \(F\).
Let \(\mathscr{T}\) denote the set of graph parameters \(f\) for which there is a convergent sequence of graphs.
Theorem [Lovász and Szegedy]
If \(f \in \mathscr{T}\), then there exists a symmetric measurable function \(W: [0,1]^2 \rightarrow [0,1]\) for which \(f(F) = t(F,W)\), where if \(|V(F)| = k\), \[t(F,W) = \int_{[0,1]^k} \prod_{ij \in E(F)} W(x_i, x_j)\, dx_1 \dots dx_k.\] The converse is also true: for any symmetric \(W\), define \(f(\cdot) = t(\cdot,W)\); then there exists a sequence of graphs \(G_n\) that converges to the graph parameter \(f\).
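The integral defining \(t(F,W)\) is just an expectation over iid Uniform[0,1] coordinates, so it can be estimated by Monte Carlo. A minimal sketch (the function name and the choice of graphon are illustrative):

```python
import random

def t_graphon(F_edges, k, W, n_samples=200_000, seed=0):
    """Monte Carlo estimate of t(F, W): draw x_1, ..., x_k iid
    Uniform[0,1] and average prod_{ij in E(F)} W(x_i, x_j)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(k)]
        prod = 1.0
        for i, j in F_edges:
            prod *= W(x[i], x[j])
        total += prod
    return total / n_samples

# Graphon W(x, y) = xy with the triangle motif K_3.  Each x_i appears
# squared in the product, so t(K_3, W) = (1/3)^3 = 1/27 exactly.
est = t_graphon([(0, 1), (1, 2), (0, 2)], 3, lambda x, y: x * y)
```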
Here is how mathematicians think about graphons.
How can we extend this notion to sparse graphs?
These notes follow the second page of Bickel and Chen, 2009.
A sequence of real valued random variables \(X_1, X_2, \dots\) is exchangeable if its distribution does not change under any re-ordering. de Finetti’s Theorem shows that if an infinite sequence of random variables is exchangeable, then there exists some (latent) random variable such that, conditional on this variable, the sequence is iid. The work of Aldous and Hoover extends this notion to infinite exchangeable graphs.
Let \(G\) be a random graph on a countably infinite number of nodes. For a specific infinite graph \(g\), let \(\sigma(g)\) be a relabeling of the vertices by a permutation \(\sigma\). Suppose that the distribution of the random graph \(G\) is such that \[P(G=g) = P(G=\sigma(g))\] for all \(\sigma\) and all \(g\). A distribution on infinite graphs that has this property is called infinitely exchangeable.
If a distribution is infinitely exchangeable, then there exists an alternative representation of the distribution. For \(i,j \in 1, 2, \dots\), let \(\alpha, \xi_i, \xi_j, \lambda_{ij}\) be independent Uniform[0,1] random variables. For any infinitely exchangeable distribution, there exists a function \(g\) such that the adjacency matrix can be represented as \[A_{ij} = g(\alpha, \xi_i, \xi_j, \lambda_{ij}).\] \(\alpha\) is unidentifiable, \((\xi_i, \xi_j)\) are the interesting statistical bits, and \(\lambda_{ij}\) determines the coin flip for the \(i,j\)th element. This can be naturally parameterized as \[w(u,v) = P[A_{ij} = 1 | \xi_i = u, \xi_j = v].\]
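Conditional on \(\alpha\), this representation gives a direct recipe for simulating a finite graph from \(w\): since \(\lambda_{ij}\) is Uniform[0,1], only the comparison \(\lambda_{ij} \le w(\xi_i, \xi_j)\) matters. A minimal sketch (function names are mine, not from the literature):

```python
import random

def sample_from_graphon(n, w, seed=0):
    """Aldous-Hoover sampling: draw xi_i iid Uniform[0,1], then set
    A_ij = 1 exactly when lambda_ij <= w(xi_i, xi_j)."""
    rng = random.Random(seed)
    xi = [rng.random() for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            lam = rng.random()  # the lambda_ij "coin flip"
            A[i][j] = A[j][i] = int(lam <= w(xi[i], xi[j]))
    return A

# The constant graphon w(u, v) = 1/2 recovers Erdos-Renyi G(n, 1/2):
A = sample_from_graphon(500, lambda u, v: 0.5, seed=1)
```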
The function \(w\) does not uniquely determine the distribution. However, if \(w_1\) and \(w_2\) define the same distribution, then there exists a measure preserving transformation \(\phi:[0,1]\rightarrow[0,1]\) such that \(w_1(u,v) = w_2(\phi(u),\phi(v))\).
The function \(W\) should be interpreted as the function \(w\). As far as I know, these two theories were developed completely independently. Moreover, Aldous and Hoover developed the “exchangeability” theory independently of each other.
This model was proposed by Hoff, Raftery, and Handcock. Here again, I believe that it was developed independently of Aldous, independently of Hoover, and independently of Lovász and Szegedy.
The latent space model says that each actor \(i\) in the social network is assigned a latent position \(z_i\) and that, conditional on these latent positions, all edges are independent, \[P(A|z_1, \dots, z_n) = \prod_{i \ne j} P(A_{ij} | z_i, z_j).\] This model can also incorporate observed covariates \(x_{ij} \in R^d\). For example, \[\mathrm{logit}(P(A_{ij} = 1 | z_i, z_j, x_{ij})) = x_{ij}^T \beta + \gamma\, d(z_i,z_j), \] where \(d\) is some distance in the latent space. For example, perhaps the latent variables are sampled from a mixture of normals in two dimensions. Then \(d\) could be the Euclidean distance or the \(\ell_1\) distance between the points. The original paper proposes estimating the distance matrix \(D\), \(D_{ij} = d(z_i, z_j)\), without specifying the latent space or the choice of metric.
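A sketch of simulating from this model with the logistic link above, assuming Euclidean \(d\) and a scalar covariate (so \(\beta\) is a scalar); with \(\gamma < 0\), nearby actors are more likely to connect:

```python
import math
import random

def sample_latent_space(z, x, beta, gamma, seed=0):
    """Conditional on latent positions z_i, edges are independent with
    logit P(A_ij = 1) = x_ij * beta + gamma * d(z_i, z_j), where d is
    Euclidean distance; gamma < 0 makes close actors more likely to tie."""
    rng = random.Random(seed)
    n = len(z)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            eta = x[i][j] * beta + gamma * math.dist(z[i], z[j])
            p = 1.0 / (1.0 + math.exp(-eta))
            A[i][j] = A[j][i] = int(rng.random() < p)
    return A

# Latent positions from a mixture of two 2-d normals (two communities),
# no covariates (x_ij = 0), gamma = -1:
rng = random.Random(2)
z = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)] \
    + [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(50)]
x = [[0.0] * 100 for _ in range(100)]
A = sample_latent_space(z, x, beta=1.0, gamma=-1.0)
```

With the two mixture components well separated, within-component edges are far more likely than between-component edges, which is the clustering behavior the model is meant to capture.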
For \(i = 1, \dots, n\), sample a vector \(z_i\) from some distribution \(F\). In the random dot product model (RDPM), \[P(A_{ij} = 1 | z_i, z_j) = f(z_i^T z_j)\] for some (“link”) function \(f\).
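A sketch of sampling from the RDPM (function names are illustrative); the link \(f\) must map the dot products into \([0,1]\):

```python
import random

def sample_rdpm(z, f, seed=0):
    """RDPM: conditional on latent vectors z_i, edges are independent
    with P(A_ij = 1 | z_i, z_j) = f(z_i . z_j)."""
    rng = random.Random(seed)
    n = len(z)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = f(sum(a * b for a, b in zip(z[i], z[j])))
            A[i][j] = A[j][i] = int(rng.random() < p)
    return A

# One-dimensional z_i with f the identity is the Chung-Lu model; here
# z_i ~ Uniform[0.1, 0.9], so the expected edge density is 0.5^2 = 0.25.
rng = random.Random(1)
z = [(rng.uniform(0.1, 0.9),) for _ in range(200)]
A = sample_rdpm(z, lambda t: t)
```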
In one sense, this is a type of latent space model. However, I have found that when I say “the latent space model”, people assume that I am talking about modeling with “distances”; I’d hope to only use it to refer to the conditional independence relationship, but perhaps we should reserve this for “The Aldous-Hoover” representation.
If you assume that \(f\) is the identity function and assume a dense limit, then there are CLT results for the elements of the eigenvectors of the resulting adjacency matrix.
This is a generalization of the Chung and Lu model, where the dimension of the vector \(z_i\) is one.
For node feature-vectors \(u_1, \dots, u_n\), \[P(A_{ij} = 1 | u_i, u_j) = u_i^T \Lambda u_j\] for some diagonal matrix \(\Lambda\). This model was (obviously) developed independently of the RDPM. A key insight of the eigenmodel is that it can model any latent distance model, but the converse is not true.
This is a type of RDPM, where the distribution \(F\) is only supported on \(K\) unique points.
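A sketch of this construction, assuming the identity link (names and the particular support points are mine): when \(F\) puts mass on \(K\) vectors, the connection probability depends only on which of the \(K\) points each endpoint drew, giving a \(K\)-block model.

```python
import random

def sample_blockmodel_rdpm(n, points, weights, f, seed=0):
    """RDPM whose latent distribution F puts mass `weights` on the K
    vectors in `points`; blocks k and l connect with probability
    B_kl = f(points[k] . points[l]), i.e. a K-block stochastic blockmodel."""
    rng = random.Random(seed)
    z = [rng.choices(points, weights=weights)[0] for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = f(sum(a * b for a, b in zip(z[i], z[j])))
            A[i][j] = A[j][i] = int(rng.random() < p)
    return z, A

# Two blocks: within-block probabilities 0.65 and 0.50, between-block 0.15.
points = [(0.8, 0.1), (0.1, 0.7)]
z, A = sample_blockmodel_rdpm(300, points, [0.5, 0.5], lambda t: t)
```

Note that only blockmodels whose connectivity matrix \(B\) can be written as a Gram matrix of the support points (through \(f\)) arise this way.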