Erdos-Renyi random graphs

An Erdos-Renyi (ER) graph on the vertex set \(V\) is a random graph that connects each pair of nodes \(\{i,j\}\) with probability \(p\), independently across pairs. The model is parameterized by the number of nodes \(N = |V|\) and \(p\). There are many classical results for this model; see Durrett's book Random Graph Dynamics (RGD) for a more thorough treatment. We are particularly interested in "sparse" ER graphs. To make the graph sparse, \(p\) must decrease as \(N\) grows. Define \(\lambda = Np\) to be the expected degree of a node. Most results for ER graphs suppose that \(\lambda\) is fixed (e.g. \(\lambda = 1\) or \(\lambda = 2\)) or that it grows slowly (e.g. \(\lambda \approx \log N\)).

##### Summary statistics of ER graphs: the degree distribution

Fact: For a fixed \(\lambda\), \(Binomial(N,\lambda/N) \rightarrow Poisson(\lambda)\) as \(N\) gets large. Here, convergence is in distribution.
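A quick numerical illustration of this fact (a sketch, not part of the original notes): compare the Binomial\((N, \lambda/N)\) pmf to the Poisson\((\lambda)\) pmf for a large \(N\).

N = 10000; lam = 5
# maximum pointwise gap between the two pmfs over k = 0, ..., 30; it is tiny
max(abs(dbinom(0:30, N, lam/N) - dpois(0:30, lam)))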

So, for large graphs, the degree distribution of the ER graph is approximately \(Poisson(\lambda)\).

library(igraph)
gs = list()
# sample 100 ER graphs on N = 10000 nodes with lambda growing from 0.5 to 10
lamseq = seq(.5, 10, len = 100)
for(i in 1:100) gs[[i]] = erdos.renyi.game(10000, lamseq[i]/10000)  # p = lambda/N
hist(degree(gs[[100]]))  # degree distribution of the lambda = 10 graph
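To see the Poisson approximation directly, we can overlay the Poisson\((10)\) pmf on the empirical degree distribution of the last graph (\(\lambda = 10\)). This comparison is a sketch added for illustration:

d = degree(gs[[100]])
emp = table(factor(d, levels = 0:max(d))) / length(d)  # empirical degree pmf
plot(0:max(d), as.numeric(emp), type = "h", xlab = "degree", ylab = "probability")
points(0:max(d), dpois(0:max(d), lambda = 10), col = "red", pch = 16)  # Poisson(10) pmf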

##### Summary statistics of ER graphs: triangles

Let \(T\) be the number of triangles in the graph and let \(E(T)\) be the expected number of triangles under an ER graph. Summing over unordered triples of distinct nodes \(\{i,j,k\}\), \[ \begin{aligned} E(T) &= E\Big(\sum_{\{i,j,k\}} 1(i \sim j, j\sim k, k \sim i)\Big) \\ &= \sum_{\{i,j,k\}} P(i \sim j, j\sim k, k \sim i) \\ &= {N \choose 3}p^3 \approx \frac{N^3}{6} (\lambda/N)^3 \\ &= \lambda^3/6. \end{aligned} \] So the expected number of triangles stays bounded as \(N\) grows. Is this big or small in sparse graphs?

trans = rep(NA, 100)
for(i in 1:100) trans[i] = transitivity(gs[[i]])  # compute transitivity ratio
plot(lamseq, trans) # no triangles!!! look at y-axis!
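We can also count triangles directly and compare to the \(\lambda^3/6\) calculation above. This check assumes igraph's count_triangles(), which returns the number of triangles each vertex belongs to, so the total must be divided by 3:

Tobs = sapply(gs, function(g) sum(count_triangles(g))/3)  # triangle count per graph
plot(lamseq, Tobs)
lines(lamseq, lamseq^3/6, col = "red")  # predicted E(T) = lambda^3/6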

Exercise: Suppose we were interested in some other type of motif (beyond a triangle). How would you compute the expected number?
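As a hedged sketch of one approach, consider the "two-star" motif (a path on three nodes). Its count equals \(\sum_i {d_i \choose 2}\) over node degrees \(d_i\), and the same style of calculation as for triangles gives an expected count of \(N {N-1 \choose 2} p^2 \approx \lambda^2 N/2\) under the ER model:

g = gs[[100]]  # lambda = 10, N = 10000
sum(choose(degree(g), 2))  # observed two-star count
choose(10000 - 1, 2) * 10000 * (lamseq[100]/10000)^2  # expected, about lambda^2 * N / 2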

##### Branching processes

Probabilists have described many features of ER graphs. One of the most celebrated results is the existence of a "giant component" when \(\lambda>1\). From RGD, "The intuition behind this result is that a site has a Binomial\((N − 1, \lambda/N)\) number of neighbors, which has mean \(\approx \lambda\). Suppose that we start [searching the random graph] at vertex \(I_0 = \{1\}\), and for \(t\ge 1\) let \(I_t\) be the set of vertices not in \(\cup_{s = 0}^{t−1} I_s\) that are connected to some site in \(I_{t-1}\). Then when \(t\) is not too large the number of points in \(I_t\), \(Z_t = |I_t|\), is approximately a branching process ..." where, conditioned on \(Z_t\), the expectation of \(Z_{t+1}\) is \(\lambda Z_t\).

The following section should be read "independently" of random graphs. A branching process is a model for how (asexual) organisms procreate. See Athreya and Ney (1972) for a classical reference. The second chapter of RGD provides a summary.

Definition: A branching process with \(s\) seeds is sampled with the following steps:
1) The \(s\) seeds form generation zero.
2) Each participant in the current generation creates a random number of new individuals (in \(\{0,1,2,\dots\}\)) who join the next generation.
3) Step two iterates indefinitely, or until some generation has population zero.

A realization from this model is a "tree".

In a branching process, the number of individuals referred by a participant is a random draw from an offspring distribution.

Definition: Suppose that person \(i\) refers \(\xi_i\) individuals. The offspring distribution for person \(i\) is the probability distribution of \(\xi_i\), \[p(k) = Pr(\xi_i = k).\]

Definition: The Galton-Watson branching process (GWP) is a branching process in which each individual's number of referrals is an independent draw from a common referral distribution, i.e. the \(\xi_i\) are iid.
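The GWP is straightforward to simulate. Here is a minimal sketch (the function name sim_gwp is ours) with Poisson\((\lambda)\) offspring and one seed; it returns the generation sizes \(Z_0, Z_1, \dots\).

# simulate a GWP with Poisson(lambda) offspring, starting from one seed
sim_gwp = function(lambda, tmax = 20) {
  Z = numeric(tmax + 1)
  Z[1] = 1  # generation zero: a single seed
  for (t in 1:tmax) {
    if (Z[t] == 0) break                 # extinct: all later generations stay empty
    Z[t + 1] = sum(rpois(Z[t], lambda))  # each parent refers a Poisson(lambda) count
  }
  Z
}
sim_gwp(1.5)  # one realization of Z_0, ..., Z_20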

It is well known that the macroscopic behavior of a branching process is controlled by the expected value of the referral distribution. For example, in the spread of a contagious disease, we hope that each infected individual passes the disease to fewer than one person on average.

Definition: Let \(\xi\) be a draw from the referral distribution. Using \(E(\xi)\) to denote the expected value of \(\xi\), \[\mu = E(\xi) = \sum_k k\, p(k).\]

Define \(Z_t\) to be the number of individuals in generation \(t\).

Fact: Under the GWP, \(W_t = Z_t/\mu^t\) is a Martingale.
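An informal simulation check of the martingale property (reusing sim_gwp from the sketch above): averaging \(W_t = Z_t/\mu^t\) over many runs should give roughly 1 for every \(t\).

lambda = 1.5; tmax = 10
Z = replicate(5000, sim_gwp(lambda, tmax))  # (tmax+1) x 5000 matrix of generation sizes
W = Z / lambda^(0:tmax)                     # divide generation t by mu^t (here mu = lambda)
rowMeans(W)                                 # each entry should be close to 1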

Definition: A tree goes extinct if some generation has population zero; because an empty generation produces no offspring, this means \(Z_t = 0\) for all \(t\) sufficiently large.

Exercise: Denote by \(q\) the probability of extinction for a tree sampled from the GWP with one seed node. Show that if \(\mu\le1\) (excluding the trivial case \(p(1)=1\)), then \(q=1\); that is, the tree will (eventually) go extinct with probability one if \(\mu\) is not greater than one.

Even when \(\mu>1\), extinction can still occur with positive probability. For example, the seed could produce zero offspring. This observation gives a simple lower bound on the extinction probability.

Fact: A tree sampled from the GWP could go extinct in the first generation, or in a future generation. So, \[q\ge Pr(\mbox{seed node refers zero participants}) = p(0).\]

To compute \(q\) exactly, let \(\phi(z) = E(z^\xi)\) be the probability generating function of the offspring distribution.

Theorem: \(q\) is the smallest solution in \([0,1]\) of the fixed point equation \(\phi(q) = q\). (Note that \(q = 1\) is always a solution, since \(\phi(1) = 1\).)

Example: For \(\xi \sim Poisson(\lambda)\), this fixed point equation is \(q = \exp(\lambda(q-1))\).
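One way to solve this fixed point equation numerically (a sketch, assuming \(\lambda = 2\)): search for the root below the trivial fixed point \(q = 1\) with uniroot().

lambda = 2
f = function(q) exp(lambda * (q - 1)) - q
uniroot(f, c(0, 0.999))$root  # approximately 0.203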

Theorem: [Kesten-Stigum Theorem] Suppose \(\mu>1\). \(P(\lim W_t =0)<1\) if and only if \(E(\xi \log \xi) < \infty\).

##### Summary statistics of ER graphs: a giant component

A GWP is not an exact representation of a "breadth first search" over an ER graph. Why? How does this relate to the expected number of triangles in an ER graph? Tightening the GWP results to extend to the ER model takes some work (see Section 2.3 in RGD if you are curious for more details).

To draw the analogy between ER graphs and the GWP, think of \(\lambda\) from the ER model as playing the role of \(\mu\) from the GWP.

Theorem: [Theorem 2.3.2 in RGD]. Suppose \(\lambda > 1\). There is a constant \(\beta\) so that with probability converging to 1, there is only one component of the ER random graph with more than \(\beta \log N\) vertices. The size of this component is of the order \((1 − q(\lambda))N\) where \(q(\lambda)\) is the extinction probability \(q\) for the Poisson(\(\lambda\)) branching process.

Theorem: [Theorem 2.8.1 in RGD] Suppose \(\lambda = a \log N\), for some \(a\). The probability that the entire graph is connected tends to 0 if \(a < 1\) and to 1 if \(a > 1\).
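A rough illustration of this threshold (a sketch): with \(N = 10000\), \(\log N \approx 9.2\), so graphs sampled with \(a = 0.5\) and \(a = 1.5\) should usually be disconnected and connected, respectively.

N = 10000
g_sub = erdos.renyi.game(N, 0.5 * log(N)/N)  # a < 1: typically disconnected
g_sup = erdos.renyi.game(N, 1.5 * log(N)/N)  # a > 1: typically connected
c(is.connected(g_sub), is.connected(g_sup))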

clust = matrix(0, nrow = 100, ncol = 3)
for(i in 1:100){
  cl = clusters(gs[[i]])
  clust[i,] = sort(cl$csize, decreasing = TRUE)[1:3]  # three largest component sizes
}
# log-scale y-axis: the largest component separates sharply once lambda > 1
plot(lamseq, clust[,1], type = 'l', lwd = 3, log = "y")
lines(lamseq, clust[,2], type = 'l', lwd = 2)
lines(lamseq, clust[,3], type = 'l')
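We can overlay the theorem's prediction on the plot above (a sketch): for each \(\lambda > 1\), solve \(q = \exp(\lambda(q-1))\) for the extinction probability and plot \((1 - q(\lambda))N\).

# extinction probability for the Poisson(lambda) offspring GWP
qext = function(lambda) {
  if (lambda <= 1) return(1)
  uniroot(function(q) exp(lambda * (q - 1)) - q, c(1e-8, 1 - 1e-8))$root
}
pred = (1 - sapply(lamseq, qext)) * 10000
ok = lamseq > 1  # the prediction is 0 for lambda <= 1, which a log axis cannot show
lines(lamseq[ok], pred[ok], col = "red", lty = 2)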

##### Summary statistics of ER graphs: pairwise distances

You can also use the GWP to study the distance between randomly chosen pairs of nodes in the ER graph.

Theorem: [Theorem 2.4.1 in RGD] Suppose \(\lambda > 1\) and pick two nodes \(x\) and \(y\) at random from the giant component. Then \(d(x, y)/ \log N \rightarrow 1/ \log \lambda\), where \(\rightarrow\) is convergence in probability.

From RGD, “The answer in Theorem 2.4.1 is intuitive. The branching process approximation grows at rate \(\lambda^t\), so the average distance is given by solving \(\lambda^t = N\), i.e., \(t = (\log N)/ \log \lambda\).” Or, \(t = \log_\lambda N\).

avleng = rep(NA, 100)
subs = seq(10, 100, by = 10)
for(i in subs) avleng[i] = average.path.length(gs[[i]])
# this ratio should level off at 1 "asymptotically" in N = 10000
plot(lamseq[subs], avleng[subs]/log(10000, base = lamseq[subs]))

# it is not clear from the igraph documentation what average.path.length does for disconnected graphs...

To interpret this result: in sparse graphs, \(\lambda\) does not change with \(N\), so \(\log \lambda\) is approximately constant and \(d(x,y) = O(\log N)\); pairwise distances grow logarithmically in the size of the graph. Is this fast or slow? What if \(\lambda = \log N\)? While the diameter of the graph has different constants, and these change in interesting ways relative to the "average" pairwise distance, it can be shown that the diameter of the giant component is still of order \(\log N\).

diam = rep(NA,100)
subs = seq(10,100, by = 10)
for(i in subs) diam[i]=diameter(gs[[i]])
# for disconnected graph, returns diameter of largest connected component.
plot(lamseq[subs], diam[subs])

Exercise: Let \(A\) be the adjacency matrix of an undirected, unweighted graph with no self-loops. Let \(\lambda_i(A)\) be the \(i\)th eigenvalue of \(A\). Let \(|E|\) be the number of edges and \(T\) the number of triangles in the graph. Show that the first three "moments" of the eigenvalues satisfy \[ \begin{aligned} 0 &= \sum_{i = 1}^N \lambda_i(A) \\ 2|E| &= \sum_{i = 1}^N [\lambda_i(A)]^2 \\ 6T &= \sum_{i = 1}^N [\lambda_i(A)]^3. \end{aligned} \] (Hint: relate \(\sum_i [\lambda_i(A)]^k\) to the trace of \(A^k\).) These equalities do not depend on the ER model.
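A quick numerical check of these identities (a sketch; we use a small ER graph so that a dense eigendecomposition is fast, and count edges and triangles directly from \(A\)):

g = erdos.renyi.game(200, 3/200)  # small graph: eigen() on 200 x 200 is fast
A = get.adjacency(g, sparse = FALSE)
ev = eigen(A, symmetric = TRUE, only.values = TRUE)$values
c(sum(ev), sum(ev^2)/2, sum(ev^3)/6)        # should be 0, |E|, T (up to rounding)
c(0, sum(A)/2, sum(diag(A %*% A %*% A))/6)  # direct counts from A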