Many network models are motivated by the desire to recreate certain realistic properties in the sampled graphs. “We know that graphs should have a long-tailed degree distribution; how do we put this into a model?” Or: “We know that graphs should have triangles; how do we put this into a model?”
Instead of creating a new graph model for every new property, the exponential random graph model (ERGM) is a family of models that has the potential to recreate “any” property.
The exponential family is interesting and important far outside of the domain of networks. It is one of the basic concepts covered in an advanced mathematical statistics course. It is a set of probability distributions that can be expressed as \[f_X(x\mid\boldsymbol \theta) = h(x) \exp\left(\sum_{i=1}^s \eta_i({\boldsymbol \theta}) T_i(x) - A({\boldsymbol \theta}) \right)\] for some parameter vector \(\boldsymbol \theta\), “base measure” \(h(x)\), “natural parameter” \(\eta\), sufficient statistic \(T\), and log-partition function \(A\). Many (most?) familiar distributions belong to this family (e.g. Normal, Binomial, Poisson, Gamma, etc.).
Exponential families are used in Generalized Linear Models (GLMs), in finding uniformly most powerful tests, and in obtaining conjugate priors. For example, in a GLM, each observation \(Y_i\) has its own natural parameter \(\eta_i\), which is modeled as \(\eta_i = x_i'\beta\) (there is also a link function and other important details).
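As a quick illustration, here is a minimal Poisson GLM sketch using statsmodels (the simulated data and parameter values are arbitrary choices of mine):

```python
import numpy as np
import statsmodels.api as sm

# Simulate a Poisson GLM with the log (canonical) link: eta_i = x_i' beta.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 1)))  # design matrix [1, x_i]
beta_true = np.array([0.5, 1.0])
y = rng.poisson(np.exp(X @ beta_true))          # E(Y_i) = exp(eta_i)

# Fit the GLM; the coefficients estimate beta.
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)  # roughly [0.5, 1.0]
```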
The sufficient statistic(s) are important because they completely characterize the likelihood \(\prod_{\ell = 1}^N f_X(x_\ell \mid \boldsymbol\theta)\) (this is a fun exercise for the exponential family if you haven’t done it!). The Rao-Blackwell theorem helps emphasize the importance of sufficient statistics; sufficient statistics are important beyond the exponential family. In general, it can be hard to find the sufficient statistics for a given distribution. However, if it belongs to an exponential family, then you have a recipe.
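Here is a sketch of that exercise: for an i.i.d. sample \(x_1, \dots, x_N\), the exponential-family likelihood factors as \[\prod_{\ell = 1}^N f_X(x_\ell \mid \boldsymbol\theta) = \left(\prod_{\ell = 1}^N h(x_\ell)\right) \exp\left(\sum_{i=1}^s \eta_i(\boldsymbol \theta) \sum_{\ell = 1}^N T_i(x_\ell) - N A(\boldsymbol \theta)\right),\] so the data enter the likelihood only through the sums \(\sum_{\ell} T_i(x_\ell)\); those sums are the sufficient statistics.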
Pedagogically, when we begin studying the exponential family, we often start with a distribution (e.g. the Normal) and then derive its \(h, \eta, T, A\); the same can be done for lots of other distributions. The exponential family is a generalization in the best sense, because it is often easy to show things “in general” for all distributions in the family at once.
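For example, applying the recipe to the Poisson(\(\lambda\)) distribution: \[f_X(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{1}{x!}\exp\left(x \log \lambda - \lambda\right),\] so \(h(x) = 1/x!\), \(\eta(\lambda) = \log \lambda\), \(T(x) = x\), and \(A(\lambda) = \lambda\) (i.e. \(e^\eta\) in terms of the natural parameter).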
The exponential family can be motivated in another way: it is the distribution with the largest entropy, subject to the constraint \(E(T(X)) = \alpha\) for some function \(T\):
\[\max_{P} \ \operatorname{entropy}(P) \quad \text{subject to} \quad E_P(T(X)) = \alpha.\] The solution to this problem is the exponential family with sufficient statistic \(T\); \(\eta\) is chosen to satisfy \(E(T(X)) = \alpha\). Read more here and here.
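To see why (a heuristic sketch, ignoring regularity conditions): write the Lagrangian with multipliers \(\eta_i\) for the moment constraints and \(\lambda_0\) for normalization, and set its derivative with respect to \(p(x)\) to zero: \[-\log p(x) - 1 + \sum_{i} \eta_i T_i(x) + \lambda_0 = 0 \quad\Longrightarrow\quad p(x) \propto \exp\left(\sum_{i} \eta_i T_i(x)\right),\] which is exactly the exponential-family form with sufficient statistic \(T\).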
The Configuration Model says that we should sample uniformly from all graphs with the desired degree distribution. If we care about triangles, then we could “extend” this model to sample uniformly from all graphs with the desired degree distribution AND the desired number of triangles. This is a bit awkward/difficult to implement. If we only care about the degree distribution, Chung and Lu relaxed the Configuration Model into a model that has the same degrees in expectation.
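As a rough sketch of Chung–Lu sampling (the function name and weights are my own; the model connects \(i\) and \(j\) independently with probability \(\min(1, w_i w_j / \sum_k w_k)\)):

```python
import numpy as np

def chung_lu(w, seed=0):
    """Sample an undirected graph whose expected degrees are
    (approximately) the target weights w. A minimal sketch."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    n = len(w)
    # Each dyad (i, j) is an independent coin flip with p_ij = w_i * w_j / sum(w).
    P = np.minimum(1.0, np.outer(w, w) / w.sum())
    upper = np.triu(rng.random((n, n)) < P, k=1)  # keep i < j only
    return upper | upper.T                        # symmetric adjacency matrix

A = chung_lu([3, 3, 2, 2, 1, 1])
print(A.sum(axis=1))  # realized degrees; approximately w in expectation
```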
How do we do this for triangles? How do we find a distribution that has the same expected number of triangles? Perhaps we should find the distribution with the largest entropy (i.e. the most “uncertainty”) that satisfies our desires in expectation. This is exactly what the ERGM does.
In ERGMs, the random variable is not “\(X \in \mathbb{R}\)”. Instead, it is a graph \(G\). ERGMs choose functions \(T\) to (for example) count the number of triangles, the number of k-stars, or any other motif of interest. Then, the graph is generated from the distribution with maximum entropy such that we match the observed number of triangles and k-stars (for example) in expectation.
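Written out, the resulting distribution has the familiar exponential-family form \[P(G) = \frac{1}{\kappa(\boldsymbol\theta)} \exp\left(\sum_{i} \theta_i T_i(G)\right),\] where the \(T_i(G)\) are the chosen graph statistics and \(\kappa(\boldsymbol\theta)\) is the normalizing constant, a sum over all graphs on \(n\) nodes. Because \(\kappa\) is intractable, ERGMs are usually simulated by MCMC. Here is a minimal “toggle one edge” Metropolis–Hastings sketch for an edge + triangle ERGM (the function name, starting graph, and parameter values are my own choices, not a reference implementation):

```python
import math
import random
import networkx as nx

def ergm_mh_sample(n, theta_edge, theta_tri, n_steps=50_000, seed=0):
    """Metropolis-Hastings sampler for P(G) proportional to
    exp(theta_edge * edges(G) + theta_tri * triangles(G)).
    A sketch, not production code."""
    rng = random.Random(seed)
    G = nx.gnp_random_graph(n, 0.1, seed=seed)  # arbitrary starting graph
    for _ in range(n_steps):
        i, j = rng.sample(range(n), 2)          # propose toggling dyad (i, j)
        common = len(set(G[i]) & set(G[j]))     # triangles through edge (i, j)
        sign = -1 if G.has_edge(i, j) else 1    # removing vs. adding the edge
        delta = sign * (theta_edge + theta_tri * common)  # theta . (T(G') - T(G))
        if delta >= 0 or rng.random() < math.exp(delta):  # accept w.p. min(1, e^delta)
            if sign == -1:
                G.remove_edge(i, j)
            else:
                G.add_edge(i, j)
    return G

G = ergm_mh_sample(n=30, theta_edge=-2.0, theta_tri=0.1)
print(G.number_of_edges(), sum(nx.triangles(G).values()) // 3)
```

In practice \(\boldsymbol\theta\) is then tuned (e.g. by maximum likelihood) so that \(E(T(G))\) matches the observed statistics.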
You could ask: if a model has been around for so long and there are so many problems, why are we still studying it? Alternatively, you could ask how we could make improvements on these points.