Given a matrix \(A \in \mathbb{R}^{n \times d}\), Vintage Sparse PCA (vsp) is a way of estimating interpretable latent factors for the rows and columns of \(A\). It provides a unified way of estimating a broad class of multivariate models. The manuscript that proposes and studies this technique is here. This document is more of a tutorial.
vsp takes \(A\) and estimates three matrices, \(\hat Z\), \(\hat B\), and \(\hat Y\). The “row factors” are in a matrix \(\hat Z \in \mathbb{R}^{n \times k}\) and the “column factors” are in a matrix \(\hat Y \in \mathbb{R}^{d \times k}\). Then, there is a “middle matrix” \(\hat B\) that is \(k \times k\). With these matrices, \(A \approx \hat Z \hat B \hat Y^T\).
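To build intuition for where these three matrices come from, here is a rough sketch of the idea in base R: a rank-\(k\) SVD followed by a varimax rotation of the singular vectors. This is an illustration only, not the vsp package's exact implementation (which also handles centering, scaling, and sparse matrices carefully).

```r
# Sketch of the vsp idea: k-dimensional SVD, then varimax rotation.
set.seed(1)
n <- 100; d <- 40; k <- 3
A <- matrix(rnorm(n * d), n, d)

s  <- svd(A, nu = k, nv = k)
rz <- stats::varimax(s$u, normalize = FALSE)  # rotate left singular vectors
ry <- stats::varimax(s$v, normalize = FALSE)  # rotate right singular vectors

Z_hat <- s$u %*% rz$rotmat
Y_hat <- s$v %*% ry$rotmat
B_hat <- t(rz$rotmat) %*% diag(s$d[1:k]) %*% ry$rotmat

# The rotation matrices are orthogonal, so they cancel in the product:
# Z B Y^T equals the usual rank-k SVD approximation of A.
err <- max(abs(Z_hat %*% B_hat %*% t(Y_hat) -
               s$u %*% diag(s$d[1:k]) %*% t(s$v)))
err
```

The rotation is what makes the factors interpretable: it pushes \(\hat Z\) and \(\hat Y\) toward sparsity, while \(\hat B\) absorbs the singular values and the rotations.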
For example, if the rows of \(A\) form \(k\) different clusters, then the \(k\) columns of \(\hat Z\) will be “cluster indicators” (\(\hat Z_{ij}\) is large if \(i\) is in cluster \(j\)).
At first, the middle matrix \(\hat B\) can be confusing. So, from now on, we will presume that the primary focus is on the rows, and we will define \(\tilde Y^T = \hat B \hat Y^T\). Then, \(A \approx \hat Z \tilde Y^T\). While \(\hat Y\) might be sparse, \(\tilde Y\) might not be; that is ok.
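Collapsing \(\hat B\) into the column factors is one line of linear algebra: since \(\tilde Y^T = \hat B \hat Y^T\), we have \(\tilde Y = \hat Y \hat B^T\). A toy check with arbitrary matrices:

```r
# Toy illustration: given (hat) Z, B, Y, define tilde Y = Y B^T so that
# Z %*% t(Y_tilde) gives the same approximation as Z %*% B %*% t(Y).
set.seed(1)
Z <- matrix(rnorm(6 * 2), 6, 2)
B <- matrix(rnorm(2 * 2), 2, 2)
Y <- matrix(rnorm(5 * 2), 5, 2)

Y_tilde <- Y %*% t(B)   # because t(Y_tilde) = B %*% t(Y)

same <- all.equal(Z %*% B %*% t(Y), Z %*% t(Y_tilde))
same
```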
We will illustrate vsp with two examples. First, on the text of realdonaldtrump tweets. This example demonstrates one way of converting text into a (sparse) document-term matrix \(A\). Second, with a graph of citations among academic journals. This example demonstrates one way of converting a vertex set and edge list into a (sparse) adjacency matrix \(A\). Then, we will contextualize the clusters/factors with bff.
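To preview the second conversion, here is a minimal sketch of turning a vertex set and edge list into a sparse adjacency matrix with library(Matrix). The journal names and edges below are made-up toy data, not the actual citation data used later.

```r
# Toy sketch: vertex set + edge list -> sparse adjacency matrix A.
library(Matrix)

journals <- c("AoS", "JASA", "JRSSB", "Biometrika")  # hypothetical vertex set
edges <- data.frame(
  from = c("AoS", "JASA", "JRSSB"),
  to   = c("JASA", "JRSSB", "AoS")
)

A <- sparseMatrix(
  i = match(edges$from, journals),   # row index of each edge
  j = match(edges$to, journals),     # column index of each edge
  x = 1,
  dims = c(length(journals), length(journals)),
  dimnames = list(journals, journals)
)
A
```

A document-term matrix is built the same way: rows indexed by documents, columns indexed by terms, and `x` holding the word counts.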
These are “sparse” matrices because the vast majority of their entries are zero (this is because the graphs themselves are sparse!). We use library(Matrix) in R to represent sparse matrices. In short, it makes the algorithms really fast because it only “stores” the non-zero entries. We will say more about it later.
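A quick way to see the benefit of storing only the non-zero entries: compare a mostly-zero dense base-R matrix with its sparse Matrix representation.

```r
# A 1000 x 1000 matrix with only 100 non-zero entries.
library(Matrix)

dense <- matrix(0, 1000, 1000)
dense[sample(length(dense), 100)] <- 1
sparse <- Matrix(dense, sparse = TRUE)

nnzero(sparse)       # 100 stored entries
object.size(dense)   # roughly 8 MB for a million doubles
object.size(sparse)  # only a few KB, since just the non-zeros are kept
```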