#### Exact Topological Inference (ETI) for Brain Networks and Molecules

(c) 2019 Moo K. Chung

### Description

April 24, 2019; Sept 24, 2021

The method does the exact topological inference and computes the exact probability of how two brain networks or two molecules are different. For molecules, it is demonstrated using the spikes of COVID-19 virus [13]. For brain networks, it is demonstrated using twin study [4,5,6]. If you are using codes or data below, please reference these papers.

### Exact Topological Inference for spike proteins of  Covid-19 virus

Sept 24, 2021

The full data and codes used in [13] is packaged as matlab-ETI-covid19.zip. Here is explanation on how to use it. The coordinates of atoms in the B-domain (both open and closed states) of spike protein of COVID-19 virus are loaded as

The pairwise Euclidean distance between atoms in the molecules (Figure 1) are computed using pdist.m and make it into the square matrix form.

pairwise_B_closed = pdist(B_closed);
distances_B_closed= squareform(pairwise_B_closed);

pairwise_B_open = pdist(B_open);
distances_B_open= squareform(pairwise_B_open);

Figure 1. COVID-19 virus: Closed (6VXX) and open state (6VYB) of spike proteins (red pyramid shape). 6JX7 is feline corona virus.

The distance matrices are feed into Ripser package (arXiv: 1908.02518) and output is saved into text file, which we can read

[x1_open y1_open] =

where (x1_closed, y1_closed), (x1_open, y1_open) are the scatter points of persistent diagram of 1D homology. The persistent diagrams of closed and open states are displayed in Figure 2

Figure 2. COVID-19 virus: Closed (6VXX) and open state (6VYB) of spike proteins. Although there is almost no change in 0D homology, there are some differences in 1D homology.

The obtained scatter points are then transformed to lattice path by running (Figure 3-left):
closed_lattice = PH_latticepath3(x1_closed,y1_closed);
figure; plot(closed_lattice,'--k', 'LineWidth', 2)
open_lattice =
PH_latticepath3(x1_open,y1_open);
hold on; plot(open_lattice,'-r', 'LineWidth', 2)

This is transformed to normalized step functions (Figure 3-right):
[closed_phi,open_phi, x, d]= PH_mono2(closed_lattice, open_lattice);
figure; plot(x,closed_phi, '--k', 'LineWidth', 2)
hold on; plot(x,open_phi,'-r', 'LineWidth', 2)

The p-value of the normalized step function difference is given by (Theorem 3 in

[13]):
P = PH_Aasym2([length(closed_lattice), length(open_lattice)] ,d)

Figure 3. Lattice path representation of persistent diagrams. The lattice paths are transformed to normalized step functions that look and behave like cumulative distribution functions.

### Exact Topological Inference for functional brain networks

July 22, 2020

Using the data, we will show how to construct Betti plots from weighted brain graphs, where the weights are Pearson correlations. Then show how to perform the exact topological inference and obtain the p-value on the Betti plots. The Betti plots are the basic data visualization technique in persistent homology but statistical inference procedure has been lacking in the field [1,2,3]. New exact topological inference framework is developed for determining statistical significance of Betti plots.

The codes have been tested with MacBook Pro with MATLAB 2016. If you are using the dataset or codes, please reference [5] or [6]. The complete package is zipped into matlab.zip, which was used in [6] and used in [4.5] as well. The ETI can be done using MAINSCRIPT.m. The package has connectivity.mat containing the correlation matrices of the MZ- and DZ-twins (Figure 1).

Chung et al. 2019.

Figure 1. MZ- and DZ-correlation matrices of resting-state fMRI. The difference in the connectivity matrices heritability index (HI).

The resting-state functional brain network given here  follows the sequence of image processing pipelines in Chung et al. 2019 [6]. It is the average functional network of 416 normal subjects and considered as the representative functional brain networks at rest when people are not doing any task. Even then the brain exhibits the residual connectivity pattern. Decades ago, it is treated as artifacts but now it is extensively studied. Numerous studies including Chung et al. 2019 showed that this residual connectivity pattern is heritable. The average functional network pattern is stored in networkfMRI.mat, which stores the strength of connectivity between 116 regions using Pearson correlations. The codes below display network above correlation 0.7, which shows a lot of connections between hemisphere (Figure 2).

figure;
figure_patch(t,[0.99216     0.91765     0.79608], 0.05);

Figure 2. Average functional connectivity of 416 normal subjects. Only connections above correlation 0.7 are shown. They are highly symmetric across hemispheres. Although there are many square shaped networks with 4 nodes where two nodes are on each hemisphere.

### Betti Plots

April 24, 2019

The Betti plot displays Betti numbers over the network filtration values. The 0th Betti plot (number of connected components) is constructed by identifying the minimum spanning tree via the Kruskal's algorithm in PHbarcode.m. See [7] for the details on the algorithm. PHbarcode.m inputs the connectivity matrix and outputs the sequence of increasing filtration values that increases the number of components. The Betti plot is then constructed by simply plotting the number of connected components over the increasing filtration value.

For the 1st Betti plot (number of cycles), we used a new algorithm based on the Euler characteristic that counts the only the algebraically independent cycles  [5, 6, 12]. PH_betti3.m computes the Betti plots for both 0th and 1st Betti numbers. Given connectivity matrix C, it computes the Betti numbers at given filtration values (thresholds) as follows.

thresholds=[0:0.01:1]
[beta0, beta1, biggest0, biggest1] = PH_betti3(C, thresholds)
figure; plot(thresholds,beta0); hold on;
plot(thresholds,beta1);

The computed Betti numbers are plotted in Figure 3.

Figure 3. Betti-0 (left) and Betti-1 number plots over thresholded correlation values.

### Exact topological inference

May 13, 2019

The KS-distance on Betti plots was introduced in [5,6,8]. The KS-distance is simply given by taking the maximum between two Betti plots obtained from PH_betti3.m. The exact topological inference (ETI) was introduced in [5,6,8,9] to determine the statistical significance of the KS-distance. The statistical significance as measured in terms of p-value is computed by enumerating every possible permutation combinatorially by mapping the sample space to the space of every possible path in grids (Figure 4). The proof is given in [8,9]. p-value can be computed using PH_A.m for the number of filtration values smaller than 100. If the number of filtration values are larger than 100,  use the asymptotic formula  PH_Aasym.m. Given Betti-0 plots betaMZ0 and betaDZ0, the p-value is computed as
:

d0 = ceil(d0)
q0 = size(C,1)
P  = PH_Aasym(q0,d0)

Given Betti-1 plots betaMZ1 and betaDZ1, the p-value is computed as:

d1 = ceil(d1)
q1= 1 + (q0^2-q0)/2 - q0;
P  = PH_Aasym(q1,d1
)

q0 is the total number of nodes, which is the maximum possible number of Betti-0 number. q1 is the maximum possible number of cycles computed from Euler formula.

Figure 4. Probability distribution on KS-distance is computed combinatorially. See [8,9] for mathematical detail.

### Random network simulations with modular structures

August 21, 2020

Random network simulations are often needed for validation in publication [6,8,10]. Here we show how to generate random graphs with modular structures which give block structures in the correlation matrices. Here we generate two groups of networks (n=5, m=5 subjects each) with p=30 nodes. Noise amount sigma=0.1 is added. The networks with different modular structure is generated from
RG_module (A,k,sigma), where A is the baseline data matrix, k is the number of modules and sigma is the amount noise we adding to the simulation. The result is displayed in Figure 5.

n=5; m=5;
p=30;
A = normrnd(0, 1, n,p);

sigma =0.1;
[X C1] = RG_module(A,2,sigma);
[Y C2] = RG_module(A,3,sigma);

figure; subplot(1,2,1); imagesc(C1); colorbar
subplot(1,2,2);  imagesc(C2); colorbar

Figure 5. Random correlation matrix with two (left) and three (right) modules. Since they have different number of modules, they are topologically different. This is considered as the mathematical ground truth in simulation. Any topological method should able to differentiate them.

This simulation is saved in SIMULATION-modules.m. The full simulation study given in [6] can be duplicated running SIMULATION.m. Since it is a random simulation, it is expected the results will be fluctuating but you should end up with similar results.

### Reference

[1] Chung, M.K., Lee, H., Arnold, A. 2012. Persistent homological structures in compressed sensing and sparse likelihood, NIPS Workshop on Algebraic Topology and Machine Learning.

[2] Chung, M.K., Hanson, J.L., Lee, H., Adluru, N., Alexander1, A.L., Davidson, A.L., Pollak, S.D. 2013. Persistent homological sparse network approach to detecting white matter abnormality in maltreated children: MRI and DTI multimodal study, 16th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI).  Lecture Notes in Computer Science (LNCS) 8149:300-307

[3] Chung, M.K., Hanson, J.L., Ye, J., Davidson, R.J. Pollak, S.D. 2015 Persistent homology in sparse regression and its application to brain morphometry. IEEE Transactions on Medical Imaging, 34:1928-1939

[4] Huang, S.-G., Gritsenko, A., Lindquist, M.A., Chung, M.K. 2019. Circular Pearson correlation using cosine series expansion. IEEE International Symposium on Biomedical Imaging (ISBI)
1774-1777

[5] Chung, M.K., Huang, S.-G., Gritsenko, A., Shen, L., Lee, H. 2019 Statistical inference on the number of cycles in brain networks. IEEE International Symposium on Biomedical Imaging (ISBI)
113-116
MATLAB

[6] Chung, M.K, Lee, H., Ombao, H., Solo, V. 2019. Exact topological inference of the resting state brain networks in twins, Network Neuroscience
3:674-694 MATLAB

[7] Lee, H., Kang, H.K., Chung, M.K., Kim, B.-N., Lee, D.S. 2012. Persistent brain network homology from the perspective of dendrogram, IEEE Transactions on Medical Imaging. 31:2267-2277

[8] Chung, M.K., Vilalta, V.G., Lee, H., Rathouz, P.J., Lahey, B.B., Zald, D.H. 2017 Exact topological inference for paired brain networks via persistent homology. Information Processing in Medical Imaging (IPMI) 10265:299-310

[9] Chung, M.K., Luo, Z., Leow, A.D., Alexander, A.L., Richard, D.J., Goldsmith, H.H. 2018 Exact Combinatorial Inference for Brain Images, Medical Image Computing and Computer Assisted Intervention (MICCAI), 11070:629-637

[10] Chung, M.K., Lee, H., Solo, V., Davidson, R.J., Pollak, S.D. 2017  Topological distances between brain networks, International Workshop on Connectomics in NeuroImaging, Lecture Notes in Computer Science 10511:161-170

[11]
Gritsenko, A, Lindquist, M.A., Kirk, G.R., Chung, M.K. 2018 Automatic identification of twin zygosity in resting-state functional MRI, arXiv:1807.00244

[12] Lee, H., Chung, M.K., Kang, H., Lee, D.S. 2014. Hole detection in metabolic connectivity of Alzheimer's disease using k-Laplacian, 17th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). Lecture Notes in Computer Science (LNCS) 8675:297-304

[13]
Chung, M.K., Ombao, H. 2021 Lattice paths for persistent diagrams. LNCS 12929:77-86 arXiv:2105:00351.