DescriptionAugust 19, 2019
The permutation test is an often used test procedure for determining statistical significance in brain network studies. Unfortunately, generating every possible permutation for large-scale brain imaging datasets such as HCP and ADNI with hundreds of networks is not practical. We present the novel transposition test that exploits the underlying algebraic structure of the permutation group in drastically speeding up the permutation test. If you are using either data or code below, please reference Chung et al. (2019), where the method is first published. The transposition can be further used in the rapid computation of twin correlations, which is traditionally computed using the intra-class correlations.
The package will show how to perform the transposition test, which generates 100-1000 (depending on the sample sizes) more permutations than the standard permutation test in the same computational run time. The complete package is zipped into matlab.zip. The code SIMULATION.m will duplicate the simulation studies given in . Due to random nature simulation, the results vary. However, the codes can handle vector data. For more complex data such as images and graphs/networks, you need to vectorize the data first. The power point presentation of the method is given here.
1) Permutation test
August 19, 2019; December 5, 2021
For data vectors x and y, we define the test statistic as a function using the built-in inline.m function:
stat = inline('max(abs(mean(x)-mean(y)))').
This is the test statistic used in the simulation study of common 1-cycle basis in study . For two-sample t-stat used in study , we define
stat = inline('mean(x)-mean(y)).*sqrt(m*n*(m+n-2)./((m+n)*((m-1)*var(x)+(n-1)*var(y))))');
The standard permutation test is then performed by running
[stat_s, time_s] = test_permute(x,y,per_s, stat)
where per_s is the number of permutations. The function inputs function as an augment. The resulting statistical result is stored in stat_s. The run time of the test is stored in time_s.
Figure 1. Simulation result showing superior performance of the transposition test over the standard permutation test. Read Chung et al. (2019) for details.
2) Transposition test
August 19, 2019
The transposition test is done by running
[stat_t, time_t] = test_transpose(x,y,per_t, stat)
where and per_t is the number of permutations. The resulting two-sample t-stat result is stored in stat_t. The run time of the test is stored in time_t. test_transpose.m starts the sequence of random transpositions from the permutation of [x y].
Accelerating the convergence of the transposition testAugust 19, 2019
Chung et al. (2019) explained how to accelerate the convergence and avoids possible bias by interjecting the random permutations within the sequence of transpositions. That can be easily achieved by running iterations
[stat, time] = test_transpose(x,y,per_t/1000);
where the sequence of transpositions of length per_t are broken into 1000 independent sequences. The number of sequences needs to be determined empirically but the length of each sequence should be around 100-1000.
Transposition test on the whole brain or network
August 19, 2019
The transposition test on the whole brain or network can be done by vecotrizing data. Suppose we generated the following simulated vector data, where group 1 has m=10 subjects and group 2 has n=12 subjects over l=1000 voxels.
m=10; n=12; l=1000; per_t=100000;
Then the transposition test is done as
[stat_t, time_t] = test_transpose(x,y,per_t);
The p-value is then computed as
pvalue_t = online_pvalue(stat_t, observed);
3) Twin correlation computation using
June 03, 2020
The above transposition method can be further used
to rapidly compute the twin correlations, which can
be a serious computational bottleneck for large
sample sizes. The method is illustrated with a
simple example. Consider 5 twins, where each twin is
paired along the rows.
The twin correlation based on 5000 transpositions is given by running
[stat_t, time_t] = transpose_corr(x,y,per_t)
where time_t is the run time and stat_t is the average correlation of all the correlations over transpositions. It is sequentially computed in the online fashion to save computer memory. Note the length of stat_t is the number of transpositions. per_t. stat(5000) is the average correlation over 5000 transpositions. Since the result depends on random transpositions, the results randomly change. Thus, it is necessary to average the results. The number 5000 is empirically chosen from the plot (Figure 2). Empirically we choose per_t such that the estimated the correlation values does not change in 2 decimal places,
In this example, we performed the simulation 10 times and the results are averaged. The standard deviation of 10 simulation should be smaller 0.01. This grantee the convergence within two decimal places in average.
[stat_t, time_t] = transpose_corr(x,y,per_t);
hold on; plot(stat_t);
Figure 2. Correlation estimation using 5000 random transpositions. 5000 is chosen since there is no more fluctuations and it is a large enough round number. The result needs to be averaged to get more robust estimate.
In the above code, stat_t should converge to the true underlying twin correlation as per_t increases. transpose_corr.m can input matrix data, where x and y can be the input data of size m (number of subjects) by p (number of variables/connections/voxels per subject). For instance, it will input 5x3 data matrices:
[stat_t, time_t] = transpose_corr(x,y,per_t);
Heritability and its p-value computation using transpositions
June 11, 2020
Given the following data, heritability index (HI)
is computed as follows. Assume there are 50 MZ-twins
(xM and yM paired) and 40 DZ-twins (xD and yD
xM=0.1 + normrnd(0,1,50,1);
yM= xM + normrnd(0,0.1,50,1);
xD=0.1 + normrnd(0,1,40,1);
yD= xD + normrnd(0,1,40,1);
Based on transpose_corr.m,
heritability index (HI) is computed as follows
HI = 2*(corr_MZ-corr_DZ)
which gives HI value of 0.8537 indicating strong
heritability. This number will randomly
change. In this small sample example,
per_t=10000 gives sufficient convergence. HI-value
0.3833 is our observation. We will determine the
statistical significance of 0.3833 using the
empirical distribution of HI.
[all_t, time_t] = transpose_heritability(xM,yM,xD,yD, per_s);
where per_s is the total number of transpositions between MZ- and DZ-twins. 50% of transpositions are within twins. The optimal ratio will be later determined. all_t outputs the every possible HI-values over transpositions within and between twins. It is implemented following the formula sheet: transpose-covariance.pdf. per_s has to be extremely large for convergence. For HCP data, it is expected per_s is no less than 10 million for convergence. But this has to be determined empirically that gives sufficient convergence. One visual trick to determine if 100000 is big enough is by checking if the empirical probability distribution is smooth enough (Figure 3):
Figure 3. The histogram (unormalized empirical distribution) of all_t.
Then p-value is given by
pvalue = online_pvalue(all_t, HI)
where pvalue(1000) is the p=value computed using up to 1000 transpositions. pvalue(end) is the p-value using all the 100000 transpositions.
Figure 4. p-value plot of all_t over the number of transpositions. It shows, per_s=100000 is not enough. Likely 10 times more transpositions are needed.
Here, we have shown how to compute uncorrected
p-value for one variable/connection per subject. It
should not be hard to extend it and compute
the corrected p-value for multiple
variables/connections per subject.
 Chung, M.K., Xie, L, Huang, S.-G., Wang, Y., Yan, J., Shen, L. 2019 Rapid acceleration of the permutation test via transpositions, International Workshop on Connectomics in NeuroImaging, Lecture Notes in Computer Science (LNCS) 11848:42-53. An earlier version with different application arXiv:1812.06696
 Anand, D.V., Dakurah, S., Wang, B., Chung, M.K. 2021 Hodge-Laplacian of brain networks and its application to modeling cycles. arXiv:2110.14599