These notes record the details of how the NARPS data is analysed, specifically the consensus statistic.

Correlated Meta Analysis: Homogeneous, Equally Weighted Analysis

Let \(Y_i\) be an \(N\)-vector of T (or Z) scores at voxel \(i\), \(i=1,...,I\), where \(N\) is the number of results (teams) on a given hypothesis. For this set-up, we do not consider voxel-specific means or variances, but just the image-wide mean,

\[E(Y_i) = \mu\mathbf{1}\] for scalar \(\mu\) and length-\(N\) ones vector \(\mathbf{1}\), and variance \[V(Y_i) = \sigma^2 \mathbf{Q} \] for scalar \(\sigma\) and \(N\times N\) inter-result correlation matrix \(\mathbf{Q}\).

The average statistic is \[\bar{Y_i} = \mathbf{1}^\top Y_i / N\] and its variance is \[V(\bar{Y_i}) = \sigma^2 \mathbf{1}^\top \mathbf{Q} \mathbf{1} / N^2.\] Note the special case of compound symmetric correlation, where each result has correlation \(\rho\) with every other result; for that case \[V(\bar{Y_i}) = \sigma^2(1+(N-1)\rho)/N,\] which shows in the extreme of perfect correlation \(\rho=1\) that the mean has the variance of a single result, \(\sigma^2\), whereas if there is no correlation at all we have the usual sample mean variance of \(\sigma^2/N\).
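As a quick numerical check, the following Python sketch (with illustrative values \(N=70\) and \(\sigma^2=1\), chosen only for demonstration and not part of these notes) confirms that the quadratic form \(\sigma^2\mathbf{1}^\top\mathbf{Q}\mathbf{1}/N^2\) matches the compound-symmetry expression, moving from \(\sigma^2/N\) at \(\rho=0\) to \(\sigma^2\) at \(\rho=1\).

```python
import numpy as np

def var_of_mean(Q, sigma2=1.0):
    """V(Ybar) = sigma^2 * 1'Q1 / N^2 for an N x N correlation matrix Q."""
    N = Q.shape[0]
    ones = np.ones(N)
    return sigma2 * ones @ Q @ ones / N**2

N, sigma2 = 70, 1.0  # illustrative values only
for rho in [0.0, 0.5, 1.0]:
    Q = (1 - rho) * np.eye(N) + rho * np.ones((N, N))   # compound symmetry
    # Closed form: sigma^2 * (1 + (N-1)*rho) / N
    print(rho, var_of_mean(Q, sigma2), sigma2 * (1 + (N - 1) * rho) / N)
```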

The traditional meta-analytic combining method would be to construct a test statistic as the sample mean divided by the square root of the estimated variance \[T_i = \frac{\bar{Y_i}}{\sqrt{ \sigma^2 \mathbf{1}^\top \mathbf{Q} \mathbf{1} / N^2}}.\] This requires an estimate of \(\sigma\). If we estimated it based on a between-result variance, then we’d have a “random effects” analysis over results of the same data. However, this is perhaps not the inference of interest.
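A minimal sketch of this traditional combined statistic follows, assuming \(\sigma^2\) is estimated voxel-wise from the between-result spread (the “random effects” choice just mentioned); the names `Y` (an \(I\times N\) array of scores, voxels by results) and `Q` are illustrative, not taken from the NARPS code.

```python
import numpy as np

def traditional_t(Y, Q):
    """Sample mean over results divided by its estimated standard error."""
    I, N = Y.shape
    ones = np.ones(N)
    Ybar = Y.mean(axis=1)                 # voxel-wise mean over results
    sigma2_hat = Y.var(axis=1, ddof=1)    # naive between-result variance per voxel
    return Ybar / np.sqrt(sigma2_hat * (ones @ Q @ ones) / N**2)
```

Note that the naive between-result variance used here is itself biased by the inter-result correlation, a point taken up in the random effects section below.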

To the extent that there is independent information in the different studies (i.e. \(\rho<1\)), the usual combining will provide a more precise estimate of the mean \(\mu\) and thus give us larger \(T\) values, reflecting the additional power. While more power is generally good, it seems like this would not be an ideal summary of the typical result.

Rather, we propose a “consensus analysis” that preserves the average image-wise mean and variance of the different studies’ results. To do this, we standardize \(\bar{Y_i}\), then re-scale and shift it to have the (average) mean and variance of the original results. Call this \(T_{C,i}\), the consensus T-score at voxel \(i\), which we compute as: \[\begin{eqnarray} T_{C,i} &=& \frac{\bar{Y}_i-\hat\mu}{\sqrt{ \hat\sigma^2 \mathbf{1}^\top \mathbf{Q} \mathbf{1} / N^2}}\hat\sigma+\hat\mu\\ &=& \frac{\bar{Y}_i}{\sqrt{\mathbf{1}^\top \mathbf{Q}\mathbf{1} / N^2}}+\left(1-\frac{1}{\sqrt{\mathbf{1}^\top \mathbf{Q} \mathbf{1} / N^2}}\right)\hat\mu, \end{eqnarray}\] where \(\hat\mu = \frac{1}{N}\sum_k \hat\mu_k\) and \(\hat\sigma^2= \frac{1}{N}\sum_k \hat\sigma^2_k\), with \(\hat\mu_k=\frac{1}{I}\sum_i Y_{ik}\) and \(\hat\sigma^2_k = \frac{1}{I-1} \sum_i(Y_{ik}-\hat\mu_k)^2\).
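A compact implementation sketch of this homogeneous consensus statistic, again assuming `Y` is an \(I\times N\) array of scores (voxels by teams) and `Q` the \(N\times N\) inter-team correlation matrix; the function and variable names are illustrative only.

```python
import numpy as np

def consensus_t(Y, Q):
    I, N = Y.shape
    ones = np.ones(N)
    mu_hat = Y.mean(axis=0).mean()                      # (1/N) sum_k mu_hat_k
    sigma_hat = np.sqrt(Y.var(axis=0, ddof=1).mean())   # sqrt of (1/N) sum_k sigma_hat_k^2
    Ybar = Y.mean(axis=1)                               # voxel-wise mean across teams
    scale = np.sqrt(ones @ Q @ ones / N**2)             # sqrt(1'Q1 / N^2)
    # standardize the mean, then rescale and shift to the average mean/variance
    return (Ybar - mu_hat) / (sigma_hat * scale) * sigma_hat + mu_hat
```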

Correlated Meta Analysis: Heterogeneous, Weighted Analysis

Above (and in the NARPS analysis) we assumed it is reasonable that all \(N\) studies have a common mean \(\mu\) and standard deviation \(\sigma\) (over the map). For reference, we provide an alternative weighted analysis for when this homogeneity assumption is untenable. To account for heterogeneity, we conduct a standardized analysis, creating standardized data \[Z_{ik} = \frac{Y_{ik}-\hat\mu_k}{\hat\sigma_k}\] for voxel \(i\) and study \(k\). The mean of these standardized values is \[\bar{Z}_i = \mathbf{1}^\top Z_{i}/N,\] for \(N\)-vector \(Z_i=\{Z_{ik}\}_k\), and then we can make a normalized consensus statistic \[T_{C,i} = \frac{\bar{Z}_i}{\sqrt{\mathbf{1}^\top \mathbf{Q} \mathbf{1} / N^2}}\hat\sigma+\hat\mu,\] where we have again rescaled and shifted the statistic so it has the (average) mean and variance of the \(N\) results, but have now ensured that each and every study contributes equally to the consensus map.
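The corresponding sketch for this standardized (weighted) version, under the same illustrative assumptions on `Y` and `Q`:

```python
import numpy as np

def standardized_consensus_t(Y, Q):
    I, N = Y.shape
    ones = np.ones(N)
    mu_k = Y.mean(axis=0)                     # image-wise mean of each study
    sd_k = Y.std(axis=0, ddof=1)              # image-wise std dev of each study
    Z = (Y - mu_k) / sd_k                     # standardize each study's map
    Zbar = Z.mean(axis=1)                     # voxel-wise mean of standardized maps
    scale = np.sqrt(ones @ Q @ ones / N**2)
    mu_hat, sigma_hat = mu_k.mean(), np.sqrt((sd_k**2).mean())
    return Zbar / scale * sigma_hat + mu_hat
```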

Random Effects Variance of Homogeneous Analysis

A random effects meta-analysis considers the contribution of inter-study variation to the standard error of the computed effects. The between-study random effects variance is typically denoted \(\tau^2\). In this section we argue that in the NARPS “same data meta-analysis” setting, the between-study variance is purely about inter-team variation in processing, and cannot be attributed to different populations of sampled individuals. Thus, while a usual (different data) meta-analysis has a floor on the variance, namely that due to the sampling variability of each study, in this setting there is no floor and all observed variance is due to inter-study/team variance.

First, note that each \(Y_{ik}\) is a standardized effect size, a T statistic, and is regarded as having sampling variance of 1.0. However, this “sampling variance” is with respect to repeated samples of subjects from the population, whereas we only have a single sample of subjects from the population. Hence, the (population) sampling variance for this same data meta-analysis is zero. (As a thought experiment, consider sampling 54 values from a normal distribution with mean 0 and variance 1 and sending them to 70 research teams, asking them each to compute a 1-sample t-test. The 70 values that came back wouldn’t necessarily be identical – due to different operating systems and numerical libraries – but they would have a variance far less than 1, yet we know the sampling distribution of the 1-sample t-test has variance at least 1.)
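The simulation below illustrates the thought experiment (the cohort size, number of teams, and number of repetitions are assumed for illustration): across teams that all receive the same data, the T statistic has essentially zero variance, whereas across freshly sampled cohorts its variance exceeds 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# One cohort of 54 values sent to 70 "teams": every team computes the same t
y = rng.standard_normal(54)
t_teams = np.repeat(stats.ttest_1samp(y, 0.0).statistic, 70)
print("variance over teams (same data):", t_teams.var())        # essentially 0

# Repeatedly sampling new cohorts: the t statistic has variance df/(df-2) > 1
t_cohorts = np.array([stats.ttest_1samp(rng.standard_normal(54), 0.0).statistic
                      for _ in range(5000)])
print("variance over cohorts (new data):", t_cohorts.var())     # about 53/51 ~ 1.04
```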

So, while a traditional (different data) meta-analysis would find that the variance of T statistics is \(\mathrm{Var}(Y_{ik}) = 1 + \tau_i^2,\) in our same data meta-analysis setting, any variance observed over teams is the random effects variance, i.e. \(\mathrm{Var}(Y_{ik}) = \tau_i^2,\) where we consider computing a variance for each voxel \(i\).

We take care to differentiate two types of variation: image-wise variation, \(\mu\) and \(\sigma\) (or \(\mu_k\) and \(\sigma_k\)), used above to make consensus maps, versus inter-study variation, \(\tau_i\), computed for each voxel over studies. The \(\tau_i\) capture information ignored by the consensus approach, and are useful for quantifying the degree of agreement. Ideally, the \(\tau_i\)’s should be near zero, reflecting perfect agreement. They have the units of the \(Y_{ik}\), i.e. T scores, so a \(\tau_i\) of 1 indicates that the inter-study variation is just as great as the expected population sampling variability of the T scores (if we could repeatedly sample new cohorts of 108 subjects).

Unfortunately, the usual estimate of the inter-study sample variance \(S_i^2\) is biased due to the correlation \(\mathbf{Q}\) between the studies: \[\begin{eqnarray} \mathrm{E}(S_i^2) &=& \mathrm{E}\left(\frac{1}{N-1}\sum_{k=1}^N (Y_{ik} - \bar{Y}_i)^2 \right)\\ &=& \frac{1}{N-1}\mathrm{E}\left(Y_i^\top\mathbf{R}Y_i \right)\\ &=& \frac{\tau^2_i}{N-1}\mathrm{tr}(\mathbf{R}\mathbf{Q})\\ &=& \tau^2_i\left(1 - \frac{1}{N(N-1)}\sum_{k\neq k'}(\mathbf{Q})_{kk'}\right)\\ &\neq& \tau_i^2 \end{eqnarray}\] where \(\mathbf{R} = \mathbf{I}-\mathbf{1}\mathbf{1}^\top/N\) is the centering matrix, and this calculation reflects an assumption that the correlation \(\mathbf{Q}\) is the same for all voxels. The next-to-last expression shows that the bias is a function of the average of all \(N(N-1)\) possible inter-study correlations; if the average correlation is positive, a negative bias results (the naive sample variance is too small); and if the correlation is high, approaching 0.9, the bias is considerable (and does not diminish with \(N\)).

Instead, we use the following estimator, which accounts for the inter-study correlation and provides unbiased estimates of the inter-study variance: \[\hat\tau^2_i= \frac{1}{ \mbox{tr}(\mathbf{R}\mathbf{Q})}Y_i^\top \mathbf{R} Y_i,\] where, relative to the usual sample variance, \(N-1\) is replaced by \(\mathrm{tr}(\mathbf{R}\mathbf{Q})\).
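A short Monte Carlo sketch, under an assumed compound-symmetric \(\mathbf{Q}\) with illustrative values of \(N\), \(\rho\) and \(\tau\), shows both the downward bias of the naive \(S_i^2\) and the correction obtained by dividing by \(\mathrm{tr}(\mathbf{R}\mathbf{Q})\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, rho, tau = 70, 0.5, 1.0                               # illustrative values only
Q = (1 - rho) * np.eye(N) + rho * np.ones((N, N))        # compound-symmetric Q
R = np.eye(N) - np.ones((N, N)) / N                      # centering matrix
denom = np.trace(R @ Q)                                  # replaces N - 1

L = np.linalg.cholesky(Q)
Y = tau * (rng.standard_normal((20000, N)) @ L.T)        # 20000 "voxels" with Var = tau^2 Q

S2_naive = Y.var(axis=1, ddof=1)                         # usual sample variance over teams
ss = ((Y - Y.mean(axis=1, keepdims=True))**2).sum(axis=1)   # Y_i' R Y_i
tau2_hat = ss / denom                                    # corrected estimator

print("naive mean:", S2_naive.mean())       # about tau^2*(1-rho) = 0.5, biased low
print("corrected mean:", tau2_hat.mean())   # about tau^2 = 1.0
```

Under compound symmetry the naive estimate concentrates around \(\tau^2(1-\rho)\), consistent with the bias expression above, while the corrected estimate recovers \(\tau^2\).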

For reference, the effective degrees of freedom for this variance estimate is \[\nu = \mbox{tr}(\mathbf{R}\mathbf{Q})^2 / \mbox{tr}(\mathbf{R}\mathbf{Q}\mathbf{R}\mathbf{Q}).\] The behavior under compound symmetry of \(\mathbf{Q}\) is notable: while \(\mbox{tr}(\mathbf{R}\mathbf{Q})\) – the source of the bias in \(S_i^2\) – shrinks from \(N-1\) as \(\rho\) grows from 0, the degrees of freedom are constant, \(\nu=N-1\), for any \(\rho\).
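A quick numerical check of this property, again under an assumed compound-symmetric \(\mathbf{Q}\) with an illustrative \(N\):

```python
import numpy as np

N = 70
R = np.eye(N) - np.ones((N, N)) / N
for rho in [0.0, 0.3, 0.6, 0.9]:
    Q = (1 - rho) * np.eye(N) + rho * np.ones((N, N))
    RQ = R @ Q
    nu = np.trace(RQ)**2 / np.trace(RQ @ RQ)
    print(rho, np.trace(RQ), nu)   # tr(RQ) = (1-rho)*(N-1) shrinks, nu stays at N-1 = 69
```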