PARAFAC2 and its block term decomposition analog for blind fMRI source unmixing

Tensor-based analysis of brain imaging data, in particular functional Magnetic Resonance Imaging (fMRI), has proved to be quite effective in exploiting their inherently multidimensional nature. It commonly relies on a trilinear model generating the analyzed data. This assumption, however, may prove to be quite strict in practice; for example, due to the natural intra-subject and inter-subject variability of the Haemodynamic Response Function (HRF). This paper investigates the possible gains from the adoption of a less strict trilinear model, such as PARAFAC2, which allows a more flexible representation of the fMRI data in the temporal domain. In this context, and inspired by a recently reported successful application of the Block Term Decomposition (BTD) model to a 4-way tensorization of the fMRI signal, a PARAFAC2-like extension of BTD (called here BTD2) is proposed. Simulation results are presented, that reveal the pros and cons of these tensorial methods, demonstrating BTD2's enhanced robustness to noise.


I. INTRODUCTION
Functional Magnetic Resonance Imaging (fMRI) is a noninvasive brain imaging technique, which studies brain activity, by measuring fluctuations of the blood-oxygen-level dependent (BOLD) signal [1]. BOLD fluctuation usually occurs between 3 to 10 seconds after the stimulus, and this effect is modeled by the so called haemodynamic response function (HRF). The localization of the activated brain areas is a challenging Blind Source Separation (BSS) task [2], in which the sources consist of a combination of spatial maps (areas activated) and time courses (patterns of activation). FMRI data involve multiple modes, such as trial and subject, in addition to the intrinsic modes of time and space [3]. Multivariate bi-linear (i.e., matrix-based) methods, based on the concatenation of different modes, have been, up to recently, the state of the art in fMRI BSS. However, by definition such methods fall short in exploiting the multi-way nature of the data. Multilinear (tensor) models can be used instead, which, in general, a) produce unique representations [4], b) improve the ability of extracting spatiotemporal modes of interest [3,5,6], and c) facilitate neurophysiologically meaningful interpretations [3].
Two tensorial methods are commonly used for analyzing multi-subject fMRI data, namely Canonical Polyadic Decomposition (CPD) [3] and Tensor Probabilistic Independent Component Analysis (TPICA) [7]. Both assume that the data have a trilinear form; in other words, the underlying signal sources (both spatial maps and time courses) are the same for the different subjects up to scaling. Of course, this presupposes that physiological artifacts, which are not likely to satisfy this assumption, have been removed prior to the analysis, and, also, that different subjects have the same HRF. The latter assumption of a global HRF function is not valid, since intrasubject and inter-subject variability is known to exist [1,8]. Hence, more flexible models, accommodating such variations, need to be considered. PARAFAC2 [9] is a model appropriate for three-mode data that do not admit a perfect trilinear representation and allows one of the modes to vary. It has been widely used in chemometrics [10] and has been recently adopted in fused EEG-fMRI [11] and proven effective in the analysis of functional connectivity of resting state fMRI [12]. To the best of the authors' knowledge, it has not been used in the context of BSS for task-related fMRI.
In [13], an alternative unfolding of the spatial domain of the fMRI data has been used. Instead of unfolding every 3-D image into a vector, its mode-1 matricization (mode-2 or mode-3 can also be used with similar results) is employed. Thus, instead of having a single matrix per subject, the proposed unfolding forms a 3-D tensor (Fig. 1). In multisubject cases, instead of a 3-D tensor, a 4-D tensor is formed. 1 This higher-order unfolding in the spatial domain, combined with Block Term Decomposition (BTD), leads to increased robustness to the presence of noise, compared to CPD and TPICA. In this paper, such a tensor decomposition method, BTD2, is proposed, which by adopting a similar rationale as PARAFAC2, bypasses the need for strict trilinearity and allows for a decomposition in low multinear rank (instead of rank-1) terms. Through extensive simulation results 2 , it will be demonstrated that the use of non strictly trilinear models, such as PARAFAC2, can improve the performance of BSS in fMRI and, furthermore, that the proposed BTD2 method can improve the accuracy and robustness of the decomposition even in challenging very noisy scenarios.

A. Notation
Vectors, matrices and higher-order tensors are denoted by bold lower-case, upper-case and calligraphic upper-case letters, respectively. For a matrix A, A T and A † denote its transpose and pseudo-inverse, respectively. An entry of a vector a, a matrix A, or a tensor A is denoted by a i , a i,j or a i,j,k , respectively. I m is the mth-order identity matrix and 1 m Fig. 1: Single-subject brain images unfolded in matrices and stacked in tensors. 1 A 5-D tensor could also be considered, albeit significant complexity increase and only small gains in performance; for more details, see [13]. 2 Comparisons based on real data are far more difficult due to the lack of ground truth in the fMRI problem. Such results will be included in a future, more extensive version of this paper. denotes the m × 1 vector of all ones. The symbols ⊗ and * denote the Kronecker and the Hadamard (elementwise) products, respectively. The column-wise Khatri-Rao product of two matrices, A ∈ R I×R and B ∈ R J×R , is denoted by , with a i , b i being the ith columns of A, B, respectively. The outer product is denoted by •. For an N th-order tensor, A ∈ R I1×I2×···×I N , with N > 2, A (n) is its mode-n unfolded (matricized) version, which results from mapping the tensor element with indices (i 1 , i 2 , . . . , i N ) to a matrix element (i n , j),

A. Canonical Polyadic Decomposition (CPD)
CPD (or PARAFAC) [14] approximates a tensor of multisubject fMRI data, T ∈ R I1×I2×I3 , by a sum of R rank-1 tensors, where E stands for the modeling error tensor (I 1 = I x I y I z in Fig. 1). Equivalently, and for the kth slice of T : where A = [a 1 , a 2 , . . . , a R ] is a matrix that contains the weights of the R components of the I 1 voxels (spatial maps) and B, C are similarly defined and contain the time courses and the subject activation levels, respectively. D k is the diagonal matrix formed from the elements of the kth row of C. The main advantage of the CPD, besides its simplicity, is the fact that it is unique (up to permutation and scaling) under mild conditions [15]. In fact, it was demonstrated [5,6,13] that CPD with fMRI data is robust to overlaps (spatial and/or temporal). On the other hand, the result of CPD is largely dependent on the correct estimation of the tensor rank, R [16,17].

B. Tensor Probabilistic Independent Component Analysis (TPICA)
Independent Component Analysis (ICA) has shown good performance in the characterization of fMRI data [18]. TPICA, as proposed in [7], is essentially a hybrid of the Probabilistic ICA (PICA) [19] method and the CPD method for multisubject cases. Given a tensor of multi-subject fMRI data, T , TPICA factorizes it as: where the rows of A are assumed to be a sample of independent, non-Gaussian [19] random variables and M = C B is a Khatri-Rao structured mixing matrix. TPICA computes the decomposition of the tensor T in two steps: an ICA step, which estimates M and A, and a Khatri-Rao factorization of M (using Singular Value Decomposition (SVD)) to determine B and C. 3 TPICA is more robust than CPD to rank estimation errors but it exhibits inferior performance in the presence of overlap in the sources and/or strong noise [5,6]. 3 The non-iterative version of TPICA has been used, since the iterative algorithm was shown to be flawed [6].

C. Block Term Decomposition (BTD)
BTD [20,21] is a generalization of CPD, which involves terms of rank higher than one. In particular, the rank-(L r , L r , 1) BTD of a tensor Z ∈ R Ix×Iyz×I2 is given by where the matrix A r = X r Y T r ∈ R Ix×Iyz has rank L r . Such a tensor decomposition model has been recently applied in a 4-way representation of fMRI data [13], leading to decomposition results robust to the presence of noise.

D. PARAFAC2
PARAFAC2 [9] differs from CPD in that strict trilinearity is no longer a requirement. CPD applies the same factors across all the different modes, whereas PARAFAC2 relaxes this constraint and allows variation across one mode (in terms of the values and/or the size of the corresponding factor matrix). For this reason, PARAFAC2 is not a tensor decomposition model in the strict sense as it can represent both regular tensors, with weaker constraints than CPD, as well as irregular tensors (collections of matrices of different dimensions) with size variations along one of the modes (Fig. 2). It can be written in terms of the (here frontal) slices of the tensor T as: with B k being different for different k's. This type of decomposition is clearly non unique [9]. Thus, in order to allow for uniqueness, it has been proposed to add the constraint that the cross products B T k B k are constant over k. This has been shown [22] to be equivalent with B k = P k H, where the R×R matrix H is the same for all slices, while the variability is represented by the columnwise orthonormal I 2 × R matrix P k . Under this constraint, one has to fit the equivalent model As shown in [22], P k can be computed as P k = V k U T k , where U k and V k are the left and right singular matrices of HD k A T T T k . As can be seen from Eq. (7), the problem of fitting PARAFAC2 has been transformed into that of fitting a CPD model with transformed data.
Applications of PARAFAC2 in fMRI analysis include [11] and [12]. The former study is concerned with joint fMRI-EEG analysis in single subject cases, with the different modes being time courses, spatial maps per slice, and slices, and allowing the mode that represents the spatial maps to vary over slices. In [12], the modes considered are time, subjects and space, as in this paper. In that work, PARAFAC2 was evaluated along with other covariance-based models with respect to (w.r.t.) its effectiveness in capturing the brain's functional connectivity. III. BLOCK TERM DECOMPOSITION 2 (BTD2) Phan et al. [23,24] have shown that unfolding noisy highorder tensors to lower order counterparts generally results in loss of accuracy in the respective decomposition. The extent of this loss in accuracy depends on the degree of collinearity of the columns in the unfolded mode. Furthermore, as pointed out in [25], the ability of multiway fits to make more robust predictions, compared to their two-way counterparts, seems to grow with the noise level. The fMRI signal is known to be highly contaminated by different types of noise (both due to hardware and physiological reasons). The use of higher-order tensors could, therefore, improve the BSS performance, both in terms of accuracy and robustness w.r.t. noise. Motivated from these findings, it was demonstrated in [13] that a higherorder tensorization of the data, which takes the 3-D nature of the brain volume into account to form a 4th-order tensor (I 1 = I x I yz ), combined with the use of rank-(L,L,1,1) BTD, is consideraby more robust to noise, compared to the CPD and TPICA-based decompositions. In the light of these results and since PARAFAC2 is a more suitable (than CPD) model for realistic scenarios, involving different time courses per subject (inter-subject HRF variability), we propose and evaluate here a BTD analog, called BTD2. The rank-(L,L,1,1) extension of Eq. (5) for the 4th-order tensor T ∈ R Ix×Iyz×I2×I3 is expressed as [13]: where the matrix A r = X r Y T r ∈ R Ix×Iyz of rank L contains the factors of the spatial (voxel) activity of the R sources and the matrices B and C = [c 1 , c 2 , . . . , c R ] correspond to the associated time courses and subjects, respectively.Using the (mode-3) unfolding of the T k = T (:, :, :, k) ∈ R Ix×Iyz×I2 tensors (k = 1, 2, · · · , I 3 ) to the matrices T k ∈ R I2×IxIyz yields the BTD2 decomposition: The matrices S = blockdiag(1 T L1 , 1 T L2 , . . . , 1 T L R ) and D k = blockdiag(c k1 I L1 , . . . , c kR I L R ) appear in formulating BTD as CPD, similarly to [26]. Extending the PARAFAC2 direct fit algorithm from [22] to the above 4-way model and appropriately adapting the ALS iterations from CPD to BTD as in [26] results in Algorithm 1. Concerning uniqueness of PARAFAC2, concrete results have only been reported for the special cases of rank R = 2 or 3 [27,28]. For the general case, a sufficient (but not necessary) condition was proved in [29]. All of these conditions can be extended to BTD2 (details are omitted due to space limitations). For instance, the condition in [29], which is based on the maximum number of unique combinations of four diagonal elements (with possible repetition) of D k , can be generalized as I 3 ≥R(R + 1)(R + 2)(R + 3)/24, with R = R r=1 L r . When L r is the same for all r, as it is the case here,R = LR.

IV. SIMULATION RESULTS
Simulated datasets similar to those in [5,7] have been employed in our comparative study. The signal consists of artificial voxel activation maps (of three different slices), time patterns and activation strengths for three subjects. White random Gaussian noise is added, with the noise mean and variance being estimated from real resting state fMRI (for Algorithm 1: BTD2 algorithm for a 4th-order tensor.
Input : A 4th-order tensor T ∈ R Ix×Iyz ×I 2 ×I 3 , R and L Output: Spatial factors X = [X1, X2, . . . , XR], with Xi ∈ R Ix×L and Y i ∈ R Iyz ×L , temporal factor B ∈ R I 2 ×R and subject factor C ∈ R I 3 ×R . 1 Calculate initial values of X, Y , H and C with SVD or using multi-initialization technique; until stopping criterion has been met; 21 until stopping criterion has been met; 22  details, see [7]). The voxel-wise noise mean and variance are the same for each subject. Beckmann and Smith [7] consider five different artificial fMRI datasets, named A-E, which differ only in their signal part and have no spatial overlap, while Stegeman [5] added three more fMRI datasets, F-H, with high percentage of spatial overlap between the sources. In dataset G, the first two spatial maps are a combination of spatial maps 1 and 2 of dataset A with overlap more than 50%. The time courses and the spatial maps consist of three different spatiotemporal processes, present in every subject with different power. In this paper, datasets A, G (lowest and highest spatial overlap) (Fig. 3) and C will be used. Dataset C has the same spatial maps as A, but with different convolution parameters for the generation of the time courses. In datasets A and G, a canonical HRF is assumed for all subjects, while in dataset C, the HRFs of every subject differ in mean lag and standard deviation (stdev = 3, 3.5, and 4 seconds, mean lag of 4, 5, and 6 seconds). This induces small differences in the temporal signal per subject. Using the same HRF parameters and spatial maps of dataset G, an extra dataset I has been generated (different time courses per subject with high spatial overlap). Furthermore, the mean lag of datasets C and I has been increased to test the performance of the algorithms in such conditions.
The performance evaluation is based on both visual inspection and Pearson correlation values. We computed the correlation between the time courses obtained from the different methods and the actual ones, as well as the correlation between the estimated spatial maps and the real "averaged" spatial maps computed via the Ordinary Least Squares (OLS) regression (similarly to [5,7]). In Tables I-IV, the mean Pearson correlation coefficients of 20 runs are summarized. The initialization for CPD, PARAFAC2 and BTD is performed with multi-initialization (multiple runs with less iterations) in order to avoid local minima. The standard deviations of the mean correlation coefficients were similarly low in all methods (slightly higher in datasets A and C where the power of the signal of activation is lower).
Even in datasets where the assumption of trilinearity is valid (datasets A and G with the same time course per subject) the result of PARAFAC2 is almost the same as that of CPD, with slightly higher cross-talk (cf. Fig. 4 and Tables I-II). In datasets C and I, where trilinearity is not satisfied, CPD completely fails to correctly extract three different sources, even if we consider an equivalent trilinear model of higher rank (R = 9). TPICA manages to handle the non-overlapped case of dataset C (the result is slightly better than that of PARAFAC2 because there is less crosstalk between wrong maps) but in cases of high overlap (Datasets G-I) it fails  dramatically to separate the spatially overlapped sources. If we increase the lag introduced to HRF, the result of TPICA is getting worse, while PARAFAC2 remains stable (as the cross product of time courses is stable) and gets better than TPICA even for dataset C.
Having verified the effectiveness of PARAFAC2 in these scenarios, the focus is now moved to the proposed BTD2 method using noise of varying strength. The Signal to Noise Ratio (SNR) is defined as the Frobenius norm of the signal divided by the Frobenius norm of the noise [5]. The SNR values are different in datasets C and I, due to the fact that dataset I has more areas activated and hence stronger signal. It is readily seen from Tables III-IV that BTD2 is more robust than PARAFAC2, even in the noisier cases. The result of PARAFAC2 deteriorates significantly for SNR values lower than 0.05 for dataset C and 0.10 for dataset I, while the result of TPICA deteriorates for even higher values of SNR (TPICA results are not included in Tables III-IV due to space  limitations). We can also see that the component which is less influenced by noise is component no. 3, which is quite reasonable since it has the minimum spatial and temporal overlap with the other sources. The result of BTD2 is fairly insensitive to overestimation of L (its value was set equal to 32). On the other hand, BTD2 is computationally more complex since more factors need to be computed.  V. CONCLUSIONS In this paper, a new tensor decomposition method, called BTD2, was proposed, which is based on BTD while allowing variation across one mode. It was demonstrated in simulated scenarios that non trilinear tensor decomposition methods, such as PARAFAC2 and BTD2, are expected to be more suitable for BSS in fMRI due to the variability of the HRF across subjects and they result in improved decomposition compared to strictly trilinear methods like CPD and TPICA. Once more, it has been seen that TPICA has difficulties in cases of overlap while it remains more robust w.r.t. rank estimation than CPD. The proposed tensor decomposition, BTD2, combined with a higher-order unfolding, exhibits significant robustness to noise compared to PARAFAC2, albeit at the cost of a higher computational complexity.