Combining the Benefits of CCA and SVMs for SSVEP-based BCIs in Real-world Conditions

In this paper we propose a novel method for SSVEP classification that combines the benefits of the inherently multi-channel CCA, the state-of-the-art method for detecting SSVEPs, with those of the robust SVMs, one of the most popular machine learning algorithms. Besides robustness, the use of SVMs also provides a confidence score that allows us to dynamically trade off the trial length against the accuracy of the classifier. By balancing this trade-off we are able to offer personalized self-paced BCIs that maximize the ITR of the system. Furthermore, we propose to perturb the template frequencies of CCA so as to accommodate the requirements of real-world BCI applications, where the environmental conditions may not be ideal, in contrast to existing methods that rely on the assumption of soundproof and distraction-free environments.


INTRODUCTION
A Steady State Visual Evoked Potential (SSVEP)-based BCI enables the user to select among several commands that depend on the application, e.g. selecting one "box"/command out of many. Each command is associated with a repetitive visual stimulus that has distinctive properties (e.g., frequency). The stimuli are simultaneously presented to the users who select a command by focusing their attention on the corresponding stimulus. When the users focus their attention on the stimulus, a SSVEP signal is produced that can be observed in the oscillatory components of their EEG signal, especially in the signals generated from the primary visual cortex. In these components we can observe the frequency of the stimulus, as well as its harmonics.
Canonical Correlation Analysis (CCA) is the state-of-the-art method that is used for translating the users' EEG signals into the corresponding frequency of the flickering stimuli. Intuitively, CCA correlates the users' EEG signal with several artificially generated sine-cosine signals that correspond to the flickering frequencies and their harmonics, and selects the one with the maximum correlation score ("max" classifier). On the other hand, Support Vector Machines (SVMs) are a popular machine learning algorithm that is known for its robustness and generalization ability as well as its capability of learning from high-dimensional data. SVMs coupled with spectral features (i.e. the transformation of the signal into the frequency domain) have also been used for detecting SSVEPs [12]. However, while SVMs can easily be applied to a single channel, there is no straightforward way to use the signal from multiple channels. Possible solutions include either early fusion of the EEG signals (e.g. concatenation of the features) or late fusion techniques (e.g. producing a model for each channel and fusing their prediction scores by averaging).
Session: Emerging Technologies in Multimedia and Health MMHealth'17, October 23, 2017, Mountain View, CA, USA
In this paper, we propose a novel algorithm that aims to bring together the best of both CCA and SVMs. More specifically, we propose to replace the "max" classifier that is typically used by CCA with an SVM-based classifier that operates on the correlation weights produced by CCA. In this way, the proposed method benefits from the inherent multi-channel nature of CCA as well as the robustness and generalization ability of SVMs. Furthermore, this allows a number of modifications to be incorporated in the algorithm so as to accommodate personalized self-paced BCI applications and the adaptation of the methodology to real-world conditions. More specifically, besides the artificially generated template signals for CCA, the personal signals of each user (also known as Individual Templates (ITs)) can additionally be used to enhance the SVM model with subject-specific features. The incorporation of ITs has been shown to significantly improve the performance of SSVEP detection [11]. In addition, leveraging the fact that SVM-based classifiers produce a score that indicates their confidence in the label of the testing trial, we propose to set a minimum confidence threshold that can be used as an early stopping criterion (i.e. the user looks at the stimuli only for the time required by the classifier to reach the minimum threshold). This is enabled by the fact that the confidence scores produced by SVMs tend to increase with the accuracy of the classifier and the length of the trial, which is not the case for the correlation weights of the CCA-based classifiers. Thus, in the case of the SVM classifier, once the confidence score exceeds the minimum threshold we can be confident that the classification output is accurate.
As a result, each user needs to focus on the stimuli only for the time required to reach the threshold, which results in user-dependent, varying trial lengths and in turn in a higher Information Transfer Rate (ITR). Finally, most of the existing datasets and methodologies on SSVEP-based BCIs rely on one basic assumption: ideal environmental conditions (e.g. no other stimuli, visual or auditory) and high-end lab computers dedicated to the BCI experiment. However, this is not the case in real-world applications, where the environmental conditions are not optimal and home-use computers may run a number of additional applications simultaneously, which may result in slightly different stimulation frequencies when the computer gets overloaded. In order to overcome this problem, we propose to perturb the frequencies of CCA's template signals, which makes the proposed approach robust to external noise and to timing lags caused by system overload in real-world BCI applications.

RELATED WORK
The study of SSVEP-based BCIs has attracted a lot of attention with respect to algorithms and methods for maximizing the classification accuracy and improving the information transfer rate. The novelties that have been introduced in the literature can be grouped into two major categories: the PSD-based (Power Spectral Density) methods and the CCA-based methods.
In the first category, the methods aim at the spectral analysis of the EEG signals, based on the fact that EEG signals originating from an SSVEP experiment are expected to synchronize at the stimulating frequency when the steady state is reached. In an SSVEP-based BCI, the notion of frequency plays a central role and is typically used to generate characteristic features in schemes that rely on classification. Several spectral analysis methods have been used to extract the frequency of the EEG signals, such as the periodogram approach [4,10,14,15], the more advanced PWelch algorithm [3] and the spectrogram, which characterizes the signals in the time-frequency domain [3]. Another characteristic related to the frequency, and depending on the stimulus design, is the phase, which has also been used in SSVEP-based BCIs [6]. After the signals are described in the frequency domain, the next step is to decide which stimulating frequency is the source of the resulting EEG signal. At this decision step, multiple supervised classification schemes, originally proposed by the machine learning community, have been tested (e.g. SVMs, Linear Discriminant Analysis (LDA) and Extreme Learning Machines (ELM)). In the literature we can find several studies providing extensive experimental comparisons between the various feature extraction and classification schemes [3,9,12,13].
In the second category, based on the fact that during SSVEPs the brain signal synchronizes at the stimulating frequency, the CCA-based methods aim at identifying this frequency by correlating the user's signal with template signals representing the different frequencies. The original method on which all subsequent ones build (i.e. the standard CCA [8]) generated artificial sine-cosine signals at the stimulating frequencies as reference signals, and selected the frequency that provided the maximum correlation coefficient with the user's signal. Extending the standard CCA method, in [2,11,16] calibration data from the user were utilized so as to generate subject-specific reference signals. More specifically, the authors of [16] proposed to optimize the sine-cosine reference signals by maximizing their correlation with the calibration data. In a similar vein, the authors of [2] proposed to use the average signal of the user (i.e. the average ITs) from the calibration data as a reference signal instead of the artificially generated sine-cosine signals. Extending this idea, the authors of [11] proposed to combine the ITs from [2] with the artificial templates from standard CCA, by maximizing the correlation between the correlation weights of the user's signal, the best performing IT and the artificial templates.
In conclusion, CCA-based methods so far have only considered the optimization of the reference signals, showing that the combination of ITs with artificial sine-cosine signals is the most beneficial in terms of SSVEP detection performance. In this work, on the other hand, we propose to optimize CCA by replacing the typical "max" classifier, which selects the frequency with the maximum correlation coefficient, with the more robust SVM classification scheme. This not only improves the classification accuracy, but also facilitates a number of modifications that are necessary for personalized, self-paced and real-world SSVEP-based BCI applications (i.e. early stopping and frequency perturbation).

METHODOLOGY
Notation
For the presentation of the proposed method we adopt the following notation. Let us assume that the SSVEP BCI is designed to select one box among $N_f$ boxes flickering at their corresponding frequencies $f_n$, $n = 1, \ldots, N_f$. Each EEG trial is represented by a matrix $EEG_i \in \Re^{N_c \times N_{tr}}$, where $N_c$ is the number of channels and $N_{tr} = f_s \times t$ is the number of samples ($f_s$ is the sampling rate in Hz and $t$ is the trial length in seconds). For the proposed as well as the existing methods in this paper, we utilized a leave-one-session-out cross-validation scenario, similar to [11], where in each fold the trials of $s - 1$ sessions were used as calibration data ($EEG_{cal}$) and the trials of the remaining $s$-th session for testing ($EEG_{test}$).

Canonical Correlation Analysis
Canonical Correlation Analysis (CCA) is a multivariate statistical method whose goal is to find the underlying correlations between two sets of data. The basic assumption of this approach is that the two sets of data are only different views (or representations) of the same original (hidden) data. More specifically, a linear projection is computed for each representation such that they are maximally correlated in the dimensionality-reduced (hidden) space. Let us assume that the two different views of the hidden data can be represented by two matrices $X \in \Re^{K \times P}$ and $Y \in \Re^{H \times P}$. Formally, the CCA approach seeks to find two vectors $w \in \Re^{K}$ and $v \in \Re^{H}$ that maximize the linear correlation between the projections $w^T X$ and $v^T Y$. This is achieved by solving the following optimization problem:

$$\rho = \max_{w, v} \frac{w^T X Y^T v}{\sqrt{(w^T X X^T w)\,(v^T Y Y^T v)}} \tag{1}$$

In order to use the CCA approach for SSVEP analysis [8], the two views of the same hidden data need to be defined (i.e. the matrices $X$ and $Y$). The first view (i.e. the matrix $X$) contains the filtered EEG time series from one or more channels, $EEG_i$, while the second view of the data (i.e. the matrix $Y$) contains the reference (template) signals. In SSVEP analysis, the reference signals are artificially constructed sine-cosine signals, corresponding to the stimulating frequency $f_n$:

$$Y = \begin{bmatrix} \sin(2\pi f_n \tau) \\ \cos(2\pi f_n \tau) \end{bmatrix}, \quad \tau = \frac{1}{f_s}, \frac{2}{f_s}, \ldots, \frac{N_{tr}}{f_s} \tag{2}$$

For each stimulating frequency, the matrix $Y$ usually contains reference signals for the frequency itself as well as its harmonics, $2f_n, 3f_n, \ldots, N_h f_n$, where $N_h$ is the number of harmonics:

$$R(f_n) = \begin{bmatrix} \sin(2\pi f_n \tau) \\ \cos(2\pi f_n \tau) \\ \vdots \\ \sin(2\pi N_h f_n \tau) \\ \cos(2\pi N_h f_n \tau) \end{bmatrix} \tag{3}$$

In our study the number of harmonics was set to 3. To recognize the frequency of the SSVEPs, CCA calculates the canonical correlation weights $\rho_n$ between the current EEG time series and the reference signals at each stimulus frequency. In the end, the stimulation frequency of the reference signals with the maximal correlation to the analyzed EEG signal is selected as the frequency of the SSVEPs (referred to as the "max" classifier so far). The above procedure is applied to each trial.
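
The detection procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the released Matlab toolbox: the function names (`canonical_correlation`, `sincos_reference`, `cca_max_classifier`) are hypothetical, the canonical correlation is computed through a QR/SVD route that assumes full-rank inputs, and `n_harmonics` is assumed to count the fundamental frequency as the first harmonic.

```python
import numpy as np


def canonical_correlation(X, Y):
    """Largest canonical correlation between X (K x P) and Y (H x P).

    Rows are variables (channels or reference signals), columns are samples.
    Assumes both matrices have full row rank and P exceeds K and H.
    """
    # Centre each row so that inner products behave like covariances.
    Xc = (X - X.mean(axis=1, keepdims=True)).T  # P x K
    Yc = (Y - Y.mean(axis=1, keepdims=True)).T  # P x H
    # Orthonormal bases for the two column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))


def sincos_reference(freq, n_harmonics, fs, n_samples):
    """Sine-cosine reference matrix of shape (2 * n_harmonics, n_samples)."""
    tau = np.arange(1, n_samples + 1) / fs
    rows = []
    for k in range(1, n_harmonics + 1):  # k = 1 is the fundamental (assumption)
        rows.append(np.sin(2 * np.pi * k * freq * tau))
        rows.append(np.cos(2 * np.pi * k * freq * tau))
    return np.vstack(rows)


def cca_max_classifier(eeg, freqs, fs, n_harmonics=3):
    """Return the index of the frequency with maximal correlation, and all scores."""
    n_samples = eeg.shape[1]
    rhos = [
        canonical_correlation(eeg, sincos_reference(f, n_harmonics, fs, n_samples))
        for f in freqs
    ]
    return int(np.argmax(rhos)), rhos
```

Here `eeg` is a trial of shape (channels, samples), and the returned index points into `freqs`. The list of scores is returned alongside the decision because those scores are exactly what a learned classifier could consume in place of the `argmax`.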

Individual Templates
Besides the reference signals of Eq. 3, which are artificially generated, the signals generated by the user during a calibration phase ($EEG_{cal}$) can be used as individual templates (ITs) characterizing the response of the user's brain to the stimuli $f_n$:

$$ITs(f_n, j) = EEG_{cal}^{j}(f_n), \quad j = 1, \ldots, s-1 \tag{4}$$

where $EEG_{cal}^{j}(f_n)$ denotes the calibration trial of session $j$ recorded for the stimulus $f_n$. The authors of [2] proposed to replace the artificially generated templates with the average of the individual templates that were obtained during the calibration phase (i.e. $Y = \frac{1}{s-1} \sum_{j=1}^{s-1} ITs(f_n, j)$), which significantly increased the performance of the system.

Support Vector Machines
SVMs are among the most popular classification algorithms. They aim to find the optimal hyperplane that separates two classes (e.g. the positive class from the negative class) by maximizing the margin between them. This hyperplane, in its basic linear form, is represented by its normal vector w and a bias parameter b. These two terms are the parameters that are learnt during the training phase. Assuming that the data are linearly separable, there exist multiple hyperplanes that solve the classification problem. SVMs choose the one that maximizes the margin, assuming that this will generalize better to new unseen data.
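
For reference, the maximum-margin problem described above is commonly written as follows. The training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, the slack variables $\xi_i$ and the decision value $f(x)$ are notation introduced here for illustration only; $w$ in this block denotes the normal vector of the hyperplane, not the CCA projection vector.

```latex
% Hard-margin form (linearly separable data): the margin equals 2 / ||w||
\min_{w, b} \; \frac{1}{2} \lVert w \rVert^2
\quad \text{s.t.} \quad y_i \, (w^T x_i + b) \ge 1 \quad \forall i

% Soft-margin form with regularization parameter C
\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad y_i \, (w^T x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0

% Decision value for a new sample x
f(x) = w^T x + b
```

One common choice, assumed here rather than stated in the text, is to read the signed decision value $f(x)$ (the scaled distance from the hyperplane) as the confidence of the classifier in its predicted label.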

Proposed method
The proposed method combines the benefits of CCA, ITs and SVMs. More specifically, we use the combination of the ITs ($ITs(f_n, j)$) and the artificially generated sine-cosine signals ($R(f_n)$), as it was shown to perform best in [11] (all these signals will be referred to as reference signals from now on). In more detail, let us define the set of reference signals for the stimulating frequency $f_n$ as $RS_{f_n}(j)$, $j = 1, \ldots, N_{RS}$, where $N_{RS}$ is the total number of reference signals, including the artificially generated sine-cosine signals for $f_n$ and its $N_h$ harmonics (Eq. 2) as well as the individual templates from $EEG_{cal}$ (Eq. 4). The feature vector of each trial $EEG_i$ consists of the correlation coefficients obtained by applying CCA (Eq. 1) to $EEG_i$ and each reference signal in the sets $RS_{f_n}$, $n = 1, \ldots, N_f$. Finally, the features of the calibration set are given to SVMs to train a classification model.
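
The feature construction step can be illustrated with the following sketch. It is written under assumptions: `trial_features` and `canonical_correlation` are hypothetical helper names, each reference signal (sine-cosine matrix or individual template) is assumed to contribute one correlation coefficient, and the coefficients are simply concatenated over all frequencies.

```python
import numpy as np


def canonical_correlation(X, Y):
    """Largest canonical correlation between X (K x P) and Y (H x P).

    QR/SVD route; assumes full-rank inputs with more samples than rows.
    """
    Xc = (X - X.mean(axis=1, keepdims=True)).T
    Yc = (Y - Y.mean(axis=1, keepdims=True)).T
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))


def trial_features(eeg, reference_sets):
    """Correlation features of one trial.

    eeg            : array (channels, samples)
    reference_sets : list with one entry per stimulating frequency; each entry
                     is a list of reference matrices (rows x samples), e.g. one
                     sine-cosine matrix followed by the individual templates.
    Returns a 1-D array with one coefficient per reference signal.
    """
    return np.array(
        [
            canonical_correlation(eeg, ref)
            for refs in reference_sets  # loop over frequencies f_n
            for ref in refs             # loop over the reference signals of f_n
        ]
    )
```

Stacking `trial_features` over all calibration trials gives the matrix on which a linear SVM would be trained, and the same function applied to a test trial gives the vector to be classified.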

Early Stopping
Timely decisions are considered crucial when designing a BCI. In this direction, we propose an early stopping criterion that exploits the confidence scores provided by SVMs in order to reach a decision more rapidly. More specifically, for each trial $EEG_i \in EEG_{test}$, we consider several trial lengths $t_m \in \{t_{min} : t_{step} : t_{max}\}$, at which we apply the classification model in order to obtain a confidence score. Essentially, for each trial length $t_m$ we apply our algorithm to the first $f_s \times t_m$ samples of the trial, i.e. to $EEG_i \in \Re^{N_c \times (f_s \times t_m)}$, and we obtain the confidence score of SVMs $c_i(m)$ with label $l_i(m)$ for the trial. When the confidence score of the trial for one of the labels becomes higher than a minimum threshold $th$ ($th$ is selected through cross-validation in the calibration set so as to maximize the ITR of the system for each user independently), the process is terminated and the trial is classified accordingly:

$$\hat{l}_i = l_i(m^*), \quad m^* = \min \{ m : c_i(m) > th \}$$
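
The stopping rule can be sketched as a simple loop over growing windows. The `classify` callback is hypothetical and stands for any model that returns a `(label, confidence)` pair for a truncated trial. The text does not say what happens when the threshold is never exceeded; returning the decision at the longest window is an assumption made here.

```python
def early_stopping(trial, fs, classify, th, t_min, t_step, t_max):
    """Classify a trial with the shortest window whose confidence exceeds th.

    trial    : sequence of channels, each a sequence of samples
    fs       : sampling rate in Hz
    classify : hypothetical callback, window -> (label, confidence)
    Returns (label, trial_length_in_seconds).
    """
    n_steps = int(round((t_max - t_min) / t_step)) + 1
    label, t_m = None, t_min
    for m in range(n_steps):
        t_m = t_min + m * t_step
        n_samples = int(round(fs * t_m))
        # Keep only the first f_s * t_m samples of every channel.
        window = [channel[:n_samples] for channel in trial]
        label, confidence = classify(window)
        if confidence > th:
            break  # confident enough: stop the trial here
    # Assumption: if th is never exceeded, fall back to the decision at t_max.
    return label, t_m
```

Because the loop stops at the first window that clears the threshold, the returned trial length differs from trial to trial and from user to user, which is what makes the resulting BCI self-paced.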

Frequency Perturbation
Aiming to facilitate real-world BCI applications, we propose to perturb the frequencies of the sine-cosine signals, so that our methodology is robust to external noise that may contaminate the signal, as well as to lags due to system overload that may produce slightly different frequencies in the steady state. More specifically, we propose to perturb the frequencies of Eq. 3 and, instead of only using the original frequency $f_n$ and its harmonics $2 f_n, 3 f_n, \ldots$, to additionally use the frequencies shifted by an offset, $f_n \pm offset_{pert}$, where $offset_{pert}$ is the perturbation offset in Hz. In order to accommodate different levels of noise depending on the environment of the BCI application, we propose a pool of offsets (0.05, 0.1, 0.2 and 0.3 Hz) and, depending on the level of noise, we use one or more perturbation offsets. For example, if all 4 offsets are required for a specific setting, then besides the frequency $f_n$ and its harmonics, the following frequencies and their harmonics are also used: $f_n \pm 0.05$, $f_n \pm 0.1$, $f_n \pm 0.2$ and $f_n \pm 0.3$.
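
As an illustration, the perturbed reference signals could be generated as in the sketch below. The function names are hypothetical, and two choices are assumptions rather than details given in the text: each perturbed frequency gets its own reference matrix (so it would contribute its own correlation coefficient), and the harmonics of a perturbed frequency are taken as integer multiples of the shifted value.

```python
import numpy as np


def perturbed_frequencies(f_n, offsets):
    """Original frequency followed by f_n - o and f_n + o for every offset o."""
    freqs = [f_n]
    for off in offsets:
        freqs.extend([f_n - off, f_n + off])
    return freqs


def perturbed_references(f_n, offsets, n_harmonics, fs, n_samples):
    """One sine-cosine reference matrix per (perturbed) frequency.

    Each matrix has shape (2 * n_harmonics, n_samples); k = 1 is the
    fundamental, and harmonics are multiples of the shifted frequency.
    """
    tau = np.arange(1, n_samples + 1) / fs
    refs = []
    for f in perturbed_frequencies(f_n, offsets):
        rows = []
        for k in range(1, n_harmonics + 1):
            rows.append(np.sin(2 * np.pi * k * f * tau))
            rows.append(np.cos(2 * np.pi * k * f * tau))
        refs.append(np.vstack(rows))
    return refs
```

With the full pool of four offsets, every stimulating frequency yields nine sine-cosine references instead of one, so the feature vector grows accordingly.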

EXPERIMENTAL SETUP
Datasets
The proposed algorithm is evaluated on two SSVEP datasets, one captured in ideal environmental conditions (SCCN [11]) and a more challenging one that was created in real-world conditions (SSVEP-MULTI [5]). The first dataset, SCCN, was generated using a 12-target visual stimulus with target frequencies ($f_0 = 9.25$ Hz, $\Delta f = 0.5$ Hz) and phases ($\phi_0 = 0$, $\Delta\phi = 0.5\pi$). Ten healthy subjects (9 males and 1 female, mean age: 28 years) with normal or corrected-to-normal vision participated in this study, and the EEG data were recorded with eight Ag/AgCl electrodes covering the occipital area using a BioSemi ActiveTwo EEG system (Biosemi, Inc., http://www.biosemi.com/). The signals were amplified and digitized at a sampling rate of 2,048 Hz, and all electrodes were referenced to the CMS electrode close to Cz. For each subject, the experiment consisted of 15 blocks. In each block, subjects were asked to gaze for 4 s at one of the visual stimuli, indicated by the stimulus program in a random order, and to complete 12 trials corresponding to all 12 targets.
For SSVEP-MULTI [5], the stimuli of the experiment were five violet boxes simultaneously flickering at 5 different frequencies (6.66, 7.50, 8.57, 10.00 and 12.00 Hz). The visual stimuli were projected on a 22" LCD monitor with a refresh rate of 60 Hz and a resolution of 1680x1080 pixels. High-dimensional EEG data were recorded with the EGI 300 Geodesic EEG System (GES 300) [1], using a 256-channel HydroCel Geodesic Sensor Net (HCGSN) with a sampling rate of 250 Hz and contact impedance not higher than 40 kΩ. The synchronization of the stimulus with the recorded EEG signal was performed with the aid of the Stim Tracker model ST-100 (developed by Cedrus) and a light sensor attached to the monitor that added markers to the captured EEG signal. The experiment was undertaken by 11 subjects (8 male and 3 female), each of them performing 5 sessions of 25 trials. A few seconds prior to the stimulation period, a yellow arrow pointed at the box that the subjects had to focus on. The arrow remained visible during the trial, making it easier for the subjects to focus correctly for the entire length of the trial. In order to apply the leave-one-session-out protocol explained earlier, we used a subset of this dataset so that each class is equally represented (the classes are imbalanced in the whole dataset) and reformulated the trials into 20 sessions, each consisting of one selection of each of the 5 boxes (100 trials in total).

Implementation details
A Matlab toolbox titled "ssvep-eeg-processing-toolbox" has been released on GitHub [7] in order to set up and perform the experiments described in this paper. For the experiments, as mentioned above, we relied on the per-subject leave-one-session-out cross-validation protocol (each session containing one trial from each class) that was used in [11]. For the evaluation, we use the classification accuracy and the ITR to measure the BCI performance (the ITR for each subject was calculated independently and the ITR of the system was the average of the individual ITR values). Regarding the EEG channels, unless stated otherwise, for SCCN all 8 channels were used, while for SSVEP-MULTI the 8 optimal channels as found in [12] were used (i.e. 126, 138, 150, 139, 137, 149, 125 and 113).
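
The text does not spell out how the ITR is computed. A widely used definition in the BCI literature is the one attributed to Wolpaw et al., and the sketch below assumes that definition; the function name and the choice of what the selection time includes (for example, whether a gaze-shift interval is added to the trial length) are likewise assumptions.

```python
import math


def itr_bits_per_min(n_classes, accuracy, selection_seconds):
    """Information transfer rate in bits/min (Wolpaw-style definition, assumed).

    n_classes         : number of selectable targets N
    accuracy          : classification accuracy P in [0, 1]
    selection_seconds : average time T needed for one selection
    """
    bits = math.log2(n_classes)
    if 0.0 < accuracy < 1.0:
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    elif accuracy == 0.0:
        # Limit of the formula for P -> 0 (the P * log2(P) term vanishes).
        bits += math.log2(1.0 / (n_classes - 1))
    # For accuracy == 1.0 both correction terms vanish.
    return bits * 60.0 / selection_seconds
```

Under this definition the ITR grows with accuracy and shrinks with the selection time, which is why shortening the trial without losing accuracy, as early stopping attempts, raises the ITR.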

Preprocessing and features
In this paper we test two categories of methods: a) the CCA-based methods (CCA, IT-CCA, IT-CCA-SVM, etc.) and b) the PSD-based methods (SVM-single, SVM-multi, etc.). For the CCA-based methods, the signals were first segmented into trials, then common average re-referencing was applied and finally the trials were band-pass filtered ([6, 80] Hz for SCCN and [5, 48] Hz for SSVEP-MULTI, so that all 4 harmonics of the stimuli frequencies are included) with an IIR Butterworth filter. While the CCA-based methods were applied to the signals in the time domain, for the PSD-based methods the PSD given by PWelch was used as a feature vector for each trial and was given to an SVM classifier. Finally, for the SVMs, a linear kernel was used with the default value for C (C=1).
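
A possible rendering of these preprocessing steps with NumPy and SciPy is sketched below. Several details are not given in the text and are assumptions here: the filter order (4), the use of zero-phase filtering via `filtfilt`, and the Welch segment length (one second of data). The function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch


def preprocess_trial(trial, fs, band, order=4):
    """Common average re-referencing followed by Butterworth band-pass filtering.

    trial : array (channels, samples)
    band  : (low_hz, high_hz), e.g. (5, 48) for SSVEP-MULTI
    order : filter order (assumption; not stated in the text)
    """
    # Common average reference: subtract the mean over channels per sample.
    car = trial - trial.mean(axis=0, keepdims=True)
    b, a = butter(order, band, btype="bandpass", fs=fs)
    # Zero-phase filtering along the time axis (assumption).
    return filtfilt(b, a, car, axis=1)


def psd_features(trial, fs, nperseg=None):
    """Welch PSD per channel; returns (frequencies, psd[channels, bins])."""
    if nperseg is None:
        nperseg = min(trial.shape[1], int(fs))  # one-second segments (assumption)
    return welch(trial, fs=fs, nperseg=nperseg, axis=1)
```

For the CCA-based methods only `preprocess_trial` would be needed, while for the PSD-based methods the rows of the returned PSD matrix would serve as per-channel feature vectors.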

EXPERIMENTAL RESULTS
In this section our goal is to validate experimentally the benefits of the proposed approach in terms of performance, using accuracy and ITR as the evaluation measures. More specifically, initially, we compare the proposed method with baseline configurations in order to show the benefit of combining the inherent multi-channel nature of CCA, the robustness of SVMs and the personalization elements of ITs (Section 5.1). Then, we compare with existing state-of-the-art methods in Section 5.2, while we show the added value of our early stopping criterion presented in Section 3.6. For these experiments, the SCCN dataset was utilized. Finally, we show the benefit of frequency perturbation, presented in Section 3.7, for real-world applications, where the environmental conditions cannot be ideal (Section 5.3). For this experiment both datasets were used, in order to show the impact of perturbation in the ideal scenario and the real-world scenario. In detail, we compare with the following methods:
IT-CCA-SVM: This is the proposed method that combines the standard CCA and the ITs with SVMs.
IT-CCA-SVM-ES: This is the proposed method also incorporating the early stopping criterion.
IT-CCA-SVM-pert: This is the proposed method also incorporating perturbation of the sine-cosine signals.
CCA [8]: The standard CCA, which is the most popular method for SSVEP detection, using the first 4 harmonics of the stimuli frequencies in the form of artificially generated sine-cosine signals.
SVM-single [12]: For this method, PWelch was used in one channel to extract the spectral information of the signal and the extracted features were fed to a linear SVM classifier.
SVM-multi-early: For this method, the spectral features (PWelch) of multiple channels were extracted, then concatenated in one feature vector and fed to a linear SVM classifier.
SVM-multi-late: For this method, the spectral features (PWelch) of multiple channels were extracted, then a linear SVM classifier was trained from each channel and the confidence scores of all different classifiers were averaged to produce the final classification score.
CCA-SVM: This method combines CCA with SVMs, by feeding the correlation coefficients of the examined signal with sine-cosine signals as features to a linear SVM classifier.
IT-SVM: This method feeds the correlation coefficients of the examined signal with the ITs as features to a linear SVM classifier.
IT-CCA [2]: This method averages the ITs to a single individual template signal and applies the standard CCA method by using this IT instead of the artificially generated sine-cosine signals.
L1MCCA [16]: The L1MCCA approach (L1 regularized multiway CCA) is a CCA-based method and was proposed to improve CCA's detection performance by optimizing the reference signals through maximizing the correlation between a set of individual templates and the artificially generated sine-cosine signals of CCA.
Comb-IT-CCA [11]: This method combines the standard CCA and the IT-CCA approaches. The combination of the different reference signals is made in a late fusion manner, where the final "classification" score is the max correlation coefficient between the following correlation weight vectors: (1) between the test signal and the individual template, (2) between the test signal and sine-cosine reference signals, (3) between the individual template and sine-cosine reference signals. For this method, the individual template that maximizes the weight correlation value is selected as the reference signal corresponding to the target.

Comparing with baselines
In this section, our goal is to compare the proposed method (IT-CCA-SVM) with multiple baselines based on either CCA or SVMs and show the benefit of combining these two methods. More specifically, we compare IT-CCA-SVM with CCA, SVMs applied on a single channel (SVM-single; channel 7 provided the overall highest performance) and multi-channel SVMs (with all 8 channels) using two popular methods to fuse the information from the various channels (SVM-multi-early and SVM-multi-late). Furthermore, we compare with two baseline configurations of the proposed approach: CCA-SVM, which uses only sine-cosine signals and feeds the correlation coefficients to SVMs, and IT-SVM, which uses only the ITs as reference signals and again feeds the correlation coefficients to SVMs. The results are shown in Fig. 1 for different trial lengths (1 to 4 seconds) and with respect to both classification accuracy (Figure 1a) and ITR (Figure 1b). We can see that the proposed method (IT-CCA-SVM) consistently outperforms all other methods for all trial lengths, providing a significant performance boost. In more detail, we can see that the PWelch-based methods perform poorly compared to the CCA-based methods (SVM-x methods compared to CCA-x methods). Furthermore, extending the PWelch method to multiple channels through either early or late fusion does not provide better classification rates, showing that incorporating multi-channel information is not a straightforward process (SVM-multi-early/late compared to SVM-single). This shows the benefit of using CCA, an inherently multi-channel method, to combine the information from the multiple channels. Moreover, an interesting observation is that CCA-SVM outperforms CCA, indicating that the generalization ability and the robustness of SVMs, compared to the typical "max" classifier employed by CCA, can provide a significant performance benefit.
In addition, comparing IT-SVM and CCA-SVM, we validate that, also when combined with SVMs, the individual templates provide a significant boost in performance compared to the artificially generated sine-cosine signals, similar to what has been observed in the literature [2]. Finally, their combination (i.e. the proposed IT-CCA-SVM method) outperforms both variants, showing that both individual templates and sine-cosine signals contain information that is important for SSVEP detection.

Comparing with SoA
Our objective in this section is to compare the proposed method with existing ones that deliver state-of-the-art performance. More specifically, we compare IT-CCA-SVM and IT-CCA-SVM-ES with CCA, L1MCCA, IT-CCA and Comb-IT-CCA. The results are shown in Fig. 2 for different trial lengths (1 to 4 seconds) and with respect to both classification accuracy (Figure 2a) and ITR (Figure 2b). As we can see, IT-CCA-SVM consistently outperforms CCA, L1MCCA and IT-CCA, as expected, since IT-CCA-SVM combines information from both artificially generated sine-cosine signals (as in CCA) and individual signals (as in L1MCCA and IT-CCA). Comparing with Comb-IT-CCA, which also combines information from sine-cosine and individual signals, we can see that the proposed approach outperforms Comb-IT-CCA for trial lengths above 2 seconds, while Comb-IT-CCA provides better results for a trial length of one second. However, combining the proposed method with the early stopping criterion (IT-CCA-SVM-ES), we can reach a much higher performance (93.11%) at an average trial length of 0.99 seconds, providing a significant boost in the ITR of the system. Furthermore, in order to validate our assumption that the early stopping criterion benefits naturally from the output of an SVM classifier, we have also applied the criterion to the other state-of-the-art methods (the threshold is selected through cross-validation in the training set). The results can be seen in Table 1. For every subject, we provide the accuracy of SSVEP detection, the average trial length for this accuracy and the ITR. Overall, we can see that the proposed method outperforms all other methods in both accuracy and ITR. In more detail, we can see that applying the criterion to CCA leads, for all subjects, to a long trial length in order to achieve 91.06% accuracy, which is due to the fact that CCA performs poorly at short trial lengths.
On the other hand, L1MCCA, IT-CCA and Comb-IT-CCA opt for a lower accuracy (around 75%) at shorter trial lengths compared to CCA (2.07, 1.52 and 0.51 seconds respectively). For CCA and L1MCCA, the achieved ITR is lower than the best one achieved with a standard-length trial (comparing with their respective ITR in Figure 2b). The ITR of IT-CCA-ES is slightly better than the best one achieved at the standard trial length of 1 second, while only Comb-IT-CCA-ES manages to increase its ITR, but this is due to the fact that Comb-IT-CCA performs well at short trial lengths and 0.5 second trials were used for almost all trials. Indeed, if one looks more closely at the correlation scores provided by Comb-IT-CCA, it becomes clear that in most cases the highest correlation coefficient is obtained for the 0.5 second trials and then slowly decreases. This is expected, since the produced complex correlation coefficients are not designed to provide a "confidence" measure for the label of a trial. In contrast, IT-CCA-SVM does provide such a measure, since SVMs naturally produce a confidence score that can be leveraged successfully by ES, facilitating in this way self-paced BCI applications. As we can see from the results, it provides slightly longer trial lengths that are able to significantly increase the accuracy (e.g. S2, S5, S6, S8, S9 and S10), maximizing in this way the ITR of the system. This is more evident when looking at the performance of each user, as it produces longer trial lengths for poorly performing subjects (S1, S2, S10) and shorter lengths for better performing ones, allowing the system to adapt to the personal characteristics of each user.

SSVEP in real world applications
In this section, our objective is to show the impact of incorporating frequency perturbation in BCI applications that are operated in real-world environmental conditions. Towards this goal, we compare the proposed method IT-CCA-SVM with its variant IT-CCA-SVM-pert, where frequencies have been perturbed in order to also consider neighboring frequencies. We have tested different perturbation offsets from the proposed pool and we show the optimal results for each dataset. More specifically, for SCCN, 1 perturbation offset (i.e. ±0.05 Hz) provided the best results (Figure 3), while for SSVEP-MULTI the best results were achieved with 4 perturbation offsets (i.e. ±0.05 Hz, ±0.1 Hz, ±0.2 Hz, ±0.3 Hz) (Figure 4). This is in line with our expectation that BCI applications operating in noisier environments, as in the SSVEP-MULTI case, require more perturbation offsets (Section 3.7). Moreover, as we can see from Figure 3, perturbation provides no significant benefit for the SCCN dataset and even performs slightly worse for short trials (i.e. 1 second). On the other hand, it significantly improves the classification accuracy on the more challenging SSVEP-MULTI dataset for all trial lengths (from 3.5% in absolute accuracy for 1 second trials to 11% for 5 second trials, Figure 4a), while also providing a 39% boost in ITR (from 18 to 25, Figure 4b).

CONCLUSIONS
In this paper, we presented a novel method for self-paced SSVEP-based BCI applications. The proposed method combines the benefits of the multi-channel CCA algorithm and the robustness of SVMs. Furthermore, the proposed method combined with the presented Early Stopping criterion facilitates self-paced and personalized BCI applications, based on a confidence threshold that allows for adaptive trial lengths to accommodate each user's personal needs. This is validated through our experiments, which show that the proposed adaptive threshold provides longer trial lengths for users that exhibit low performance, while reaching a decision sooner for well-performing users. Finally, the proposed method is extended with the utilization of additional perturbed frequencies, which is designed to provide robust performance for BCI applications in real-world environmental conditions. This is validated experimentally by showing the benefit of the extended method on a dataset captured in real-world conditions, while no benefit was observed on a dataset captured in ideal lab conditions.