A novel partially connected cooperative parallel PSO-SVM algorithm: Study based on sleep apnea detection

Sleep disorders are common in the general population. They affect one in five adults and have several short-term and long-term adverse effects on health. Sleep apnea (SA) is the most important and most common sleep disorder. This paper presents an automatic approach for detecting apnea events using only a few bio-signals related to breathing defects: airflow and thoracic and abdominal respiratory movements are the only inputs. The proposed algorithm consists of three main parts: signal segmentation, feature generation and classification. A new segmentation method intelligently segments the input signals for further classification, and features are then generated for each segment from wavelet packet coefficients and from the original signals. In the classification phase, a novel parallel PSO-SVM algorithm is investigated, in which PSO is used both to tune the SVM parameters and to reduce the data. The proposed parallel structure helps PSO search the space more efficiently while avoiding premature convergence and locally optimal results, which are common problems in similar parallel algorithms. The results demonstrate that the proposed method is effective and robust in sleep apnea detection, and statistical tests show its superiority over previous methods that use more input signals, as well as over a single PSO-SVM. Using fewer signals makes recording more comfortable for the subject and also reduces the cost of data acquisition.


INTRODUCTION
Sleep disorders are important because they are common in the general population: a 1987 survey [1] reported at least one symptom of disturbed sleep in 41% of all subjects, and sleep disorders remain common today [2]. For instance, Young reported in 2004 that one in five adults suffers from daytime sleepiness [3].
Sleep disorders have several short-term and long-term adverse effects [4]. Short-term effects include impaired attention and concentration, lower quality of life, increased absenteeism with reduced productivity, and a greater likelihood of accidents at work, at home or on the road. Long-term consequences of sleep deprivation include increased morbidity and mortality from automobile accidents, coronary artery disease, heart failure, high blood pressure, obesity, type 2 diabetes mellitus, stroke and memory impairment, as well as depression. These long-term consequences, however, remain open to further research [5].
Sleep apnea (SA) is one of the most common and important sleep disorders. Unfortunately, because people are often unaware of it, sleep apnea may go undiagnosed for years [6,7]. SA is often first noticed by the patient's spouse, a roommate or a family member who has witnessed apnea periods alternating with arousals and accompanied by loud snoring [8,9]. Patients who have symptoms of SA should be examined through an overnight sleep study in a sleep center. SA is usually diagnosed by analyzing signals recorded by polysomnography, an integrated device comprising EEG, EMG, EOG, ECG and oxygen saturation [10]. Polysomnography also records the airflow through the mouth and nose, thoracic and abdominal respiration measurements [11], thoracic breathing movements and the position of the body during sleep. The apnea/hypopnea index (AHI) is obtained from the overnight sleep study: it is the total number of apneas, hypopneas and respiratory arousals per hour of sleep. The AHI is used to assess the severity of apnea according to the Chicago criteria: AHI < 5, normal; AHI = 5-15, mild; AHI = 15-30, moderate; and AHI > 30, severe [12].
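As a small illustration of the AHI definition and the Chicago severity classes quoted above, the following sketch computes the index from event counts and hours of sleep. The function names are hypothetical, and because the quoted ranges share their endpoints (5-15, 15-30), assigning 5 to "mild", 15 to "moderate" and 30 to "moderate" is an assumption.

```python
def ahi(n_apneas: int, n_hypopneas: int, n_arousals: int, sleep_hours: float) -> float:
    """Apnea/hypopnea index: respiratory events per hour of sleep (as defined in the text)."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return (n_apneas + n_hypopneas + n_arousals) / sleep_hours


def ahi_severity(index: float) -> str:
    """Chicago-criteria class; boundary handling (5, 15, 30) is an assumption."""
    if index < 5:
        return "normal"
    if index < 15:
        return "mild"
    if index <= 30:
        return "moderate"
    return "severe"


# 64 events over 8 hours of sleep -> AHI = 8.0 -> "mild"
print(ahi(40, 20, 4, 8.0), ahi_severity(ahi(40, 20, 4, 8.0)))
```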
Polysomnography recordings must be reviewed to detect SA events, and manual review by experts is costly and time-consuming. Therefore, many efforts have been made to develop systems that analyze the recorded signals automatically [13][14][15]. Several artificial intelligence algorithms have been used for this purpose, such as fuzzy rule-based systems and the genetic-SVM algorithm previously proposed by the authors [16].
In this paper, sleep apnea events are detected by a PSO-SVM algorithm implemented in a new parallel structure. The paper is organized as follows. Section II introduces preliminaries on the algorithms used and the basic concepts of parallel PSO implementation. Section III explains the proposed approach, comprising signal segmentation, feature generation and classification by PSO-SVM. Section IV reviews experimental results on real data, and Section V concludes the paper.

A. Support vector machines
Support vector machines (SVMs), pioneered by Vapnik [17], are known as an excellent tool for classification and regression problems. We only outline their main ideas here.
Given a training set of data-label pairs $(\mathbf{x}_i, y_i)$, $i = 1, \dots, l$, where $\mathbf{x}_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$, the generalized linear SVM finds an optimal separating hyperplane $\mathbf{w}\cdot\mathbf{x} + b = 0$ by solving the following optimization problem:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{l}\xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,l,$$

where $C$ is a penalty parameter on the training error and $\xi_i$ is the non-negative slack variable. The SVM thus finds the hyperplane that maximizes the margin while keeping the training errors (the constraint violations $\xi_i$) as small as possible. This problem can be solved by introducing Lagrange multipliers $\alpha_i$ and solving its dual. Once the optimal solution is obtained, the optimal hyperplane parameters $\mathbf{w}^*$ and $b^*$ can be determined, and the indicator function (classifier) can be written as

$$f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^*\cdot\mathbf{x} + b^*).$$

The nonlinear SVM maps the training samples from the input space into a higher-dimensional feature space via a mapping function $\phi$. After such a mapping, the training samples can be linearly separated by applying the linear SVM formulation. The scalar product $\phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$ is computed directly through a kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$ on the training data in the input space.
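There are three common kernel functions in SVM:

$$K(\mathbf{x}_i,\mathbf{x}_j) = (\gamma\,\mathbf{x}_i\cdot\mathbf{x}_j + r)^d,\quad \gamma > 0 \qquad \text{(polynomial)}$$

$$K(\mathbf{x}_i,\mathbf{x}_j) = \exp\!\big(-\gamma\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^2\big),\quad \gamma > 0 \qquad \text{(radial basis function)}$$

$$K(\mathbf{x}_i,\mathbf{x}_j) = \tanh(\gamma\,\mathbf{x}_i\cdot\mathbf{x}_j + r) \qquad \text{(sigmoid)}$$

Here $\gamma$, $r$ and $d$ are kernel parameters.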
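By introducing the kernel function, the nonlinear SVM classifier takes the following form:

$$f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{l}\alpha_i^*\,y_i\,K(\mathbf{x}_i,\mathbf{x}) + b^*\Big).$$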
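The kernels and the kernelized decision function above can be sketched in NumPy as follows. This is an illustration only: the dual coefficients $\alpha_i^*$, the support vectors and $b^*$ are assumed to be already known (no SVM training is performed here), and the function names are hypothetical.

```python
import numpy as np


def polynomial_kernel(x, z, gamma=1.0, r=0.0, d=3):
    return (gamma * np.dot(x, z) + r) ** d


def rbf_kernel(x, z, gamma=1.0):
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))


def sigmoid_kernel(x, z, gamma=1.0, r=0.0):
    return np.tanh(gamma * np.dot(x, z) + r)


def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b); ties map to +1."""
    s = sum(a * y * kernel(sv, x) for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b >= 0 else -1


# Toy example: a linear kernel (polynomial with d=1, r=0) and two support vectors.
linear = lambda a, z: polynomial_kernel(a, z, gamma=1.0, r=0.0, d=1)
sv = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
print(svm_decision(np.array([2.0, 0.5]), sv, [0.5, 0.5], [1, -1], 0.0, linear))  # 1
```

For the toy point $(2, 0.5)$ the kernel sum is $0.5 \cdot 2.5 + 0.5 \cdot 2.5 = 2.5 > 0$, so the point is assigned to the $+1$ class.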

B. Particle swarm optimisation (PSO)
In 1995, Kennedy and Eberhart introduced particle swarm optimization (PSO) [18]. PSO is a population-based stochastic optimization technique inspired by the social behavior of bird flocks and fish schools. Compared with other metaheuristic algorithms such as genetic algorithms (GAs), PSO has fewer complicated operations and fewer parameters to define, can be coded in just a few lines, and relies heavily on stochastic processes. Because of its simplicity and good performance, PSO has received increasing attention in recent years [19,20].
For a $D$-dimensional search space containing $N$ particles, $X_i = (x_{i1}, x_{i2}, \dots, x_{iD})$ and $V_i = (v_{i1}, v_{i2}, \dots, v_{iD})$ are the position and velocity of the $i$th particle, respectively, $i = 1, 2, \dots, N$. The performance of $X_i$ is evaluated by its fitness function value. In each iteration, each particle updates its position and velocity by tracking its own best solution and the global best solution discovered by all particles in the swarm.

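Let $P_i = (p_{i1}, p_{i2}, \dots, p_{iD})$ denote the best previous position encountered by the $i$th particle, and $P_g = (p_{g1}, p_{g2}, \dots, p_{gD})$ denote the global best position found so far. Having found $P_i$ and $P_g$, the $i$th particle updates its velocity and position according to Eqs. (1) and (2):

$$v_{id}^{t+1} = v_{id}^{t} + c_1 r_1 \big(p_{id} - x_{id}^{t}\big) + c_2 r_2 \big(p_{gd} - x_{id}^{t}\big) \qquad (1)$$

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \qquad (2)$$

where $r_1$ and $r_2$ are random numbers in $[0, 1]$, $t$ is the iteration counter, and the positive constants $c_1$ and $c_2$ are the personal and social learning factors. $x_{id}^{t}$ and $v_{id}^{t}$ are the current position and velocity of the $d$th dimension ($d = 1, 2, \dots, D$) of the $i$th particle in the $t$th iteration, respectively.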
Later, Shi and Eberhart [21] introduced an inertia weight $w$, which controls the impact of the previous velocity on the current velocity, by modifying Eq. (1) to

$$v_{id}^{t+1} = w\,v_{id}^{t} + c_1 r_1 \big(p_{id} - x_{id}^{t}\big) + c_2 r_2 \big(p_{gd} - x_{id}^{t}\big).$$

A suitable inertia weight balances global exploration and local exploitation and, on average, reduces the number of iterations required to find a sufficiently good solution. Previous studies [22] have shown that a time-dependent weight often outperforms a fixed one. The most common functional form for this weight is linear in the iteration counter $t$:

$$w(t) = w_{\max} - \frac{w_{\max} - w_{\min}}{t_{\max}}\, t,$$

where $t_{\max}$ is the maximum number of iterations, and $w_{\max}$ and $w_{\min}$ are often set to 0.9 and 0.4, respectively [22].

The velocity $v_{id}$ is restricted to the range $[-V_{\max}, V_{\max}]$. This range determines the resolution with which the regions between the present and target positions are searched. If $V_{\max}$ is too high, particles may fly past good solutions; if it is too small, particles may not explore sufficiently beyond local solutions and can become trapped in a local optimum.
The constants $c_1$ and $c_2$ represent the weights of the stochastic acceleration terms that pull each particle toward $P_i$ and $P_g$. Low values allow particles to roam far from target regions before being tugged back, whereas high values cause abrupt movement toward, or past, target regions. Hence, the acceleration constants $c_1$ and $c_2$ are often set to 2.0 [23].
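The update rules above (Eqs. (1) and (2) with the inertia weight, the linear weight schedule, and velocity clamping) can be put together in a short global-best PSO. This is a minimal sketch for minimization, assuming $V_{\max}$ equal to 20% of the search range and per-dimension random numbers $r_1, r_2$; neither choice is specified in the text. Since the paper maximizes classification accuracy, the negated accuracy would be passed as `fitness`.

```python
import numpy as np


def pso(fitness, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0),
        w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, v_max=None, seed=0):
    """Global-best PSO (minimization) with linearly decreasing inertia weight
    and velocity clamping."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    if v_max is None:
        v_max = 0.2 * (hi - lo)  # assumption: V_max = 20% of the search range
    x = rng.uniform(lo, hi, (n_particles, dim))          # positions X_i
    v = rng.uniform(-v_max, v_max, (n_particles, dim))   # velocities V_i
    p_best = x.copy()                                    # personal bests P_i
    p_val = np.apply_along_axis(fitness, 1, x)
    g_best = p_best[p_val.argmin()].copy()               # global best P_g
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters          # linear inertia schedule
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (1) with w
        v = np.clip(v, -v_max, v_max)                    # keep v in [-V_max, V_max]
        x = x + v                                        # Eq. (2)
        val = np.apply_along_axis(fitness, 1, x)
        improved = val < p_val
        p_best[improved] = x[improved]
        p_val[improved] = val[improved]
        g_best = p_best[p_val.argmin()].copy()
    return g_best, float(p_val.min())


best, best_val = pso(lambda z: float(np.sum(z ** 2)), dim=3, iters=300)
```

The sphere function is used only as a quick sanity check; its minimum is 0 at the origin.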

C. Parallel PSO
Recently, the availability of cheap and fast parallel hardware has encouraged researchers to implement parallel versions of metaheuristic algorithms for large-scale problems. Population-based algorithms, such as genetic algorithms, are especially good candidates for parallelization [24], and PSO has also been implemented in parallel in many recent studies [25,26].
Generally, parallel processing can be classified into two major classes: pipeline processing and data parallelism. Pipeline processing separates the problem into a cascade of tasks, each executed by an individual processor. Data are passed through the processors, each of which executes a different part of the process on every data element. Since the program is distributed over the processors in the pipeline and the data move from one processor to the next, no processor can proceed until the previous processor has finished its task. Data parallelism is an alternative approach in which the data to be processed are distributed among all processors, which then execute the same procedure on their own subsets of the data [27].
Data parallelism is widely used to implement metaheuristic algorithms such as PSO. Parallel PSO can be classified into three categories:
(a) Global or master-slave PSO: this model uses a single global population, and fitness evaluation is distributed over different processors. The nature of PSO is unchanged because the algorithm still works with the whole population; the only gain is a speedup of the optimization process [28], as shown in Figure 1.
(b) Migration PSO or island model (coarse-grained): the population is divided into a few large subpopulations, each maintained by a different processor. Individuals are exchanged according to a migration strategy, commonly after a given number of iterations [28]. Island PSO models are also referred to in the literature as multi-PSO [29], as shown in Figure 2.
(c) Fine-grained or cellular PSO: the population is divided into a large number of very small subpopulations, possibly of a single particle each, maintained by different processors. This model suits massively parallel architectures, that is, machines consisting of a huge number of basic processors connected by a specific high-speed topology. The computer structure limits interaction between individuals; as illustrated in Figure 3, each individual has four neighbors. This neighborhood restriction delays information exchange between non-neighboring processes, which increases the diversity of the search [28]. The use of local selection and reproduction rules leads to a continuous diffusion of individuals over the population, so this model is also called the diffusion model [30].
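The four-neighbor restriction of the cellular model (c) can be illustrated with a grid of single-particle cells in which each cell only sees its north, south, west and east neighbors. This is a minimal sketch; the wrap-around (toroidal) grid is an assumption that gives edge cells four neighbors too, and the function names are hypothetical.

```python
def von_neumann_neighbors(row, col, n_rows, n_cols):
    """The 4 grid neighbours (N, S, W, E) of cell (row, col) on a wrap-around grid."""
    return [((row - 1) % n_rows, col),
            ((row + 1) % n_rows, col),
            (row, (col - 1) % n_cols),
            (row, (col + 1) % n_cols)]


def local_best(fitness_grid, row, col):
    """Cell with the lowest fitness among (row, col) and its 4 neighbours;
    a particle in a cellular PSO would use this as its social attractor."""
    n_rows, n_cols = len(fitness_grid), len(fitness_grid[0])
    candidates = [(row, col)] + von_neumann_neighbors(row, col, n_rows, n_cols)
    return min(candidates, key=lambda rc: fitness_grid[rc[0]][rc[1]])


grid = [[5, 1, 7],
        [3, 9, 2],
        [4, 6, 8]]
print(local_best(grid, 1, 1))  # (0, 1): fitness 1, the best among 9, 1, 6, 3, 2
```

Because information about a good cell spreads only one neighborhood per step, it diffuses slowly across the grid, which is the source of the extra search diversity described above.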

III. APPROACH AND METHOD
This paper introduces a new technique for sleep apnea detection based on a new segmentation method and a novel parallel PSO-SVM algorithm. Only three signals are used as inputs: airflow, abdominal movement and thoracic movement. The methodology consists of three main stages.
1-Signal segmentation: the data are segmented into Reasoning Units (RUs), each of which may contain one or more SA events, for further analysis.
2-Feature generation: several statistical features are generated for each RU from wavelet packet coefficients and from the original signals.
3-Parallel PSO-SVM: PSO selects the best subsets of training data and features, with an SVM serving as the fitness evaluator, for the final classification of RUs as sleep apnea or normal. This PSO-SVM runs in parallel on a new architecture to achieve better performance and to avoid locally optimal solutions.
The new algorithms proposed for each stage are detailed below.

A. Signal Segmentation
In the first step, the input signals are segmented into Reasoning Units (RUs) [31]. Each RU represents an interval with a higher chance of containing a sleep apnea event. The segment length is set to 30 seconds, in line with the state of the art in the sleep disorder, and especially sleep apnea, literature. The proposed segmentation algorithm is based on [32] with some modifications: the original algorithm [32] used only airflow to detect sleep apnea events, whereas our modified version uses three signals as input to generate RUs. The modified algorithm works as follows.
- Preprocessing: each signal is normalized using its mean and variance.
- Amplitude calculator: the input signals are sampled at 10 Hz, so for each signal the difference between the local maximum and the local minimum of the 10 consecutive samples in each second is taken as the amplitude of that second.
- Amplitude reviewer: the amplitude reviewer takes the classified amplitude vectors as input and reviews the feasible events. This is done as an event fusion in two steps, (a) and (b):
(a) Since a complete respiratory cycle lasts at least 3 seconds, amplitude values lying between feasible-event values that are separated by no more than 3 seconds are also classified as a feasible event.
(b) Amplitude values initially classified as a feasible event are reclassified as normal if the event lasts less than 10 seconds.
- RU generation: first, the computed amplitude of the airflow signal is reviewed, and for each amplitude value labeled as a feasible event, an RU of 30 seconds is created. After the RUs related to airflow feasible events have been generated, the abdominal and thoracic movement signals are reviewed: a new RU is created between the existing RUs if a feasible event is found in the abdominal or thoracic amplitudes and there is room for that RU between the previously created RUs.
When this algorithm completes, a set of non-overlapping RUs has been generated, each with a chance of containing at least one SA event. In later steps these RUs are classified as SA or normal by more comprehensive methods. Note that segmenting the signals blindly into non-overlapping windows, without any smart processing, can reduce the final accuracy. As Figure 4 illustrates, in Figure 4.a the whole sleep apnea event falls within one window, whereas in Figure 4.b the portion of the event in each window is shorter than 10 seconds, so these windows may be wrongly classified as normal in later steps. The proposed segmentation method avoids this problem.
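The failure case of Figure 4.b can be reproduced with a few lines of arithmetic. In this sketch the 16-second event at 22-38 s is a hypothetical example: blind 30-second windows split it into two 8-second pieces, both below the 10-second minimum, whereas a window placed around the event (for example 15-45 s) contains all of it.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Seconds shared by the intervals [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def blind_windows(total_len, win=30):
    """Fixed, non-overlapping windows chosen without looking at the signal."""
    return [(s, s + win) for s in range(0, total_len, win)]


event = (22, 38)  # hypothetical 16 s apnea event straddling the 30 s boundary
for w in blind_windows(90):
    print(w, overlap(*event, *w))
# (0, 30) 8
# (30, 60) 8
# (60, 90) 0
```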

B. Feature Extraction
After the RUs are determined, statistical measures of the signals within each RU are generated as features. Feature extraction plays an important role in recognition systems; in this paper, features are generated from wavelet coefficients and from the original signals. To obtain them, a 3-level Haar wavelet packet decomposition is applied to the input signals, and several statistical measures are computed from the coefficients corresponding to each RU. These features are the inputs of the proposed PSO-SVM algorithm in the next step. The full list of features is given in Table I, and their exact mathematical definitions are given in Appendix A.
In addition, for each signal, the difference between the local maximum and minimum in each 10 seconds is computed, and the same features as in Table I are then computed for these differences within each RU.
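A 3-level Haar wavelet packet decomposition and per-node statistics can be sketched with NumPy alone. This is an illustration under stated assumptions: each RU segment is decomposed directly (rather than slicing coefficients of a whole-night decomposition), an odd trailing sample is dropped before each split, and only mean, VAR and STD are computed; the remaining Table I statistics (Appendix A) would be added the same way. PyWavelets' `WaveletPacket` class offers an equivalent library route.

```python
import numpy as np


def haar_packet(x, levels=3):
    """Full Haar wavelet-packet tree of x down to `levels`.

    Returns {path: coefficients}; 'ad' means approximation, then detail.
    An odd trailing sample is dropped before each split (assumption).
    """
    nodes = {"": np.asarray(x, dtype=float)}
    tree = {}
    for _ in range(levels):
        children = {}
        for path, c in nodes.items():
            c = c[: len(c) // 2 * 2]
            even, odd = c[0::2], c[1::2]
            children[path + "a"] = (even + odd) / np.sqrt(2.0)  # low-pass branch
            children[path + "d"] = (even - odd) / np.sqrt(2.0)  # high-pass branch
        tree.update(children)
        nodes = children
    return tree


def ru_features(ru_segment, levels=3):
    """Mean, VAR and STD of the raw segment and of every packet node
    (a hypothetical subset of the Table I features)."""
    sources = {"raw": np.asarray(ru_segment, dtype=float), **haar_packet(ru_segment, levels)}
    feats = {}
    for name, c in sources.items():
        feats[f"{name}_mean"] = c.mean()
        feats[f"{name}_var"] = c.var()
        feats[f"{name}_std"] = c.std()
    return feats


# A 30 s RU at 10 Hz has 300 samples -> 14 packet nodes + raw = 15 sources x 3 stats
features = ru_features(np.random.default_rng(0).normal(size=300))
```

Because the Haar filters are orthonormal, each level preserves the signal energy whenever the node lengths are even, which is a convenient check on the decomposition.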

C. Parallel PSO-SVM algorithm
After the RUs and their features have been generated, a more powerful classification method is needed to separate the RUs into sleep apnea and normal ones. For this purpose, a partially connected parallel PSO-SVM algorithm with a new architecture is proposed. The algorithm is used both for feature selection and for tuning the SVM parameters.
First, the RUs of each subject (patient) are randomly divided into three equal sets: training, test and validation. The learning method uses the training and test sets, and the result is then validated on the validation set.
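The per-subject random split into three equal sets can be sketched as a random permutation of the RU indices. This is a minimal sketch; when the number of RUs is not divisible by three, `np.array_split` makes the first set one or two items larger, which is an assumption about how the remainder is handled.

```python
import numpy as np


def split_three_ways(n_items, seed=None):
    """Randomly split RU indices 0..n_items-1 into train/test/validation thirds."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_items)
    train, test, validation = np.array_split(perm, 3)
    return train, test, validation


# Applied separately to the RUs of each subject, e.g. 10 RUs -> sizes 4, 3, 3
train_idx, test_idx, val_idx = split_three_ways(10, seed=1)
```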
The learning algorithm, PSO-SVM, is used for feature subset selection; however, during this work we found that selecting the best training samples from the fixed training set has as important an impact on overall performance as selecting the best feature subset. This may be due to interference between training samples, which can lead to low classification accuracy, so the training data are effectively filtered. Training data reduction has also been applied in previous studies [33].
Note that during the learning phase the PSO-SVM searches for the best training-data and feature subsets for classifying the fixed test set; the final performance of the method is then obtained by classifying the unseen validation set with the selected training data and features. Moreover, if using the whole training set yields better classification, the algorithm automatically uses all the training data. The PSO-SVM algorithm is detailed as follows.

Particle representation
As mentioned before, the PSO algorithm is used in this paper to select the best feature subset and the best training samples from the total training set. The sizes of the selected feature and training subsets are also determined by the algorithm itself. Finally, SVM parameters such as the SVM type (C-SVM, nu-SVM and nu-SVM), kernel type, degree, gamma and cost are determined by PSO.
Therefore, each particle comprises five parts: the size of the feature subset and the size of the selected training subset (integer values), the feature weights and the training weights (real numbers), and finally five SVM parameters. Figure 5 shows the first four parts of a sample particle for a small sample problem (the SVM-parameter part is omitted); these parts have dimension $2 + N_f + N_t$, where $N_f$ is the total number of features and $N_t$ is the total number of training samples. Reading the first two cells from the left, two features and three training samples must be selected from the original feature and training sets. Selection is based on importance: the feature importances are stored in cells three to five, so features 2 and 3 are selected because of their higher importance, and similarly, training samples 1, 2 and 4 are selected. Note that integer PSO is not used for the first two cells; instead, regular PSO is used and the resulting values are rounded to the nearest integer.
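Decoding a real-valued particle into the selected feature and training indices can be sketched as below. The layout `[k, m, feature weights..., training weights..., SVM parameters...]` follows the description of Figure 5; the example weights and the total of five training samples are hypothetical values chosen to reproduce the selection described in the text (features 2 and 3; training samples 1, 2 and 4). Clamping the rounded sizes to valid ranges is an added safeguard, not something stated in the paper.

```python
import numpy as np


def decode_particle(position, n_features, n_train):
    """Return 0-based (feature indices, training indices) encoded by a particle."""
    pos = np.asarray(position, dtype=float)
    k = int(np.clip(np.rint(pos[0]), 1, n_features))  # size of the feature subset
    m = int(np.clip(np.rint(pos[1]), 1, n_train))     # size of the training subset
    w_feat = pos[2: 2 + n_features]
    w_train = pos[2 + n_features: 2 + n_features + n_train]
    # keep the k (resp. m) entries with the largest importance weights
    feats = np.sort(np.argsort(-w_feat, kind="stable")[:k])
    train = np.sort(np.argsort(-w_train, kind="stable")[:m])
    return feats, train


# Hypothetical weights: 3 features, 5 training samples (SVM-parameter cells omitted)
particle = [2, 3, 0.1, 0.8, 0.6, 0.9, 0.7, 0.2, 0.8, 0.1]
feats, train = decode_particle(particle, n_features=3, n_train=5)
# feats -> [1, 2] (features 2 and 3); train -> [0, 1, 3] (samples 1, 2 and 4)
```

Note that `np.rint` rounds halves to the nearest even integer, which only matters when a size cell lands exactly on .5.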

Initialization
and of PSO are set to 0.2. The position and velocity of each particle are generated randomly from a normal distribution. For example, the number of selected training samples is a random number between half the size of the training set and the full size of the training set, and the number of selected features is a random number between 3 and 20.
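A possible initialization consistent with the ranges above is sketched here. It is an assumption-laden illustration: the two size cells are drawn uniformly within the stated ranges (3 to 20 features; half to all of the training samples), the importance weights and velocities are drawn from normal distributions with an assumed scale, and the SVM-parameter cells are omitted.

```python
import numpy as np


def init_particle(n_features, n_train, rng):
    """Random initial position and velocity (SVM-parameter cells omitted).
    Assumes n_features >= 3."""
    k = rng.integers(3, min(20, n_features) + 1)          # features: 3..20
    m = rng.integers(n_train // 2, n_train + 1)            # training: N/2..N
    w_feat = rng.normal(size=n_features)                   # feature importances
    w_train = rng.normal(size=n_train)                     # training-sample importances
    position = np.concatenate(([k, m], w_feat, w_train)).astype(float)
    velocity = rng.normal(scale=0.1, size=position.shape)  # assumed velocity scale
    return position, velocity


rng = np.random.default_rng(0)
position, velocity = init_particle(295, 100, rng)  # 295 features, 100 training RUs
```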
For the parallel structure, different settings were tested, such as 5 masters and 1 slave, 4 masters and 2 slaves, and so on. The best local particles are sent every 5 iterations.

Particle evaluation
After the feature subset and the training subset corresponding to each particle are determined, an SVM classifies the test set using the selected training data and features, and the accuracy of this classification is used as the particle's fitness. Support vector machines are used in this study because of their primary advantage: the ability to minimize both structural and empirical risk, which leads to better generalization on new data [33]. This property is especially important when an optimization procedure is wrapped around a classifier, since such a combination can otherwise lead to overfitting.
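The fitness evaluation can be sketched as "train on the selected samples and feature columns, then measure accuracy on the fixed test set". To keep the sketch runnable with NumPy alone, a nearest-centroid classifier is swapped in as a stand-in; it is not an SVM. In the paper the classifier is an SVM (OSU-SVM) configured by the particle's SVM-parameter cells, and a scikit-learn `SVC(...).fit(Xtr, ytr).predict(Xte)` call could be passed as `classify` instead.

```python
import numpy as np


def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier (NOT an SVM) so the sketch runs with NumPy only."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def particle_fitness(feats, train_idx, X_train, y_train, X_test, y_test,
                     classify=nearest_centroid):
    """Fitness = accuracy on the fixed test set, using only the selected
    training samples (rows) and features (columns)."""
    Xtr = X_train[np.ix_(train_idx, feats)]
    ytr = y_train[train_idx]
    y_pred = classify(Xtr, ytr, X_test[:, feats])
    return float(np.mean(y_pred == y_test))
```

With toy data, selecting an uninformative feature column drops the fitness to chance level, which is exactly the signal PSO uses to prefer informative feature subsets.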

Parallel structure
Given the large search space of this problem, a single PSO does not perform well: it converges to local optima with low classification accuracy. Therefore, a parallel structure is used to explore the search space better.
In this study, a master-slave (multi-swarm) structure with a new cooperative strategy among swarms is introduced. In the traditional master-slave model, swarms are divided into one master and several slaves [34]. In the proposed strategy, by contrast, swarms are divided into several masters and one (or more) slaves. Master swarms have access to the best particles of the other swarms, whereas slave swarms have no access to other swarms' information. The best local particles can be sent among the masters, and from the slave(s) to the masters, in every iteration or after a specified number of iterations. The pseudocode of the proposed multi-swarm PSO is as follows.

Begin
    Select the number of master and slave swarms and the number of particles in each sub-swarm. Select one of the master swarms as the center.
    Initialize the position and velocity of each particle.
    Do in parallel until the maximum number of iterations is reached {
        Evaluate the fitness value of each particle.
        Find the local best position in each sub-swarm.
        If the sending condition is met
            Send the local best particle of each swarm to the center swarm.
        End If
        Update the global best particle in the center swarm.
        Send the global best particle to all master swarms.
        Calculate the new velocity of each particle in each sub-swarm.
        Update the position of each particle in each sub-swarm.
    } End Do
    Return the best solution (the global best particle) of the algorithm.
End

Figure 5. Representation of one sample particle (cells: size of the feature subset; size of the training-data subset; relative importance of each feature; relative importance of each training sample).

Figure 6 shows a sample of the proposed parallel structure with 5 master swarms and 1 slave swarm. In this implementation, master 1 is selected as the center swarm, so all swarms send their local best particles to it. After the global best particle is computed, it is sent to all master swarms. In this structure, therefore, the slave swarm provides information to the other swarms but does not benefit from their information.
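The pseudocode above can be simulated sequentially in a single process, which is enough to show the information flow: every swarm sends its local best to the center every `send_every` iterations, only master swarms receive the global best, and slave swarms stay isolated. This is a minimal sketch for minimization; it does not run the swarms in parallel, and the rule that a master follows whichever of the global best and its own local best is better, as well as $V_{\max}$ = 20% of the range, are assumptions.

```python
import numpy as np


def multi_swarm_pso(fitness, dim, n_masters=5, n_slaves=1, n_particles=10,
                    iters=100, send_every=5, bounds=(-5.0, 5.0),
                    w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """Sequential simulation of the master/slave multi-swarm PSO (minimization).
    Swarms 0..n_masters-1 are masters (swarm 0 is the center); the rest are slaves."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    v_max = 0.2 * (hi - lo)                         # assumption
    n_swarms = n_masters + n_slaves
    x = rng.uniform(lo, hi, (n_swarms, n_particles, dim))
    v = rng.uniform(-v_max, v_max, x.shape)
    p_best = x.copy()
    p_val = np.apply_along_axis(fitness, 2, x)      # shape (n_swarms, n_particles)
    g_best, g_val = None, np.inf                    # global best kept by the center
    rows = np.arange(n_swarms)
    for t in range(iters):
        idx = p_val.argmin(axis=1)
        local_best, local_val = p_best[rows, idx], p_val[rows, idx]
        if t % send_every == 0:                     # sending condition
            s = local_val.argmin()                  # all swarms report to the center
            if local_val[s] < g_val:
                g_best, g_val = local_best[s].copy(), local_val[s]
        guide = local_best.copy()                   # social attractor per swarm
        for m in range(n_masters):                  # only masters receive the global best
            if g_val < local_val[m]:
                guide[m] = g_best
        w = w_max - (w_max - w_min) * t / iters
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (guide[:, None, :] - x)
        v = np.clip(v, -v_max, v_max)
        x = x + v
        val = np.apply_along_axis(fitness, 2, x)
        better = val < p_val
        p_best[better] = x[better]
        p_val[better] = val[better]
    i, j = np.unravel_index(p_val.argmin(), p_val.shape)  # final gather
    return p_best[i, j].copy(), float(p_val[i, j])


best, best_val = multi_swarm_pso(lambda z: float(np.sum(z ** 2)), dim=3, iters=300)
```

In the paper each fitness call would be an SVM training and test run, which is why distributing the swarms over workers matters in practice; the sketch keeps everything in one loop for readability.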
Fast convergence is one of the disadvantages of PSO, and it is intensified in parallel structures. The proposed model tries to enhance both the exploration and the exploitation abilities of PSO by combining isolated and linked swarms. The slave swarm(s) in this model help to prevent premature convergence and local optima.

IV. RESULTS AND DISCUSSION
The experimental data consist of two sets: the first contains 6 overnight sleep recordings provided by Concord Hospital in Sydney, and the second consists of 12 samples from [31]. The events in both sets were annotated by experts.
In the first step, RUs are generated for each sample; Tables 2 and 3 list the total number of sleep apnea events, the number of generated RUs and the percentage of coverage for each sample. An event is considered covered by an RU if the midpoint of the event lies in the RU and at least 10 seconds of the event occur within that RU. After the RUs are generated, the features corresponding to each RU are computed: applying a 3-level Haar wavelet packet decomposition yields 295 features, as described before. These features are used as inputs to the parallel PSO-SVM algorithm.
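The coverage criterion can be written as a small predicate, and the coverage percentage reported in Tables 2 and 3 then follows from counting covered events. This is a minimal sketch; events and RUs are assumed to be `(start, end)` pairs in seconds, and treating the RU as half-open (the midpoint may equal the start but not the end) is an assumption.

```python
def is_covered(event, ru, min_overlap=10):
    """True if the event midpoint lies in the RU and at least `min_overlap`
    seconds of the event fall inside the RU."""
    ev_s, ev_e = event
    ru_s, ru_e = ru
    mid = (ev_s + ev_e) / 2
    inside = max(0, min(ev_e, ru_e) - max(ev_s, ru_s))
    return ru_s <= mid < ru_e and inside >= min_overlap


def coverage_percent(events, rus, min_overlap=10):
    """Percentage of annotated events covered by at least one RU."""
    covered = sum(any(is_covered(ev, ru, min_overlap) for ru in rus) for ev in events)
    return 100.0 * covered / len(events)


events = [(40, 60), (100, 108), (200, 230)]   # hypothetical annotations (s)
rus = [(35, 65), (95, 125)]                   # hypothetical RUs (s)
print(coverage_percent(events, rus))          # only the first event is covered
```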

The parallel structure consists of 5 masters and one slave and was implemented with the MATLAB parallel toolbox; the OSU-SVM package was used for the SVM. The automatic detector performed well, achieving a sensitivity and accuracy of 86% on the first data set, which shows that the proposed system can efficiently detect both apnea and normal RUs. Similarly high performance was achieved on the second data set. The results in Table 5 can also be compared with those of [31], which used the same data but one more input signal, oxygen saturation, and achieved an overall sensitivity of 87% and specificity of 89%. To determine whether the differences between our results and those of [31] are significant, we performed two-sample t-tests with the significance level set to α = 0.05. The t-tests show higher means for all indices (sensitivity, specificity and accuracy), even though one fewer input signal (oxygen saturation) was used than in [31]. As Table 5 shows, performance indices for sample 4 are not provided because this sample contains only 4 sleep apnea events; after the sample is randomly divided into three sets, the training set contains no events, so the algorithm cannot predict them. This is not a major limitation, since this algorithm and similar ones are generally applied to subjects who have sleep apnea and therefore contain enough apnea events.
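The three performance indices used above follow the standard definitions, computed from the counts of RUs in each confusion-matrix cell with sleep apnea as the positive class. The sketch below uses hypothetical counts.

```python
def performance(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (%) from RU confusion counts."""
    sensitivity = 100.0 * tp / (tp + fn)   # apnea RUs correctly detected
    specificity = 100.0 * tn / (tn + fp)   # normal RUs correctly rejected
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 8 of 10 apnea RUs and 9 of 10 normal RUs classified correctly
print(performance(tp=8, fn=2, tn=9, fp=1))  # (80.0, 90.0, 85.0)
```

The sample-4 case above corresponds to a training set with no positive RUs, so the classifier has no apnea examples to learn from and the detected apnea count $tp$ is necessarily zero.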

The optimization in the training phase does not lead to overfitting in this work. Considering all 18 samples (12 and 6) together, the algorithm achieved a sensitivity, specificity and accuracy of 87.19%, 88.40% and 87.93% on the validation sets, respectively. The corresponding values for the training phase are 91.62%, 92.56% and 92.20%, which are not far from the validation results.
The total processing time of the algorithm with the described parallel structure is 6421 seconds, with an accuracy of 87.93%. The isolated slave swarm can be regarded as an example of a serial implementation; its processing time is 6151 seconds and its accuracy 83.66%. This shows that the proposed parallel structure improves performance at an acceptable extra cost in time. To design the parallel structure, three strategies were tested to find the best combination of master and slave swarms: S1, 1 slave and 5 masters; S2, 2 slaves and 4 masters; and S3, 3 slaves and 3 masters. Figure 6 shows the accuracies of these strategies for the first database of 6 samples. Based on the accuracy over all six samples, the first strategy was selected, and it was also applied to the second database.

V. CONCLUSION
In this paper, a novel parallel PSO-SVM algorithm is proposed to detect sleep apnea. It uses three input signals: airflow, abdominal movement and thoracic movement. The algorithm consists of three main parts: signal segmentation, feature generation and classification. In each part, a new algorithm is developed that takes the nature of sleep apnea events into account.
In the signal segmentation phase, the signals are segmented into non-overlapping time windows called RUs, each of which may contain one or more sleep apnea events. Because the signals are processed intelligently, this segmentation leads to better results in the later phases than blind segmentation does. After segmentation, features are generated from wavelet packet coefficients and from the original signals. Finally, a parallel PSO-SVM classifies the RUs as sleep apnea or normal. The unique structure of the parallel algorithm, with its connected and isolated swarms, helps PSO combine exploration and exploitation.
Experimental results on real data show the high accuracy of this algorithm. In addition, the proposed approach compares favorably with [31], even though [31] used one more input signal, oxygen saturation. Using one fewer signal makes recording during sleep more comfortable for the subject.

Appendix A.
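Mean, variance (VAR) and standard deviation (STD) are common, well-known statistical measures, so only the other statistical features used in this study are reviewed here.
Kurtosis: the kurtosis of a distribution measures how outlier-prone the distribution is, and is defined as

$$k = \frac{E\big[(x-\mu)^4\big]}{\sigma^4},$$

where $\mu$ is the mean of $x$, $\sigma$ is its standard deviation and $E[\cdot]$ denotes the expected value.
Geomean: the geometric mean of $n$ positive samples $x_1, \dots, x_n$ is

$$\mathrm{geomean} = \Big[\prod_{i=1}^{n} x_i\Big]^{1/n}.$$

Skewness: skewness measures the asymmetry of the data around the sample mean, and is defined as

$$s = \frac{E\big[(x-\mu)^3\big]}{\sigma^3}.$$

Mad: the mean absolute deviation of the sample is

$$\mathrm{mad} = E\big[\,\lvert x - E(x)\rvert\,\big].$$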
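The four definitions above translate directly into NumPy. This sketch uses population (biased) moments, i.e. $\sigma$ computed with divisor $n$, which matches the formulas as written; computing the geometric mean through logarithms is an implementation choice that avoids overflow of the product and requires strictly positive inputs.

```python
import numpy as np


def kurtosis(x):
    """E[(x - mu)^4] / sigma^4 with population moments."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2


def skewness(x):
    """E[(x - mu)^3] / sigma^3 with population moments."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


def geomean(x):
    """(prod x_i)^(1/n), computed as exp(mean(log x)); x must be positive."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("geometric mean requires strictly positive values")
    return float(np.exp(np.mean(np.log(x))))


def mad(x):
    """Mean absolute deviation E[|x - E(x)|]."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.abs(x - x.mean())))


print(kurtosis([1, 2, 3, 4, 5]), mad([1, 2, 3, 4, 5]))  # about 1.7 and 1.2
```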