Sleep Stage Classification Based on Temporal Pattern Recognition Using a Neural Network Approach

Previous research on sleep stage classification has often treated sleep stages as independent events, assuming that every epoch is independent of the others. By nature, however, sleep is a sequential process, so the current sleep stage affects the next one. Ten datasets of single-lead ECG signals from healthy people have been collected. Fifteen features are extracted from the raw ECG signal to describe the sleep stages. As a preprocessing step, the signal is smoothed using wavelet denoising to eliminate noise. Normalization of the input values is also applied to handle extreme feature values, which are mapped by the activation function in the neural network approach. This paper evaluates the contribution of the temporal pattern to the sleep stage classification result, based on the fact that sleep stages form time series data. A multilayer perceptron (MLP) and a time delay neural network (TDNN), trained with the standard back-propagation algorithm and a momentum term, are applied to analyze this contribution. The TDNN is an extension of the MLP in which the inputs are a sequence of the current epoch and previous epochs. The TDNN, as a classifier that can learn temporal patterns, has shown better performance than the MLP. This shows that the temporal pattern helps determine the correct result in sleep stage classification. An appropriate memory length for the temporal pattern is required to obtain the optimal classification result, because a longer memory cannot guarantee that the classification result is always better.


I. INTRODUCTION
Time series problems are problems related to sequences in time. Examples include prediction of exchange rates in currency transactions, Arabic recognition [1], speech recognition [2], and traffic forecasting in Asynchronous Transfer Mode (ATM) networks [3]. Time series problems are often associated with prediction, but classification can also be a time series problem, because the classification result of some problems is also affected by temporal patterns; this is called time series classification. Time series classification is a unique problem because the target class depends not only on the inputs at the current time but also on the inputs at previous times. Time series classification can thus be characterized by two patterns: the pattern of the current input and the pattern of the input through time.
Sleep stage classification is often treated as an independent classification problem in which the temporal pattern is not considered. Several previous works on independent sleep stage classification include Lewicke et al. [4] in 2008, Yilmaz et al. [5] in 2010 and Bsoul et al. [6] in 2010. By nature, sleep is a process that runs through time: a sleep stage is influenced by the previous sleep stage and influences the next one. The temporal pattern of sleep stages is investigated here in sleep stage classification using a neural network approach. As an extension of the multilayer perceptron, the time delay neural network has a particular design for solving time series classification problems. The performance of the time delay neural network and the multilayer perceptron will be compared as representatives of classifiers that do and do not consider the temporal pattern.

A. Sleep Stages Dataset
Only a doctor licensed as a sleep expert can annotate sleep stages. The sleep expert analyzes at least three biological signals: brain activity (electroencephalogram, EEG), eye movements (electrooculogram, EOG), and chin muscle activity (electromyogram, EMG). This is the standard technique to determine sleep stages. In this research, only a single ECG (electrocardiogram) signal is used, following the previous works [4,5,6] and [7]. The ECG signals from ten volunteers have been collected at a 200 Hz sampling rate. These datasets are coded as slp0, slp1, …, slp9. The distribution of sleep stages of the ten volunteers is shown in Table I.
The preprocessing step removes noise introduced by the acquisition process so that the signals become smoother. Smoothing is applied to the datasets using the stationary wavelet transform with a coiflet mother wavelet, four-level decomposition, and the universal hard thresholding method [8]. An example of the denoising process is shown in Fig. 1.
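The hard-thresholding step above can be sketched as follows. This is a minimal illustration, assuming the stationary wavelet transform itself comes from a wavelet library (e.g. PyWavelets' `pywt.swt`, not shown); only the universal thresholding rule applied to a vector of detail coefficients is implemented, and the noise level is estimated from the median absolute deviation, which is a common choice but an assumption here since [8] may estimate it differently.

```python
import math
from statistics import median

def universal_hard_threshold(coeffs, n):
    """Hard-threshold wavelet detail coefficients with the universal
    threshold T = sigma * sqrt(2 ln n), where n is the signal length.
    sigma is estimated from the median absolute deviation of the
    coefficients (an assumption; [8] may estimate it differently)."""
    sigma = median(abs(c) for c in coeffs) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(n))
    # Hard thresholding: zero coefficients below T, keep the rest unchanged.
    return [c if abs(c) >= t else 0.0 for c in coeffs]
```

Coefficients that survive the threshold are kept exactly as-is (unlike soft thresholding, which would also shrink them), which is what "hard" refers to.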

B. Features
The previous study [9] has shown that features derived from the raw ECG signal perform better than features derived from the RR interval and the ECG-derived respiration (EDR) signal.

C. Data Normalization
The values of features in real-life data may have different ranges, and the data may contain extreme values (i.e., extremely low or extremely high). Data normalization is therefore an important step when using a neural network approach. A neural network uses an activation function (e.g., the sigmoid function) that maps input data into a specific range with a particular pattern. If the input values are very low, all of them will be mapped to around 0, and if they are very high, all of them will be mapped to around 1.
Equation (1) is the normalization formula:

x' = (x - min) / (max - min) × (max' - min') + min'     (1)

where x is the data value, x' is the new value of x, min and max are the minimum and maximum values of the data set, and min' and max' are the new minimum and maximum bounds of the data. Here min' and max' are set to 0 and 1 respectively.
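As a minimal sketch, the min-max normalization of (1) can be implemented as follows (the function name is illustrative, not from the paper):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max] per Eq. (1):
    x' = (x - min) / (max - min) * (max' - min') + min'."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) * (new_max - new_min) + new_min
            for x in values]
```

With the default bounds 0 and 1, the smallest feature value maps to 0 and the largest to 1, keeping all inputs inside the responsive region of the sigmoid.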

III. METHODOLOGY
To evaluate the influence of the temporal pattern, the performance of the multilayer perceptron and the time delay neural network are compared. The number of neurons is predefined: 15 neurons and 1 bias neuron in the input layer, 20 neurons and 1 bias neuron in the hidden layer, and 5 neurons in the output layer. The number of neurons in the hidden layer is calculated in a straightforward manner as the sum of the number of neurons in the input layer and the number of neurons in the output layer.

A. Multilayer Perceptron (MLP)
A multilayer neural network with a single hidden layer was introduced by Funahashi in 1989 and has been proved to approximate any nonlinear function with the desired accuracy [10]. It is called a multilayer perceptron (MLP). The structure of the MLP consists of three layers, the input layer, hidden layer and output layer, as shown in Fig. 2. The output of each neuron in the hidden layer and output layer is calculated using a particular activation function; here the sigmoid function is used. The sigmoid function maps its input into the range 0 to 1, so it works well for a 0/1 final decision in the output layer. It is also differentiable, so it can be used in gradient-descent-based training methods. The sigmoid function is given in (2):

f(x) = 1 / (1 + e^(-x))     (2)
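The sigmoid activation of (2), together with its derivative used by gradient-descent training, can be sketched as:

```python
import math

def sigmoid(x):
    """Eq. (2): f(x) = 1 / (1 + e^(-x)); maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """f'(x) = f(x) * (1 - f(x)); the closed-form derivative is what
    makes the sigmoid convenient for back-propagation."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

Note how large positive or negative inputs saturate near 1 or 0, which is exactly why the normalization of Section C matters.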
The use of the MLP to classify sleep stages is illustrated in Fig. 3. Every epoch is treated independently: the sleep stage in an epoch is influenced only by the features in that epoch.

B. Time Delay Neural Network
A Time Delay Neural Network (TDNN, also called a tapped delay neural network) is a neural network that considers temporal patterns [11]. It is like a multilayer perceptron whose input is a sequence over time. The structure of the TDNN is depicted in Fig. 4. The user has to determine the number of backward steps (delays) to include in the input. With these predetermined backward steps, this kind of neural network can capture temporal patterns.
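A minimal sketch of how the tapped-delay input can be assembled: the input vector for epoch t concatenates the feature vectors of epochs t-delay through t (the oldest-first ordering within the window is an assumption for illustration).

```python
def tapped_delay_inputs(features, delay):
    """Build TDNN input windows. `features` is a list of per-epoch
    feature vectors; for each epoch t >= delay, the feature vectors of
    epochs t-delay .. t are concatenated into one input vector."""
    windows = []
    for t in range(delay, len(features)):
        window = []
        for d in range(delay, -1, -1):   # oldest epoch first
            window.extend(features[t - d])
        windows.append(window)
    return windows
```

With 15 features per epoch and a delay of 2, each window holds 45 values, so the input layer grows with the chosen number of delays.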
The use of the TDNN to classify sleep stages is depicted in Fig. 5.

C. Learning Algorithm
The standard back-propagation algorithm [12,13] is used as the learning method for both the multilayer perceptron and the time delay neural network. The algorithm consists of two major phases:
1. Feed-forward. The input is fed into the network and the output value is obtained using the weighted sum in (3):

net_o = f( Σ_h net_h · w_ho )     (3)

where net_o is the output of the o-th neuron in a layer, f is the activation function, net_h is the output of the h-th neuron in the previous layer, and w_ho is the weight connecting the h-th neuron to the o-th neuron.
2. Back-propagation. The gradient of the error with respect to each weight is calculated and used for weight updating. Each weight in the network is updated according to (4), which follows the gradient descent approach:

w_i(t) = w_i(t-1) + Δw_i(t)     (4)

where w_i(t) is the weight at time t and w_i(t-1) is the previous weight. Δw_i(t) is the learning rate (η) times the partial derivative of the error function, as shown in (5):

Δw_i(t) = -η · ∂E/∂w_i     (5)

The error function is the squared error in (6):

E = ½ Σ_k (d_k - O_k)²     (6)

where O_k is the output of the network and d_k is the desired output.
Momentum is also used to enhance the gradient descent approach. This technique reduces the chance of being trapped in a local minimum and also increases the learning speed [14]. The momentum term (α) is added to (5), giving (7):

Δw_i(t) = -η · ∂E/∂w_i + α · Δw_i(t-1)     (7)

where Δw_i(t-1) is the previous weight update.
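The weight update of (4), (5) and (7) can be sketched as a single step (the function name is illustrative; the learning-rate and momentum values used in the experiments are 0.3/0.2 and 0.1/0.3):

```python
def momentum_update(w, grad, prev_delta, lr, momentum):
    """One weight update following Eqs. (4), (5) and (7):
    delta_w(t) = -lr * dE/dw + momentum * delta_w(t-1)
    w(t)       = w(t-1) + delta_w(t)
    Returns the new weight and the update (kept for the next step)."""
    delta = -lr * grad + momentum * prev_delta
    return w + delta, delta
```

The returned `delta` must be carried over to the next call, since the momentum term reuses it as Δw_i(t-1).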
The back-propagation algorithm can be used directly as the learning algorithm for the TDNN, since the TDNN can be considered an MLP with n sets of input layers, where n is the number of delays; in this sense the TDNN is an extension of the MLP. The output of the hidden layer is given by (8):

net_h = f( Σ_d Σ_i net_di · w_dih )     (8)

where net_h is the output of the h-th neuron in the hidden layer, f is the activation function, net_di is the output of the i-th neuron in the input layer at the d-th delay, and w_dih is the weight connecting the h-th neuron in the hidden layer to the i-th input neuron at the d-th delay.
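A minimal sketch of (8) for a single hidden neuron, assuming a sigmoid activation and a per-delay layout for both inputs and weights (names are illustrative):

```python
import math

def tdnn_hidden_output(inputs, weights):
    """Eq. (8): net_h = f( sum over delays d and input neurons i of
    net_di * w_dih ), with a sigmoid f. `inputs` is a list of per-delay
    input vectors; `weights` has the same layout, holding the weights
    into one hidden neuron h."""
    total = sum(x * w
                for x_d, w_d in zip(inputs, weights)
                for x, w in zip(x_d, w_d))
    return 1.0 / (1.0 + math.exp(-total))
```

The double sum simply flattens the delayed input sets, which is why the TDNN behaves like an MLP with an enlarged input layer.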

D. The Best Weight Memorizing
Sometimes the learning process moves in the wrong direction. Best weight memorizing stores the set of weights that yields the best performance of the neural network during the training process. The performance is indicated by the mean square error (MSE): the MSE is calculated at each step and the set of weights that produces the minimum MSE is saved.
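The idea above can be sketched as follows; the training loop itself is abstracted as an iterable of (weights, mse) pairs, which is an assumption for illustration.

```python
import copy

def remember_best_weights(training_steps):
    """Track the weight set with the lowest MSE seen during training.
    `training_steps` yields (weights, mse) pairs; returns the weights
    that achieved the minimum MSE, even if later steps got worse."""
    best_weights, best_mse = None, float("inf")
    for weights, mse in training_steps:
        if mse < best_mse:
            best_mse = mse
            best_weights = copy.deepcopy(weights)  # snapshot, not a reference
    return best_weights, best_mse
```

The deep copy matters: the training loop keeps mutating its weight arrays in place, so storing a reference would silently track the final (possibly worse) weights instead of the best ones.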

E. Performance Evaluations
Percent correct classification and the Kappa statistic are used to compare the performance of the MLP and the TDNN. The percent correct (pc) is calculated using (9):

pc = CC / NC × 100%     (9)

where CC is the number of sleep stages that are correctly classified and NC is the total number of sleep stages in a dataset.
The Kappa statistic shows the degree of agreement between the classification decision and the ground truth [15]. The value of the Kappa statistic considered here is in the range 0 to 0.99. The interpretation of the numerical value of the Kappa statistic is shown in Table II.

TABLE II. INTERPRETATION OF KAPPA STATISTIC [15].
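Both metrics can be sketched in a few lines. The percent correct follows (9) directly; for the Kappa statistic the standard Cohen's kappa formula is assumed, since [15] is not reproduced here.

```python
def percent_correct(predicted, actual):
    """Eq. (9): pc = CC / NC * 100, where CC counts correctly
    classified epochs and NC is the total number of epochs."""
    cc = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * cc / len(actual)

def kappa_statistic(predicted, actual):
    """Cohen's kappa (assumed form): (p_o - p_e) / (1 - p_e), where
    p_o is observed agreement and p_e is agreement expected by chance
    from the per-class label frequencies."""
    n = len(actual)
    p_o = sum(p == a for p, a in zip(predicted, actual)) / n
    labels = set(predicted) | set(actual)
    p_e = sum((predicted.count(c) / n) * (actual.count(c) / n)
              for c in labels)
    return (p_o - p_e) / (1.0 - p_e)
```

A classifier that merely guesses the majority stage can score a deceptively high percent correct, which is why the chance-corrected kappa is reported alongside it.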

A. Experiment Scenario
The performance of the MLP and the TDNN in learning the sleep stage pattern is evaluated. For each dataset, the percent correct and Kappa statistic of the MLP and the TDNN are calculated; larger values of both indicate better performance.
In this research, the experiments are run in two sets. Both sets use 12500 learning iterations and the sigmoid activation function. Set one uses a 0.3 learning rate and 0.2 momentum; set two uses a 0.1 learning rate and 0.3 momentum. The delay parameters of the TDNN are 1 delay, 2 delays and 3 delays. The experiments aim to evaluate the influence of the temporal pattern on the classification result. The performance compared is that of a neural network trained individually on each subject.

B. Experiment Result
The percent correct and Kappa statistic values using a 0.3 learning rate and 0.2 momentum are shown in Table III and Table IV respectively, and illustrated in Fig. 6 and Fig. 7. The percent correct and Kappa statistic show corresponding results: the graphs in Fig. 6 and Fig. 7 are clearly similar. With a 0.3 learning rate and 0.2 momentum, the percent correct and Kappa statistic of the TDNN are always better than those of the MLP. A longer delay parameter in the TDNN often gives better performance, but not always.
The percent correct and Kappa statistic values using a 0.1 learning rate and 0.3 momentum are shown in Table V and Table VI respectively, and illustrated in Fig. 8 and Fig. 9. The percent correct and Kappa statistic again show corresponding results: the graphs in Fig. 8 and Fig. 9 are clearly similar, matching the previous result (Fig. 6 and Fig. 7). Fig. 8 and Fig. 9 show that a longer delay parameter gives better performance, but this does not always hold in Fig. 6 and Fig. 7. The learning rate and momentum clearly contribute to the performance of the neural network.

V. CONCLUSION
The time delay neural network, which can learn temporal patterns, consistently shows better accuracy and agreement than the standard multilayer perceptron. A longer delay parameter means that the TDNN can memorize a longer temporal pattern, but this does not guarantee better performance. The performance of the TDNN also depends on the parameters of the learning algorithm, such as the learning rate and momentum. Based on the experimental results, sleep stage classification behaves as a time series problem, so the temporal pattern significantly influences the classification result.