Python (deep learning and machine learning) for EEG signal processing: the example of recognizing alcoholism

Alcoholism is one of the most common diseases in the world. This form of substance abuse leads to mental and physical dependence on ethanol-containing drinks and is accompanied by progressive degradation of the personality and damage to the internal organs. There is still no quick method for diagnosing this disease. This article presents a method for the quick and anonymous diagnosis of alcoholism by neural networks; the method does not require any private information about the subject. For the implementation, we considered various machine learning algorithms and deep neural networks, and analyzed in detail the correlation between the signals from different electrodes. The wavelet transform and the fast Fourier transform were also considered. The manuscript demonstrates that a deep neural network operating only on a dataset of EEG signal correlations can anonymously classify the alcoholic and control groups with high accuracy. On the one hand, this method allows subjects to be tested for alcoholism without any personal data, which spares them inconvenience or embarrassment; on the other hand, the subject cannot deceive the specialists who diagnose the disease.


Introduction
According to the World Health Organization, the number of patients with alcoholism has grown in recent decades. Research shows that alcohol abuse is associated with behavioral disinhibition, but the neurophysiological mechanisms governing this relationship remain largely unknown. For these reasons, the disease is difficult to diagnose, and it can manifest itself through many different symptoms. Anuragi et al. [1] and Onarom et al. [2] described the biological processes that occur in the brain during drinking. Ishiguro et al. [3] and Kumar [4] described the physiological consequences of long-term intake of alcoholic drinks. These articles showed how complex brain function is during the illness and how difficult it is to diagnose. For an accurate diagnosis, specialists need a great deal of private information about patients, but not all patients want to be diagnosed openly. Therefore, the purpose of this research is to develop an anonymous method for classifying the alcoholic and control groups by neural networks using an EEG signal dataset. Medicine has already advanced considerably in this direction: Winterer et al. [5], Patidar et al. [6] and Acharya et al. [7] provided overviews of the EEG signals of patients diagnosed with alcoholism.
These works provide enough information to understand the state of the field of alcoholism detection from EEG signals. Interestingly, despite the apparent maturity of this topic, many papers report conflicting results. Jeremy et al. [8] showed that, compared with men, women are at increased risk of negative physical and neurocognitive correlates of alcohol consumption. Research has shown that alcohol abuse has a detrimental effect on the dynamics of EEG response suppression in the theta range; Ahmadi et al. [11] reached the opposite conclusion. Ahmadi decomposed the EEG signal into five frequency subbands using the wavelet transform and showed that alcoholic subjects have lower synchronization in the beta subband and a loss of lateralization in the alpha subband. Ahmadi performed the classification with machine learning algorithms, but deep neural networks were not used. In contrast, Paulchamy et al. [15] used all three rhythms, alpha, beta, and theta, to detect the disease. Ziya et al. [9] presented software, implemented in Matlab, that classifies a subject as alcoholic or control from EEG indicators; however, the paper does not report the results of signal preprocessing and gives too little information about the neural network model. Wajid et al. [10] extracted EEG features such as absolute power (AP) and relative power (RP); the classification accuracy of their model is not high. Guohun et al. [12] showed that the regions around electrodes C1, C3, and FC5 differ significantly in the alcoholic group. Mingyue et al. [13] presented a new algorithm for EEG signal analysis, computing non-linear EEG characteristics that distinguish alcoholics from controls with the exponential strength ratio index (EPRI); deep neural networks were not used in that research either. Joel et al. [15] diagnosed alcoholism from features extracted from four-minute scalp EEG recordings taken with the eyes closed, and also studied the influence of age and gender on the diagnosis. Madhavi et al. [16] noted increased absolute theta power in patients with alcohol dependence over all areas of the scalp, as well as increased theta log power in male alcoholics in the central and parietal regions. Anuragi et al. [17] showed that chronic alcoholism is associated with a high frequency of low-voltage recordings.

Materials and methods
In this manuscript, we used the publicly available dataset from Henri Begleiter (Neurodynamics Laboratory at the State University of New York Health Center at Brooklyn, https://archive.ics.uci.edu/ml/datasets/eeg+database). The dataset comes from a study of genetic predisposition to alcoholism. In the experiment, 64 electrodes were placed on the scalp, and the signals were sampled at 256 Hz. Two groups of subjects took part: an alcoholic group and a control group. Each subject was exposed either to a single stimulus (S1) or to two stimuli (S1 and S2), which were pictures of objects chosen from the 1980 Snodgrass and Vanderwart picture set. The dataset consists of 480 CSV tables for training and 480 tables for verification. Each table contains the 64 electrodes, and each electrode has 256 samples spanning 1 second (Table 1).
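As an illustration of the data layout described above (64 electrodes, 256 samples per one-second trial), the sketch below turns one trial table into a channels × samples array. It assumes a long-format CSV with the columns `sensor position`, `sample num`, `sensor value` and `subject identifier`; these header names are an assumption about the files, and `load_trial` is a hypothetical helper, not part of the dataset distribution.

```python
import csv
import io

import numpy as np

# Assumed column names of one trial table (adjust to the real CSV header).
SENSOR_COL = "sensor position"
SAMPLE_COL = "sample num"
VALUE_COL = "sensor value"
LABEL_COL = "subject identifier"


def load_trial(csv_text, n_samples=256):
    """Parse one long-format trial table into a (channels, samples) array.

    Returns (matrix, sensor_names, label), where label is 'a' (alcoholic)
    or 'c' (control). Channels keep their order of first appearance.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    sensors = list(dict.fromkeys(r[SENSOR_COL] for r in rows))
    row_of = {name: i for i, name in enumerate(sensors)}
    matrix = np.zeros((len(sensors), n_samples))
    for r in rows:
        matrix[row_of[r[SENSOR_COL]], int(r[SAMPLE_COL])] = float(r[VALUE_COL])
    labels = {r[LABEL_COL] for r in rows}
    if len(labels) != 1:
        raise ValueError(f"expected one subject label per trial, got {labels}")
    return matrix, sensors, labels.pop()


# Tiny hand-made example: 2 sensors x 3 samples (real trials: 64 x 256).
example = """sensor position,sample num,sensor value,subject identifier
FP1,0,-8.9,a
FP1,1,-7.0,a
FP1,2,-5.1,a
FP2,0,0.5,a
FP2,1,1.5,a
FP2,2,2.5,a
"""
X, names, label = load_trial(example, n_samples=3)
print(X.shape, names, label)  # (2, 3) ['FP1', 'FP2'] a
```

For a real trial file, the text of the CSV would be passed instead, e.g. `load_trial(open(path).read())`.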

Table 1. Structure of the data file for training (Data1) and test
The table contains the following information: sensor position, sensor value (µV), subject identifier (alcoholic (a) or control (c)), matching condition, name (a serial code assigned to each subject), and time (inverse of the sample number, measured in seconds). The location of the sensors on the head is shown in Fig. 1, and the correspondence between the electrodes on the head and the X-axis (Fig. 2) is given in Table 2.

Table 2. Position of the electrodes on the X-axis for Fig. 1

This dataset contains some artifacts. In Fig. 2c there is a sudden increase in voltage, which can be caused by eye movement or blinking. To exclude this kind of artifact, principal component analysis (PCA) was used. PCA is a multivariate statistical method that reduces the dimension of the feature space with minimal loss of useful information.
Electrooculographic (EOG) potentials, which arise from eye movement, are among the most common artifacts. Their amplitude is largest in the frontal leads and decreases toward the occipital leads. In [18,19,20], principal component analysis is widely used to remove artifacts caused by involuntary eye movements of the subject from multi-channel EEG. In signals like these, it is very difficult to find regularities visually (Fig. 2). Therefore, it is reasonable to try neural networks and machine learning for EEG signal recognition.
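A minimal sketch of PCA-based artifact removal, assuming that the ocular artifact dominates the highest-variance principal component(s) of a channels × samples trial. The helper `remove_top_components` and the choice to discard only the first component are illustrative assumptions; in practice, the components to drop would be chosen by inspecting their spatial patterns (for EOG, large frontal weights that decay toward the occipital leads).

```python
import numpy as np


def remove_top_components(eeg, n_remove=1):
    """Remove the n_remove highest-variance principal components from a trial.

    eeg: array of shape (channels, samples). PCA is done via the SVD of the
    channel-centered data; the returned signal stays centered (each channel
    has zero mean), i.e. a baseline correction is applied as a side effect.
    Assumes the top components capture the ocular artifact.
    """
    centered = eeg - eeg.mean(axis=1, keepdims=True)
    # Columns of u are spatial patterns, sorted by decreasing variance s**2.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    s_clean = s.copy()
    s_clean[:n_remove] = 0.0  # drop the artifact component(s)
    return (u * s_clean) @ vt
```

Because the artifact has one fixed spatial pattern (strong frontally, weak occipitally), it concentrates in a single component, which is why PCA can separate it from the background activity.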

Wavelet transforms
We considered the most popular signal preprocessing methods: the wavelet transform and decomposition of the signal with the fast Fourier transform. In many studies, the fast Fourier transform is used together with the wavelet transform [25,26,27,28,29].
The continuous wavelet transform carries a large amount of information about the signal but, on the other hand, is highly redundant, since every point of the time-scale plane contributes to the result. The continuous wavelet transform is defined as the scalar product of the original signal x(t) and the daughter wavelet function ψ_{τ,a}(t):

$$W(\tau,a)=\int_{-\infty}^{\infty} x(t)\,\psi^{*}_{\tau,a}(t)\,dt,$$

where W(τ,a) are the wavelet expansion coefficients; τ and a are the time-shift and scale parameters, respectively; and * denotes complex conjugation. The daughter wavelet functions ψ_{τ,a}(t) are formed by shifting and scaling the mother wavelet function ψ(t) and are related to it by

$$\psi_{\tau,a}(t)=\frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-\tau}{a}\right).$$

The complex Morlet wavelet, which is the product of a complex sinusoid and a Gaussian, is used as the mother wavelet function. Its analytical expression has the form

$$\psi(t)=e^{j\omega_0 t}\,e^{-t^{2}/(2\sigma^{2})},$$

where ω0 is the center frequency of the mother wavelet and σ is the standard deviation of the envelope of the mother wavelet.
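The wavelet transform with a complex Morlet mother wavelet can be evaluated numerically with a straightforward discretization of the integral. The sketch below is a plain NumPy implementation, assuming the unnormalized Morlet form (complex sinusoid times a Gaussian with standard deviation σ) and illustrative values ω0 = 6 and σ = 1, which are not the parameters used in this study; `morlet` and `cwt_morlet` are hypothetical helper names. Each analysis frequency f is mapped to the scale a = ω0/(2πf), at which the daughter wavelet's center frequency equals f.

```python
import numpy as np


def morlet(t, omega0=6.0, sigma=1.0):
    """Complex Morlet mother wavelet: complex sinusoid times a Gaussian.

    The normalization constant is omitted (assumption); omega0 and sigma
    are illustrative defaults, not values taken from the study.
    """
    return np.exp(1j * omega0 * t) * np.exp(-t**2 / (2 * sigma**2))


def cwt_morlet(x, fs, freqs, omega0=6.0, sigma=1.0):
    """Continuous wavelet transform W(tau, a) of x for a list of frequencies.

    Each frequency f maps to scale a = omega0 / (2*pi*f). Returns an array of
    shape (len(freqs), len(x)); row i holds W(tau, a_i) for tau = k / fs.
    """
    dt = 1.0 / fs
    n = len(x)
    out = np.empty((len(freqs), n), dtype=complex)
    for i, f in enumerate(freqs):
        a = omega0 / (2 * np.pi * f)
        half = int(np.ceil(4 * sigma * a * fs))  # support of +/- 4 envelope std
        t = np.arange(-half, half + 1) * dt
        daughter = morlet(t / a, omega0, sigma) / np.sqrt(a)
        # np.correlate conjugates its second argument, so this is
        # sum_t x(t) * conj(psi_{tau,a}(t)); in 'full' mode, index k + half
        # corresponds to the shift tau = k * dt.
        full = np.correlate(x, daughter, mode="full")
        out[i] = full[half:half + n] * dt
    return out


fs = 256
t = np.arange(2 * fs) / fs
x = np.cos(2 * np.pi * 10 * t)                  # 10 Hz test tone, 2 seconds
W = cwt_morlet(x, fs, [5.0, 10.0, 20.0])
power = np.abs(W[:, 128:384]).mean(axis=1)      # ignore edge effects
print(int(np.argmax(power)))                    # 1, i.e. the 10 Hz row
```

The explicit loop over scales keeps the code close to the definition; FFT-based convolution would be faster for long recordings but computes the same coefficients.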

Fast Fourier transform
The continuous and discrete Fourier transforms have not found wide application in feature extraction because of their low computational efficiency, as explained in [30,31,32,33,34]. The most popular approach is to decompose the signal into harmonic components using the fast Fourier transform.
For a signal x(n) given as a sequence of samples taken with sampling frequency Fs at time instants n = 0, 1, ..., N−1, the discrete Fourier transform is defined as

$$F(k)=\sum_{n=0}^{N-1} x(n)\,e^{-j2\pi kn/N},\qquad k=0,1,\dots,N-1.$$

The resulting image dataset is used as input data for the neural network.
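To make the definition concrete, the sketch below evaluates the DFT sum directly in O(N²) operations and checks it against `numpy.fft.fft`, which returns the same values via an FFT in O(N log N). NumPy is used here as an assumption; the study mentions scipy.fftpack, which computes the same transform. With N = Fs samples (one second), adjacent bins are Fs/N = 1 Hz apart, so a 12 Hz sine peaks at bin k = 12. The helper name `dft` is hypothetical.

```python
import numpy as np


def dft(x):
    """Direct DFT, F(k) = sum_n x(n) * exp(-j*2*pi*k*n/N), in O(N^2) time."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    # (k * n) % N keeps the exponent small, which improves numerical accuracy.
    return (x * np.exp(-2j * np.pi * ((k * n) % N) / N)).sum(axis=1)


fs = 256                      # sampling frequency, Hz
t = np.arange(fs) / fs        # one second -> N = 256, step fs / N = 1 Hz
x = np.sin(2 * np.pi * 12 * t)
F = dft(x)
print(np.allclose(F, np.fft.fft(x)))          # True
print(int(np.argmax(np.abs(F[: fs // 2]))))   # 12
```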

Experimental research
3.1 Machine learning
Classical machine learning is becoming less popular for classification tasks because of the development of deep neural networks. However, high accuracy was obtained with machine learning in [21,22,23,24], so we also tried it. For the machine learning tasks, the 480 available Excel files were merged into a single file of the form shown in Table 3.
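The exact columns of Table 3 are not reproduced here, so the following is only a hypothetical sketch of merging per-trial tables into one feature table: each trial becomes one row of per-channel mean and standard deviation of the voltage, labeled 1 for alcoholic ('a') and 0 for control ('c'). The feature choice and the helper name `trials_to_table` are assumptions.

```python
import numpy as np


def trials_to_table(trials, labels):
    """Stack trials into one feature table for classical classifiers.

    trials: sequence of (channels, samples) arrays; labels: 'a' or 'c' per trial.
    Hypothetical features: per-channel mean and standard deviation of voltage.
    Returns X of shape (n_trials, 2 * channels) and y with 1 = alcoholic.
    """
    X = np.array([np.concatenate([t.mean(axis=1), t.std(axis=1)]) for t in trials])
    y = np.array([1 if lab == "a" else 0 for lab in labels])
    return X, y
```

With 480 trials of 64 channels, this yields a 480 × 128 table, one row per file.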

Table 3. Structure of the machine learning dataset
Logistic regression, naive Bayes, k-nearest neighbors, and support vector machines achieved an accuracy of approximately 0.50. The random forest classifier achieved 0.75. The best result, an accuracy of 0.81, was obtained with Classification and Regression Trees (CART).
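A comparison of this kind can be sketched with scikit-learn, assuming a feature matrix X and labels y built from the merged table. The hyperparameters (library defaults, cross-validated accuracy, standardization for the scale-sensitive models) are assumptions rather than the study's settings, and the synthetic data in the usage block is only a stand-in, so it will not reproduce the accuracies reported above. scikit-learn's `DecisionTreeClassifier` implements an optimized version of the CART algorithm.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# The classifiers named in the text; settings are illustrative defaults.
MODELS = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "NB": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF": RandomForestClassifier(random_state=0),
    "CART": DecisionTreeClassifier(random_state=0),
}


def compare_models(X, y, cv=5):
    """Return the mean cross-validated accuracy of each model in MODELS."""
    return {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
            for name, model in MODELS.items()}


if __name__ == "__main__":
    # Synthetic stand-in for the EEG feature table (not real data).
    X, y = make_classification(n_samples=240, n_features=32, random_state=0)
    for name, acc in compare_models(X, y).items():
        print(f"{name}: {acc:.2f}")
```

Cross-validation matters here: with only 480 trials, a single train/test split can shift accuracy by several percentage points.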

Deep neural networks
To increase accuracy, we used deep neural networks, specifically a convolutional neural network (CNN) that works with images. Figure 5 shows two-dimensional (2D) plots of the correlation between electrode signals (Python 3.7, matplotlib). The images show a high correlation between regions that are close to each other. Visually, we noticed that the correlation values for the regions PO3-CPZ and F4-C4 differ between subjects. These observations suggest that the correlation images can be used for deep learning in the classification task. The CNN architectures shown in Fig. 6 are commonly used for EEG signal classification [35,36,37,38]. The classification accuracy (%) obtained with various input images is presented in Table 4. For the developed CNN model, the highest accuracy in classifying alcoholic and control subjects was obtained with the EEG correlation images.
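The correlation images can be computed with NumPy: for one trial of shape (channels, samples), `np.corrcoef` returns the matrix of Pearson correlations between all electrode pairs, which can then be rendered (for example with matplotlib's `imshow`) or passed to a CNN as a single-channel image. This is a minimal sketch; Pearson correlation over the whole one-second trial is an assumption about how the images in Fig. 5 were built, and `correlation_image` and `pair_correlation` are hypothetical helpers.

```python
import numpy as np


def correlation_image(eeg):
    """Pearson correlation between all channel pairs of one trial.

    eeg: (channels, samples). Returns a symmetric (channels, channels)
    matrix with values in [-1, 1] and ones on the diagonal.
    """
    return np.corrcoef(eeg)  # rows are treated as variables (channels)


def pair_correlation(eeg, names, a, b):
    """Correlation between two named electrodes, e.g. 'F4' and 'C4'."""
    c = correlation_image(eeg)
    return c[names.index(a), names.index(b)]
```

For a 64-channel trial, this gives a 64 × 64 image per trial, independent of the trial length.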

Discussion and conclusions
The highest classification accuracy was obtained using images of the correlation between signals. The regions PO3-CPZ and F4-C4 have the highest correlations.
Among the frequency bands, the beta band gave a high classification result, while the delta band gave the lowest, which we attribute to the loss of the useful signal at the original frequency. Images of the Fourier spectra gave accuracy comparable to that of the machine learning algorithms. In this research, no direct pattern was found between the magnitude of the voltage on an electrode and the subject group. In 72% of cases, the voltage on the electrode is lower in the alcoholic group than in the control group. However, the electrode locations differ, so there is no way to establish an exact relationship between the voltage on an electrode and the presence of the disease. Many papers use the beta, alpha, and theta rhythms of the EEG signal for classification. This research shows that, when only one type of data about the subject is available (the electrode voltage alone), it is preferable to use convolutional networks with images of EEG signal correlations.
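One hypothetical way to compute a statistic like the 72% figure is to compare, channel by channel, the mean absolute voltage of the two groups and report the share of channels where the alcoholic group is lower. The text does not specify how cases were counted, so this averaging scheme and the helper `fraction_lower` are assumptions.

```python
import numpy as np


def fraction_lower(alcoholic, control):
    """Share of channels whose mean absolute voltage is lower in the alcoholic group.

    alcoholic, control: arrays of shape (trials, channels, samples).
    Averaging over trials and samples is an illustrative assumption.
    """
    a = np.abs(alcoholic).mean(axis=(0, 2))  # one value per channel
    c = np.abs(control).mean(axis=(0, 2))
    return float(np.mean(a < c))
```

A value of 0.72 would mean that 46 of 64 channels show lower mean absolute voltage in the alcoholic group; it says nothing about which channels, which is why the location dependence noted above limits the conclusion.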
For sound research on EEG analysis with neural networks, studies must provide as much data as possible: age, gender, medical history, etc. Much research on recognizing alcoholism from EEG signals reports differing results. To avoid this, a standard for the use of neural networks should be developed that regulates the number of features used to classify alcoholism in a subject.

In the DFT definition, F(k) is the complex amplitude of the sinusoidal component with frequency k·Δf, where Δf = Fs/N is the frequency resolution (step), and x(n) are the measured signal values at time instants n = 0, 1, ..., N−1. The result of expanding the signal with the fast Fourier transform from the scipy.fftpack library is shown in Fig. 4.

Fig. 4. Expanding the signal into a fast Fourier series: a) 1.alk_s1, b) 2.alk_s2, c) 3.not_alk_s1, d) 4.not_alk_s2
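An FFT spectrum can be summarized as absolute and relative band power, the kind of features (AP, RP) mentioned in the introduction and the delta and beta bands discussed in the conclusions. The sketch below uses `numpy.fft.rfft` and `numpy.fft.rfftfreq` in place of scipy.fftpack and assumes conventional band limits (delta 0.5-4 Hz, theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz), which the study does not state; `band_powers` is a hypothetical helper.

```python
import numpy as np

# Conventional EEG band limits in Hz (assumption; not given in the study).
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def band_powers(x, fs):
    """Absolute and relative power of x in each band, from its FFT.

    Frequency bins are spaced by delta_f = fs / N. Relative power is the
    band power divided by the total power over all listed bands.
    """
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    absolute = {name: spectrum[(freqs >= lo) & (freqs < hi)].sum()
                for name, (lo, hi) in BANDS.items()}
    total = sum(absolute.values())
    relative = {name: p / total for name, p in absolute.items()}
    return absolute, relative
```

For a one-second trial at 256 Hz, the bins are 1 Hz apart, so the delta band contains only the 1, 2 and 3 Hz bins; this coarse resolution is one possible reason why delta-band features are less informative on such short recordings.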

Fig. 6. Popular deep neural networks for image classification

Fig. 7. Schematic representation of the neural network working with correlation graphs (Fig. 2)

Table 4. Accuracy of the CNN model for different datasets