Genetics-Based Machine Learning for the Assessment of Certain Neuromuscular Disorders

Abstract-Clinical electromyography (EMG) provides useful information for the diagnosis of neuromuscular disorders. The utility of artificial neural networks (ANN's), trained with backpropagation or Kohonen's self-organizing feature maps algorithm, in classifying EMG data has recently been demonstrated. The objective of this study is to investigate how genetics-based machine learning (GBML) can be applied for diagnosing certain neuromuscular disorders based on EMG data. The effect of GBML control parameters on diagnostic performance is also examined. A hybrid diagnostic system is introduced that combines both neural network and GBML models. Such a hybrid system provides the end-user with a robust and reliable system, as its diagnostic performance relies on more than one learning principle. In the clinical EMG laboratory, 680 motor unit action potentials (MUAP's) were collected from 12 normal, 11 motor neuron disease, and 11 myopathy subjects. Eight subjects from each group formed the training set, and the other 10 subjects formed the evaluation set. Each subject was described by a 14-element feature vector consisting of the mean and the standard deviation of each of the following MUAP parameters: duration, spike duration, amplitude, area, spike area, phases, and turns. More than a thousand GBML models were developed by varying the following parameters: message length size (49, 74), number of classifiers (100, 150, 200, 250, 300, 500), lifetax (0.000, 0.002, 0.005, 0.010), period of genetic algorithm (GA) introduced, which is expressed in iterations and shows how often the classifier system calls the GA (50, 100, 200, 500), crossover probability (0.5, 1.0), and mutation probability (0.00, 0.01, 0.02). A total of 28 models were selected that achieved a diagnostic yield better than 95% and 70% for the training and evaluation


I. INTRODUCTION
ELECTROMYOGRAPHY (EMG) is the recording and study of the electrical activity of voluntary contracting muscles. In humans, clinical EMG findings provide useful information in the electrodiagnostic examination of peripheral nerves and skeletal muscle, and in deciding the level of the lesion in patients suffering from neuromuscular disorders. EMG is also particularly helpful in deciding whether the symptom of muscle weakness in the assessment of neuromuscular disorders is myopathic or neurogenic in origin. It should be emphasized, however, that EMG findings evaluated alone cannot be used for providing a clinical diagnosis because they do not show any specific reason that can cause disease [1].
In the last two decades there has been an attempt to improve the objectivity and accuracy of EMG analysis. Advances in computer technology and digital signal processing provided a basis for computer-aided EMG feature extraction [2]. This allows measurements to be standardized, to be more accurate, and to save diagnostic time. There is now a need to add decision making capabilities so that all the data can be processed in an integrated environment. The advantages of automated EMG diagnostic systems can be summarized as follows [3]. Standardization: Diagnoses obtained from different laboratories using similar criteria can be verified. Sensitivity: EMG findings on a particular subject may be compared with a database of normal values and/or a decision can be made by an automated diagnostic system deciding whether or not an abnormality exists. Specificity: Findings may be compared with databases for various neuromuscular diseases and/or a decision can be made by the automated diagnostic system with respect to the type of abnormality. Equivalence: Results from a series of examinations of the same patient may be compared to decide whether there is evidence of disease progression or of response to treatment. In addition, the findings of different automatic diagnostic methods can be compared to determine which are more sensitive and specific. Efficacy: The results of different treatments can be more properly evaluated.
Different approaches have been used to address the problem of automated EMG diagnosis. Knowledge engineering [4], causal probabilistic networks [5], artificial neural networks [6], [7], and other methods have been investigated. The usefulness of genetics-based machine learning (GBML) in EMG data classification was discussed in a recent pilot study [8].
The aim of this investigation is to further examine how GBML models can be applied in the diagnosis of certain neuromuscular disorders based on EMG data, and to explore the possibility of adopting a synergistic hybrid system based on both neural networks and GBML. The hybrid system tries to mimic the examination procedure in which more than one expert physician independently provides a diagnosis on a case, given the same information. The diagnostic performance and the learning behavior of the models are examined with respect to the following GBML parameters: message length, number of classifiers, lifetax, period of genetic algorithm introduced, crossover probability, and mutation probability. A description of these parameters is given in Section III. Furthermore, the diagnostic performance of GBML models is examined as the EMG sample size is reduced.
Of the various GBML systems, this investigation focuses on the simple classifier system (SCS) documented by Goldberg [9]. The SCS is a parallel production system designed to exploit the implicit parallelism of any genetic algorithm. In the SCS, interactions are implemented through standardized messages, whereas conditions are simply defined in terms of the messages they accept. Actions are defined in terms of the messages they send. The resulting system uses a simple syntax that makes it easy for a genetic algorithm to discover building blocks appropriate for the construction of new strings (rules). The SCS relies on competition to identify better rules. New rules can be introduced into the system, as hypotheses, without disturbing the existing status of the rules within the system. Goldberg [9] and Holland [10] indicate that this gracefulness makes it possible for the system to operate incrementally, testing new structures and hypotheses while steadily improving its performance.
Classifier systems are a type of traditional production expert systems [9], [11]. The performance of such systems is enhanced in that more than one production rule can fire simultaneously, and because the representations and operations of the system are based on simple syntactic matches. Classifier systems share many important features with connectionist or neural-network representations [11]. Both of these abstract models call for massively parallel computation and support subsymbolic models of cognition. These learning systems are derived bottom-up, that is, directly from low-level representations of the sensory interface, rather than top-down from semantically meaningful symbols [11].
In the following section, EMG methods and materials are presented. The pathology of the neuromuscular disorders of interest is briefly discussed and EMG feature extraction is introduced. Section III presents the GBML paradigm and the ways it was implemented for EMG diagnosis. Section IV presents the results of different GBML models. The last section provides an overall discussion.

II. EMG METHOD AND MATERIAL
The motor unit is the smallest functional unit of the muscle. It consists of an anterior horn cell, an axon, and the muscle fibers innervated by the neuron. Structural reorganization of the motor unit takes place in neuromuscular disorders that affect the peripheral nerve and/or muscle. Motor unit morphology can be determined by recording its electrical activity using different types of needle electrodes. The concentric needle electrode measures an electrical potential difference between the bare tip of an insulated wire, usually platinum, and the bare shaft of a steel cannula through which it is inserted. This electrode picks up the electrical activity of a fraction of the motor unit where, at a slight voluntary contraction, motor unit action potentials (MUAP's) are recorded. In this study, the EMG was recorded from the biceps brachii muscle for five seconds. MUAP's were identified and selected from the EMG recording based on predetermined criteria. A parametric pattern recognition algorithm based on MUAP features was applied for recognizing similar MUAP's generated from the same motor unit [7]. Features measured automatically from each group of similar MUAP's (Fig. 1) include: duration (Dur), where the beginning and ending of the MUAP are identified by sliding a measuring window of 3 ms in length and ±10 µV in width; spike duration (SpDur), measured from the first to the last positive peak; amplitude (Amp), the maximum peak-to-peak measure of the MUAP; area, the rectified MUAP integrated over the duration; spike area (SpArea), the rectified MUAP integrated over the spike duration; phases (Ph), the number of baseline crossings that exceed 25 µV, plus one; turns (T), the number of positive and negative peaks separated from the preceding and following peak by 25 µV.
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO.

In quantitative EMG studies, it is appropriate to record 20 sets of similar MUAP's from the muscle of each subject; see Table I. This is considered an acceptable sample of the whole muscle [12]. The mean (mn) and the standard deviation (sd) for the 20 values of each parameter are also shown. The quantitative features that describe the subject are thus reduced to a 14-element vector to be classified for the sake of diagnosis.
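The reduction from 20 MUAP measurements to a 14-element vector can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `feature_vector`, the dictionary layout of a MUAP, the use of the sample standard deviation, and the interleaved (mn, sd) ordering are all assumptions made here for clarity.

```python
from statistics import mean, stdev

# The seven MUAP parameters named in the text, in the order they are listed.
PARAMETERS = ["Dur", "SpDur", "Amp", "Area", "SpArea", "Ph", "T"]


def feature_vector(muaps):
    """Reduce a list of per-MUAP measurement dicts to a 14-element vector.

    Assumed layout (hypothetical): [mn(Dur), sd(Dur), mn(SpDur), sd(SpDur), ...].
    """
    vector = []
    for name in PARAMETERS:
        values = [m[name] for m in muaps]
        vector.append(mean(values))
        vector.append(stdev(values))  # assumption: sample standard deviation
    return vector


# Synthetic, made-up measurements for 20 MUAP's (not data from the study).
muaps = [{name: float(i + k) for k, name in enumerate(PARAMETERS)} for i in range(20)]
vector = feature_vector(muaps)
print(len(vector))  # 14
```

The same function applies unchanged to the reduced samples of 10 or five MUAP's considered later, since only the length of the input list changes.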
This investigation was limited to the diagnosis of certain neuromuscular disorders based only on EMG findings. These include disorders that cause muscular weakness and/or wasting (loss of muscle fibers). From the large number of such disorders only two groups were considered: motor neuron disease (MND) and myopathy (MYO). These two categories of MND and MYO were selected because the former is purely a disorder of the motor neuron while the latter is purely a disorder of the muscle fiber per se. In trying to develop the GBML computer-aided diagnostic system, it was felt that it would be best to have two pathologically distinct and contrasting categories of neuromuscular disorders that could eventually be diagnosed accurately on clinical grounds. Furthermore, the biceps brachii muscle was examined because it is a proximal muscle of the shoulder girdle that is usually affected at an early stage in both MND and MYO. Also, its easy accessibility has made it attractive to study and widely reported in the literature. In this study, 680 MUAP's were collected from 12 normal, 11 MND, and 11 MYO subjects. The 14-element feature vector describing each subject consists of MUAP measures that are widely used in the everyday practice of clinical EMG. In addition, these features are easily understood by the physician, and they have also been shown to describe well the motor unit structural changes caused by disease. A brief discussion of the EMG findings in the NOR, MND, and MYO groups for the material under study is given.
Normal (NOR): the mean and standard deviation values computed for duration, amplitude, and number of phases for 240 MUAP's for this group were 9.60 ± 2.75 ms, 0.376 ± 0.306 mV, and 2.6 ± 0.8, respectively.

Motor Neuron Disease (MND): is a disease causing selective degeneration of the upper and lower motor neuron. This disease affects middle-aged and older people. There is progressive widespread loss of motor neurons, usually leading to death within three to five years. In the advanced stages of this disease large motor units also denervate. Typically the mean durations of the motor unit potentials are longer than normal and there are increased amplitudes. There is an increase in the number or density of fibers in the motor units, or an increase in the temporal dispersion of the activity picked up by the recording electrode. The latter effect is the result of slowed conduction along the terminal branches of individual nerve fibers, increase in the end-plate zone, or both. Mean and sd of duration, amplitude, and phases for 220 MUAP's for this group were 13.4 ± 3.86 ms, 0.614 ± 0.426 mV, and 4.0 ± 1.8, respectively.
Myopathies (MYO): are a group of diseases that affect primarily skeletal muscle fibers. They are divided into two groups, according to whether they are inherited or acquired. Most muscular dystrophies are hereditary, causing severe degenerative changes in the muscle fibers. In this group of diseases, there are four main types of muscular dystrophy, namely Duchenne's, Becker's, facioscapulohumeral, and limb girdle. They show a progressive clinical course from birth or after a variable period of apparently normal infancy. A frequently acquired myopathy is polymyositis. This is characterized by acute or subacute onset, with muscle weakness progressing slowly over a matter of weeks. MUAP's with short duration and reduced amplitude are typical findings in patients suffering from myopathy. These findings are attributed to fiber loss within the motor unit, with the degree of reduction of these parameters reflecting the amount of fiber loss [13]. The mean and sd of MUAP duration, amplitude, and phases for this group of patients were 7.15 ± 2.34 ms, 0.314 ± 0.250 mV, and 2.7 ± 1.0, respectively.
Twenty-four of the above 34 cases were randomly selected to form the training set (eight subjects from each group). The remaining 10 subjects were used for evaluating the performance of the models after their training. Mean duration of normal subjects varies from 8 to 12 ms, and mean amplitude varies from 0.280 to 0.520 mV. Myopathy patients usually have MUAP's with short duration and low amplitude, whereas MND patients have MUAP's with long duration and high amplitude, but no clear boundaries enclosing each group can be drawn.

III. A SIMPLE GENETICS-BASED MACHINE LEARNING CLASSIFIER SYSTEM
The classifier system contains three main components: the rule and message system, the apportionment of credit algorithm, and the genetic algorithm, as shown in Fig. 2.

A. Rule and Message System
Data are decoded to a message of standard length (the environmental message length), which is positioned on a message list where it can then activate string rules called classifiers. Classifiers are listed in the classifier store. A classifier is a production rule, a simple string that consists of a condition and a message: Classifier = Condition: Message. The message of the classifier is a string of finite length in a certain coding system. The condition of the classifier acts like a pattern recognition device, with a wild card character added to the coding system. With reference to the binary system, the message is composed of n codes that are either "0's" or "1's." The condition could be a string of the same length n as the message, but composed of "0's," "1's," and the wild card symbol #, which means that, at that position of the string, the code could be either a "0" or a "1." For example, the classifier could be 01#1:0010, where 01#1 is the condition and 0010 is the message. In this example the condition would be matched by both strings 0101 and 0111.
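The matching rule described above can be illustrated with a short sketch. The helper names `matches` and `parse_classifier` are hypothetical and chosen here only for illustration; the example classifier 01#1:0010 is the one given in the text.

```python
WILDCARD = "#"


def matches(condition, message):
    """Return True if every condition symbol equals the message bit or is '#'."""
    if len(condition) != len(message):
        return False
    return all(c == WILDCARD or c == m for c, m in zip(condition, message))


def parse_classifier(text):
    """Split a 'Condition:Message' string into its two parts."""
    condition, message = text.split(":")
    return condition, message


condition, action = parse_classifier("01#1:0010")
for env_message in ("0101", "0111", "0001"):
    print(env_message, matches(condition, env_message))
```

Only the first two environmental messages satisfy the condition; the third differs in a position that is not a wild card.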

B. Apportionment of Credit Algorithm
When a message in the message list matches the condition of a classifier, the classifier's message may then be sent as the output. Matching classifiers, however, do not send their messages directly to the message list; instead, they participate in an auction based on their strength values. The string with the highest strength wins, but before it proceeds with posting its message, it has to pass through the clearinghouse. At this stage, the winner is taxed by removing 10% of its current strength. The third routine in the apportionment of credit algorithm is the tax collector. The objective of this routine is to discourage nonproductive classifiers. For this purpose, two different types of taxes are collected, an existence tax and a bid tax. The existence tax is assessed and collected from all classifiers at a tax rate specified by the variable lifetax. The bid tax is assessed and collected from all classifiers that bid in the last auction, at a rate specified by the variable bidtax. Thus the three procedures within the apportionment of credit algorithm distribute and collect taxes in trying to ensure that good rules receive higher strength whereas bad rules receive lower strength [9].
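One pass through the auction, the clearinghouse, and the tax collector might be sketched as below. This is a simplified illustration under stated assumptions, not the SCS implementation of [9]: the `Classifier` dataclass, the function `credit_step`, the order in which deductions are applied, and the omission of any reward payment are choices made here. The rates used (10% clearinghouse deduction, lifetax of 0.2%, bidtax of 1%) are the ones quoted in this section.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str
    message: str
    strength: float


def matches(condition, message):
    """A condition symbol matches if it equals the message bit or is '#'."""
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message)
    )


def credit_step(classifiers, env_message, clearing_rate=0.10, lifetax=0.002, bidtax=0.01):
    """Run one auction, clearinghouse, and tax-collector pass (reward omitted)."""
    # Auction: matching classifiers compete on strength.
    bidders = [c for c in classifiers if matches(c.condition, env_message)]
    winner = max(bidders, key=lambda c: c.strength) if bidders else None
    # Clearinghouse: the winner pays a fraction of its current strength.
    if winner is not None:
        winner.strength -= clearing_rate * winner.strength
    # Tax collector: existence tax on everyone, bid tax on the bidders.
    for c in classifiers:
        c.strength -= lifetax * c.strength
    for c in bidders:
        c.strength -= bidtax * c.strength
    return winner


# Made-up classifier store for illustration only.
store = [
    Classifier("01#1", "0010", 10.0),
    Classifier("0#01", "0001", 8.0),
    Classifier("1###", "0100", 5.0),
]
winner = credit_step(store, "0101")
print(winner.message, [round(c.strength, 5) for c in store])
```

In this sketch a classifier that never matches still loses strength through the existence tax, which is the mechanism the text credits with discouraging nonproductive rules.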

C. Genetic Algorithm
The performance of the classifier system may be enhanced by introducing a genetic search for new, possibly better rules through the call of the genetic algorithm (GA). The GA is called to work only on a selected proportion (ps) of the classifiers. Selection is based on the strength value of classifiers, which is used as their fitness value during the operation of the GA. The aim of the introduction of the GA is not to change the character of the "natural" data but to amplify some characteristic features through optimization. The parameter period of GA introduced specifies the number of time steps between successive calls of the GA by the classifier system. This algorithm is composed of three operations: reproduction, crossover, and mutation [9]. During the reproduction phase, classifier strength values are used as their "fitness values." The strings of higher fitness are called into the mating pool and are paired depending on their values. The crossover takes place between the pairs of data strings, in a way similar to crossover between the pairs of homologous chromosomes during meiosis. This leads to the production of two new strings carrying partial information from both parent strings. Their fitness values will be the average of the parent fitness values. Mutation is another operator of the genetic algorithm that is introduced at very rare intervals to change a character of data strings, chosen at random, to any of the other characters of the code. Crossover and mutation probabilities express the probabilities of crossover per mating event, and mutation per bit change, respectively. The resulting population of new strings will follow the same genetic procedure for several generations.
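The crossover and mutation operators described above can be sketched as follows. This is an illustrative sketch only: the function names, the single-point form of crossover, and the treatment of every position as belonging to the ternary alphabet {0, 1, #} are assumptions made here (in a full classifier the message part would be restricted to 0 and 1).

```python
import random

ALPHABET = "01#"  # assumption: ternary alphabet applied to the whole string


def crossover(parent_a, parent_b, position):
    """Single-point crossover: swap the tails of two equal-length strings."""
    child_a = parent_a[:position] + parent_b[position:]
    child_b = parent_b[:position] + parent_a[position:]
    return child_a, child_b


def mutate(string, pm, rng, alphabet=ALPHABET):
    """With probability pm per character, replace it by a different symbol."""
    out = []
    for ch in string:
        if rng.random() < pm:
            ch = rng.choice([a for a in alphabet if a != ch])
        out.append(ch)
    return "".join(out)


def mate(a, strength_a, b, strength_b, pc, pm, rng):
    """Produce two offspring; each takes the average strength of the parents."""
    if rng.random() < pc:
        a, b = crossover(a, b, rng.randint(1, len(a) - 1))
    child_strength = (strength_a + strength_b) / 2
    return (mutate(a, pm, rng), child_strength), (mutate(b, pm, rng), child_strength)


rng = random.Random(0)
print(crossover("0000", "1111", 2))
print(mate("01#10", 9.0, "1#001", 7.0, pc=1.0, pm=0.02, rng=rng))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, which is convenient when comparing runs with different pc and pm values.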

D. An Example of a Learning Classifier System
Mean MUAP parameters of a normal subject are decoded to binary, with the environmental message length being 26. A message could belong to any of three classes, 0 = NOR, 1 = MND, and 2 = MYO. The message is then posted to the message list, to be processed through the classifier store. The apportionment of credit algorithm is called, wherein, in the auction process, the matching classifiers are selected (R.1 and R.4), and the classifier (R.1) with the higher strength (9.01) wins. Then, in the clearinghouse operation, the matching classifiers' strength value is reduced by 10%. The third operation is the tax collector. A small fraction of the strength value, specified by the parameter lifetax (0.2%), is deducted from all classifiers. In addition, the strength value of matching classifiers is reduced by 1% as given by the parameter bidtax.
The classifier store periodically calls the GA. Within the GA, the reproduction process selects a few classifiers, two in this example, that have the higher strength (R.1 and R.2). Then, in the crossover operation these strings are crossed, with the new strings taking the average strength value of the parent strings. In this example the crossover position and probability are 10 and one, respectively. In the mutation process, a character of the code is changed to any of the other characters of the code with equal probability. As shown in Fig. 2, position 18 of string R.1 is changed from zero to one. The learning process continues for the specified number of epochs.

IV. RESULTS
The mean and the standard deviation of the seven MUAP parameters form the 14-element feature vector that describes each subject. Each vector has been decoded into 49- and 74-bit strings (environmental messages) as shown in Table II. The 49-bit string decoding scheme is considered to provide the minimum acceptable resolution required by the physician in reaching a conclusive diagnosis, whereas the 74-bit decoding scheme represents the resolution at which MUAP parameters are usually measured by the EMG machine. GBML models were investigated by varying the values of the following parameters: environmental message length (size), number of classifiers (cl), lifetax (ltax), period of GA introduced (TGA), crossover probability (pc), and mutation probability (pm); see Table III. Over 1000 models have been investigated by combining the values of the above parameters. Three different sets of data, with MUAP sample size 20, 10, and five, were also used for training. The stopping criterion of learning for the investigated models was set at 10 000 iterations. By the end of training the input data had been applied to the classifier system 416 times, or epochs (10 000/24). The diagnostic performance of the models for the training and evaluation data sets is expressed as the percentage of correctly classified cases, denoted by TR (training set) and EV (evaluation set), respectively. For example, EV is equal to 60% if six of the 10 subjects in the evaluation set are classified correctly by the model.
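As a quick check on the scale of the search, the parameter values listed in the abstract can be enumerated. The sketch below assumes a full factorial combination of those values, which the text does not state explicitly; the dictionary keys reuse the abbreviations of this section.

```python
from itertools import product

# Parameter values as listed in the abstract.
grid = {
    "size": [49, 74],
    "cl": [100, 150, 200, 250, 300, 500],
    "ltax": [0.000, 0.002, 0.005, 0.010],
    "TGA": [50, 100, 200, 500],
    "pc": [0.5, 1.0],
    "pm": [0.00, 0.01, 0.02],
}

# Assumption: every combination of values defines one candidate model.
models = [dict(zip(grid, values)) for values in product(*grid.values())]
n_models = len(models)

# Whole epochs completed in 10 000 iterations with 24 training subjects.
epochs = 10_000 // 24

print(n_models, epochs)
```

Under that assumption the grid yields 2 x 6 x 4 x 4 x 2 x 3 = 1152 combinations, which is consistent with the statement that over 1000 models were investigated.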

A. Selected GBML Models
The 28 GBML models, shown in Table IV, were selected from all those investigated, by applying the following criterion: (MUAP sample size equal to 20) AND (diagnostic yield for TR at least 95%) AND (diagnostic yield for EV at least

B. Effect of MUAP Sample Size
Quantitative MUAP analysis requires 20 MUAP's to be collected per muscle so as to allow the physician to draw conclusions regarding any underlying pathology. In this section, an attempt was made to investigate the diagnostic performance of GBML models supplied with the 14-element feature vector derived from a sample of only 10 MUAP's rather than 20. For this purpose the mean and standard deviation of the seven parameters of the first 10 MUAP's for each subject were computed. The same exercise was repeated by selecting only the first five MUAP's. As shown in Table IV

C. Effect of the Environment Message Length
The environment message length is derived by deciding the range of each parameter and the decoding procedure. The longer the environment message, the better the resolution, and thus one would expect to see significant improvement in the performance of a model with greater size. In this study, as shown in Table IV, models with length 74 clearly performed better than those with length 49; the improvement, however, is not as large as one would expect. This result suggests that the size of 49 bits was sufficient for accommodating the complexity of the feature vector.

D. Effect of the Number of Classifiers
Table V illustrates the effect that cl has on the diagnostic yield for size = 74, ltax = 0.002, pc = 0.5, pm = 0.01, and MUAP sample size = 20. This table shows that both the training and evaluation performance improved as the number of classifiers increased until it reached the value of 500. TGA has little effect on the diagnostic performance of the models; however, for TGA = 500, and cl = 300 and 500, a 20-30% increase in EV was obtained. Similar findings were also obtained when the following GBML parameters were changed to ltax = 0.000, pc = 1.0, and pm = 0.02.

E. Effect of Lifetax
The effect of lifetax is shown by the performance of the models in Table VI. By increasing the ltax the TR is reduced, whereas the EV remains at the same levels. This finding suggests that during the training phase several classifiers are lost because of the lifetax penalty, thus reducing the diagnostic yield during this phase. There is no effect on the diagnostic yield of the evaluation data as no lifetax is deducted during the evaluation phase. Besides, it is shown that ltax has no significant effect on the quality of the resulting model. The ltax is important, however, since it makes the model more robust for sustaining temporal changes. These findings were also obtained by studying models with classifier

F. Effect of the GA
In this exercise the parameters of the GA, TGA, pc, and pm, are investigated.
Period of GA Introduced: Table VII shows that the models which called the genetic algorithm less frequently (every 500 epochs rather than every 50 epochs) produced better results. This finding suggests that the GA should not be called upon very frequently because it causes drastic changes to the classifiers, at a higher rate than the system can accommodate. The introduction of the GA is needed for generating new classifiers that will hopefully make a good match with some inputs. If the rate of introduction is high, there is a high probability that many good classifiers are replaced by offspring that are not as good as their parents.
Crossover: Table VIII shows that the models with pc = 1.0 have an overall better performance with respect to the diagnostic yield than the models with pc = 0.5. The high value of the crossover probability was suggested also by Goldberg [9]. The above holds as long as the period of introducing GA, TGA, is kept constant.
Mutation, pm: The probability of mutation was kept at low rates compared to pc, following [9]. The effect of this parameter is illustrated in Tables VII and VIII. Table VII shows that the best model is that with TGA = 500 and pm = 0.02. Table VIII shows that for pc = 1.0 EV increases as pm increases.

G. Learning Performance
Learning performance curves of selected models are shown in Figs. 3 and 4. Each graph shows the percentage diagnostic yield of the TR against the number of iterations. The percentage yield calculated at each iteration is the cumulative value of correctly identified subjects. Fig. 3 shows a set of learning curves that correspond to models having different values of pc and pm. The value of TR reaches 78% within 500 iterations. Learning then continues at a slower rate, with TR reaching values within the range of 88 to 95%. It was also observed that faster learning is exhibited for all of the presented models when the lifetax is increased to 0.002. TR reached values of the order of 84% within 500 iterations. This faster rate of learning for GBML models with ltax = 0.002 continues until the number of iterations reaches 1500. Thereafter, TR levels off to the same values as those shown in Fig. 3. Fig. 4 shows the learning performance of GBML models with size = 74, cl = 500, TGA = 200, and ltax = 0.000. Faster learning was obtained as compared with Fig. 3: in Fig. 4, at 500 iterations, TR was 87%. Further training resulted in higher diagnostic performance compared to Fig. 3. When the value of ltax was increased to 0.002, a slightly better diagnostic performance was obtained as compared with that of the models in Fig. 3. As shown in Figs. 3 and 4, varying pc and pm causes no significant change in TR that can suggest any correlation with the learning behavior of the models under study.
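One way to read the cumulative percentage yield plotted in the learning curves is as a running fraction of correct classifications. The sketch below follows that reading, which is an interpretation adopted here rather than a definition given in the text; the function name `cumulative_yield` and the example outcomes are hypothetical.

```python
def cumulative_yield(outcomes):
    """Return the running percentage of correct classifications.

    outcomes: sequence of booleans, one per iteration (True = correct).
    """
    curve = []
    correct = 0
    for iteration, ok in enumerate(outcomes, start=1):
        correct += ok
        curve.append(100.0 * correct / iteration)
    return curve


# Made-up outcomes for four iterations (not data from the study).
curve = cumulative_yield([True, False, True, True])
print([round(value, 1) for value in curve])
```

Because early errors stay in the running total, a curve computed this way rises more slowly than the instantaneous accuracy of the model.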

H. Strength Distribution
The strength distribution at the end of training is shown for models with size = 49, cl = 200, TGA = 100, pc = 1.0, pm = 0.02, and ltax = 0.000 and 0.002 in Figs. 5 and 6, respectively. Fig. 5 shows that 38.5% of the classifiers had a value of 10, that is, they represent the rules that have to be discarded. Fifty-eight percent of the classifiers take strength values between six and nine, with only 3.5% having strength less than or equal to five. The classifiers that match the MYO group are almost twice as many as those matching the MND and NOR groups, that is, 30.5%, 13.5%, and 14%, respectively. In Fig. 6, 42% of the classifiers resulted in zero strength. In this model, for strength values between one and four, 25.5%, 16%, and 9% of the classifiers were distributed to the NOR, MND, and MYO groups, respectively.
Summarizing the results of this section, it can be concluded that: 1) approximately 60% and 20%, or 140 and 100 classifiers, survived the learning process for the investigated models with cl = 200 and 500, respectively, and 2) with ltax = 0.000 productive classifiers have strength with ascending distribution between six and nine, whereas with ltax = 0.002 productive classifiers have strength with descending distribution in the range of one to three. All classifiers, irrespective of their matching success, have to pay lifetax. This penalty eventually results in the "extinction" of nonmatching classifiers.

I. A Hybrid EMG Diagnostic System
A hybrid EMG diagnostic system was built incorporating selected models trained with the backpropagation algorithm [14], the self-organizing feature maps algorithm [15], and the GBML classifier system. This hybrid system is based on the following modules:
INPUT: Composed of the 14-element feature vector.
CLASSIFIER SYSTEM: Composed of both supervised and unsupervised modules as follows. Supervised trained EMG module: three models trained with the GBML paradigm and three models trained with the backpropagation algorithm (Table IX). Unsupervised trained EMG module: three models trained with the self-organizing feature maps algorithm.
OUTPUT: The output from each of the nine models is fed into the diagnostic assessment module. The performance of the system is expressed as a string, indicating the number of models that classified a certain subject under investigation as NOR, and/or MND, and/or MYO.
Neural network backpropagation models and self-organizing feature maps gave similar diagnostic performance of the order of 80%. Two of the backpropagation models shown in Table X achieved EV = 90%. Several backpropagation models with different architectures, gain, and momentum factors were investigated [7]. Models with small architectures, shown in Table X, require more epochs during training, and thus are more demanding in computation power (in one epoch, the data of the 24 subjects in the training set are input to the algorithm). For models with bigger architectures, however, the number of epochs and training time are also reduced. For neural network models trained with Kohonen's self-organizing feature maps algorithm, computation time per epoch is the smallest compared to the backpropagation and the GBML algorithms. Total training time for the model with grid size 10 x 10 (Table XI) is similar to GBML training of model 3 (Table IX). The models given in Table IX were trained for 100 epochs (2400 iterations).
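The diagnostic assessment module can be pictured as a vote count over the nine model outputs, combined with the minimum decision level of five of nine models that is applied to the evaluation set in this section. The sketch below is a hypothetical illustration: the function name `assess` and its return format are assumptions, not the authors' implementation.

```python
from collections import Counter


def assess(votes, min_agreement=5):
    """Count model votes and return (counts, decision).

    decision is the majority label when at least min_agreement models agree,
    otherwise None (no diagnosis reaches the minimum decision level).
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    decision = label if n >= min_agreement else None
    return counts, decision


# Vote patterns of the kind reported for the evaluation set.
five_to_four = ["NOR"] * 4 + ["MND"] * 5
three_way_tie = ["NOR"] * 3 + ["MND"] * 3 + ["MYO"] * 3

print(assess(five_to_four)[1])
print(assess(three_way_tie)[1])
```

With nine models, a threshold of five guarantees that at most one class can reach the decision level, so the module never has to break a tie between two accepted diagnoses.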
The diagnostic performance of the hybrid EMG system was tested using the data of the evaluation set, as shown in Table XII. Subjects NOR9, NOR12, and MYO11 were correctly classified by all the models (9/9). Subjects NOR10 and NOR11 were classified as NOR by eight of the nine models, with one model classifying them as MYO. Similarly, subject MND9 was identified as MND by eight, and as NOR by one, of the nine models. These findings are also expressed in percentage format; see Table XII. A minimum decision level was set at 55%, i.e., when five or more of the nine models were giving the same diagnosis. With this criterion the cases in the evaluation set were assessed and the results are shown in the last column of Table XII. For example, MND10 was classified as NOR by four models, and as MND by five models (CC = YES); subject NOR10 was classified as NOR by eight models, and as MYO by one model (CC = YES); subject MYO9 was classified as NOR, MND, and MYO by three models in each class (CC = NO).

V. DISCUSSION
GBML models supplied with EMG data can be used for the successful assessment of normals, and of patients suffering from MND or myopathy. Important advantages of GBML models compared to other models, justifying their further application in clinical EMG, include: 1) simplicity in implementing GBML classifier systems; 2) ease of training and instantaneous evaluation; 3) ease in extracting useful rules by studying the classifiers; 4) parallel activation of rules; and 5) support for reasoning with uncertainty, in the sense that the strength of each classifier reflects the previous utility of its inference in the past performance of the system.
Quantitative MUAP analysis was developed in the 1950's by Buchthal, who routinely studied 20 MUAP's per muscle [12]. In this study, selected GBML models derived with MUAP sample size 20 had a high diagnostic yield, with TR ≥ 95% and EV ≥ 70%. The material under study and the diagnostic performance of the selected GBML models were examined by two expert neurophysiologists. They stated that "the performance of GBML models was comparable to that of an experienced neurophysiologist." The performance of the GBML models shown in this study is comparable to that of the backpropagation neural network and the self-organizing feature maps algorithms for classifying the same EMG data set. The K-means cluster analysis algorithm was also applied to the same set of data [7]. Performance of this system was very poor, however, with TR = 58% and EV = 50%.
Findings of this study can be compared with the two different automated EMG diagnostic systems developed through the European Community ESPRIT project P599. MUNIN (Muscle and Nerve Inference Network), which employs a causal probabilistic network for the interpretation of electromyographic findings, was developed by Andreassen and colleagues [5]. The design of the MUNIN system assumed a priori knowledge of diseased pathophysiological models and of the associated conditional probabilities. The decision process is explained to the clinician by allowing him to inspect the nodes representing diseases, pathophysiology, and test results. In an evaluation study, a panel of EMG'ers found this type of explanation to be intuitively appealing. In the same study the system was evaluated using a peer-review methodology. In 11 cases no major discrepancies were found between the consensus opinion of the clinicians and the opinion of the system. KANDID (knowledge-based assistant for neuromuscular disorders diagnosis), a rule-based EMG expert system prototype, was developed by Fuglsang-Frederiksen and his team [4]. KANDID was tested by nine clinical neurophysiologists at seven different EMG labs in Europe [16]. Neurophysiologists agreed with the KANDID diagnosis in 53% of the 143 cases investigated. The variation from examiner to examiner was 33 to 77% [16].
Models trained with MUAP sample sizes of 10 and five gave poorer diagnostic performance. Although a few of these models resulted in good performance, with TR ≥ 95% and EV ≥ 70%, it is felt that more EMG data are required for evaluating their diagnostic behavior. These findings agree with data published in a recent paper on the effect of MUAP sample size on the diagnostic yield [17]: the presence of myopathy was supported in two of the 10 patients by the analysis of five MUAP's, and in nine of the 10 patients by the analysis of 20 MUAP's. As has already been mentioned, the results discussed apply to MUAP sample size = 20.
Diagnostic performance for the selected GBML models was slightly better for the 74-bit data string. For this group, only models with 300 and 500 classifiers satisfied the TR ≥ 95% and EV ≥ 70% criterion. Most of the models trained with the 49-bit string that satisfied the above criterion had cl = 200 and 500. Selected models with size = 49 and 74 and cl = 200 and 500, respectively, were trained with lifetax = 0.000 and 0.002. The GA was called periodically by the classifier store to introduce "better" rules into the system. Findings of this study suggest that the GA should not be called upon very frequently; otherwise the system cannot tolerate the changes in the classifiers' string structure and strength. For the crossover operation, given that T_GA is kept constant, models trained with p_c = 1.0 gave better diagnostic yield than models trained with p_c = 0.5. The mutation operation was kept at low rates compared to p_c.
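The genetic operators discussed above can be illustrated with a minimal sketch. It is not the implementation used in this study: one-point crossover, symbol-replacement mutation, the ternary alphabet {0, 1, #}, and the helper names `crossover`, `mutate`, and `ga_due` are all assumptions chosen for illustration. Only the parameter values (crossover probability 0.5 or 1.0, mutation probability up to 0.02, GA period of 50 to 500 iterations, string length 49 or 74) come from the text.

```python
import random

ALPHABET = "01#"  # assumed ternary classifier alphabet; '#' acts as a wildcard


def crossover(parent_a, parent_b, p_c, rng):
    """One-point crossover, applied with probability p_c (assumed operator)."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    if rng.random() >= p_c or len(parent_a) < 2:
        return parent_a, parent_b  # no crossover: children copy the parents
    cut = rng.randrange(1, len(parent_a))  # cut point strictly inside the string
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(string, p_m, rng):
    """Replace each symbol by a different one with probability p_m."""
    out = []
    for symbol in string:
        if rng.random() < p_m:
            out.append(rng.choice([s for s in ALPHABET if s != symbol]))
        else:
            out.append(symbol)
    return "".join(out)


def ga_due(iteration, t_ga):
    """True when the GA should be invoked, i.e., once every t_ga iterations."""
    return iteration > 0 and iteration % t_ga == 0


# Hypothetical usage with a 49-symbol string, crossover probability 1.0,
# and mutation probability 0.01.
rng = random.Random(0)
child_a, child_b = crossover("0" * 49, "1" * 49, p_c=1.0, rng=rng)
child_a = mutate(child_a, p_m=0.01, rng=rng)
```

With a crossover probability of 1.0 every selected pair is recombined, whereas with 0.5 about half of the pairs are copied unchanged; a longer GA period simply makes `ga_due` fire less often, which leaves the classifier strengths more time to settle between structural changes.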
Learning performance for the selected GBML models investigated was better for the 74-bit data string. In addition, when the value of lifetax was 0.002 the learning process was accelerated. Furthermore, for a given GA period and a given number of classifiers, variations in the crossover and mutation probabilities had a limited effect on the learning performance. Strength distribution histograms portray the status of the classifiers at the end of training. From the limited number of GBML models investigated, it is observed that more classifiers survived in the 49-bit models with 200 classifiers than in the 74-bit models with 500 classifiers.
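To give a feeling for the scale of the lifetax values, the following sketch assumes that lifetax is a proportional deduction applied to every classifier's strength at each iteration, as is common in classifier systems; the exact formula used in this study is not given here, and the helper names `apply_lifetax` and `strength_after` are hypothetical. Rewards and bid payments, which replenish or further reduce strength in a real system, are deliberately left out.

```python
def apply_lifetax(strengths, lifetax):
    """One iteration of the assumed life tax: each strength loses a fixed fraction."""
    return [s * (1.0 - lifetax) for s in strengths]


def strength_after(initial, lifetax, iterations):
    """Strength of a classifier that is never rewarded, after a number of iterations."""
    return initial * (1.0 - lifetax) ** iterations


# Hypothetical illustration: an unrewarded classifier with initial strength 10
# over the 2400 iterations (100 epochs of 24 subjects) used for training.
for lifetax in (0.000, 0.002, 0.005, 0.010):
    print(lifetax, strength_after(10.0, lifetax, 2400))
```

Under this assumption, lifetax = 0.000 leaves idle classifiers untouched, while even lifetax = 0.002 removes most of the strength of a classifier that never earns a reward during training, so that it becomes a likely candidate for replacement when the GA is called.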
GBML models, backpropagation neural networks, and the self-organizing feature maps achieved similar diagnostic performance, both for the training and evaluation sets [7], [8].
Total computation time required for training, however, was lower for GBML learning than for the backpropagation and the self-organizing feature maps paradigms. Analysis of the computation time for one epoch shows that the self-organizing algorithm is the most efficient, whereas GBML training requires fewer epochs. For backpropagation learning, although small architectures require less computation time per epoch, the number of epochs required for training is considerably greater, with the opposite being true for large architectures. Furthermore, these findings agree with a recent study in which GBML and backpropagation models produced similar diagnostic performance in three medical domains [18].
In practice, it is suggested that a hybrid diagnostic system could be built incorporating both GBML and neural network models. With this scheme, the decision is expressed as a string giving the number of models that classified the subject under investigation in each of the diseases recognized by the system. The ability to combine diagnostic yields from different models, expressed in an overall score, enhances the usefulness of computer-aided diagnosis in the clinical context. The system could also give the physician the option of deciding whether a neural-network or a GBML opinion is required. Finally, the hybrid EMG system could be combined with the following computer-aided modules: the patient's clinical assessment, muscle biopsy study, biochemical findings, and genetic and molecular genetic findings. This would provide an integrated approach for the diagnosis of patients suffering with neuromuscular disorders [19], [20].
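The decision scheme of the hybrid system can be sketched as a simple vote count. The sketch below assumes that each of the nine models returns one of the labels NOR, MND, or MYO, and applies the minimum decision level of five agreeing models reported for the evaluation set; the function name `assess` and the exact format of the summary string are hypothetical and are not taken from the system described here.

```python
from collections import Counter

CLASSES = ("NOR", "MND", "MYO")


def assess(votes, min_votes=5):
    """Summarize model votes and return (summary string, decision or None).

    A decision is returned only when at least min_votes models agree.
    """
    counts = Counter(votes)
    unknown = set(counts) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown class labels: {sorted(unknown)}")
    total = len(votes)
    summary = " ".join(f"{c}:{counts[c]}/{total}" for c in CLASSES)
    label, top = max(((c, counts[c]) for c in CLASSES), key=lambda item: item[1])
    decision = label if top >= min_votes else None
    return summary, decision


# Vote patterns reported for three subjects of the evaluation set.
print(assess(["NOR"] * 4 + ["MND"] * 5))                # subject MND10
print(assess(["NOR"] * 8 + ["MYO"]))                    # subject NOR10
print(assess(["NOR"] * 3 + ["MND"] * 3 + ["MYO"] * 3))  # subject MYO9
```

Returning no decision when fewer than five models agree, as for subject MYO9, keeps the uncertain cases visible to the physician instead of forcing a label.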

VI. CONCLUSION
The usefulness of GBML has been assessed when applied to the classification of EMG findings. It provides an additional element in the hybrid EMG diagnostic system. The various GBML parameters were studied and their significance in building learning models was determined. The findings verified some theoretical aspects of GBML and helped in developing empirical rules for building successful EMG models. Work in progress includes the use of advanced operators and techniques in genetic search, and the development of a multidisciplinary approach that uses both clinical and laboratory findings for the diagnosis of patients suffering with neuromuscular disorders.