Automatic Clustering of Gene Ontology by Genetic Algorithm

Abstract: Gene Ontology is now widely used for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the growth of the Gene Ontology has made it difficult to maintain and process. One way to keep it accessible is to cluster it into smaller groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, the number k of clusters to use is not obvious, and deciding it is a hard algorithmic problem. Therefore, an approach to the automatic clustering of the Gene Ontology is proposed that incorporates a cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of the modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.


I. INTRODUCTION
The Gene Ontology (GO) [1] is an effort by the Gene Ontology Consortium (www.geneontology.org) to define consistent terminology that describes the attributes of biological process, cellular component, and molecular function of a gene product. The intention of the GO is to share a common understanding of the meaning of any term used, and thereby support database query tools in finding functionally equivalent terms in cross-database searches. In essence, this will improve retrieval consistency across resources and the recall and precision of query results within resources.
In conjunction with rapid progress in the bioinformatics field, an increasing number of terms are being generated in the GO (see Fig. 1). This is due to the attempt to standardize as many terms as possible in different repositories for plant, animal, and microbial genomes, such as The Arabidopsis Information Resource (TAIR), a database for the brassica family plant Arabidopsis thaliana; the Rat Genome Database (RGD), a database for the rat Rattus norvegicus; and the GeneDB protozoa databases for Plasmodium falciparum, Leishmania major, Trypanosoma brucei, and several other protozoan parasites. At this time, the GO contains 20,069 terms and 29,102 relationships between the terms (as of November 5, 2005). These terms are associated with 1.65 million gene products, 0.23 million amino acid sequences, and 0.25 million species. The high dimensionality of the GO instances and the monolithic character of the GO have made its maintenance and processing more difficult and challenging.
Therefore, in this study, a hybrid approach consisting of a genetic algorithm and a split-and-merge algorithm is applied to automatically cluster the GO terms into smaller and highly intra-related clusters. The hybrid genetic algorithm uses a software engineering measurement, the cohesion-and-coupling metric, to quantify the quality of clustering (QOC); see (6)-(9). The idea of using this metric is to produce good clusters by maximizing the degree of interaction between terms in a cluster (high cohesion) while minimizing the degree of interaction between terms in different clusters (low coupling). The genetic algorithm is chosen for its efficient navigation of large search spaces and its good performance as a stochastic search procedure. It is used to generate potential clusters by applying standard crossover and mutation operators, with the cohesion-and-coupling metric built into the fitness function. Then, the split-and-merge algorithm is used to efficiently estimate the number k of clusters. The split-and-merge algorithm learns k from the cohesion-and-coupling metric by improving any infeasible clusters. Furthermore, parallelization of the genetic algorithm based on the coarse-grained (island) model [2] is considered to reduce execution time.
Recently, there has been an increasing awareness of the benefits of the GO RDF/XML for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the GO for gene clustering. But the size and massive nature of the GO RDF/XML cause problems that affect maintaining, publishing, validating, and processing the GO instances, because the ontology as a whole is too large to handle. Therefore, the purpose of this study is to partition the GO RDF/XML into a set of more accessible and understandable modules. Modularizing this single monolithic file into smaller files will enable amino acid sequences and IEA (Inferred from Electronic Annotation) evidence associations to be included in the GO RDF/XML. These additions would make the GO RDF/XML file more complete and coherent. Thus, the GO RDF/XML will be more readily processed and exchanged by software agents and other tools that consume machine-readable metadata.
This paper is arranged as follows. The second section describes the problem of clustering the GO terms. The third section discusses related work in the clustering area. The fourth section explains the flow of the proposed genetic algorithm. The fifth section details the split-and-merge algorithm for discovering an appropriate k. The sixth section describes the parallelization of the hybrid genetic algorithm. The seventh section presents the experimental results of clustering the GO terms and the modularized semantic web of the GO RDF/XML format. Some discussion and the conclusion of the paper are included in the final section.

II. STATEMENT OF THE PROBLEM
Automatic clustering is the process of dividing a set of objects into an unknown number of groups, where the best number k of groups is determined by the clustering algorithm. That is, objects within each group should be more similar to each other than to objects in any other group. Finding k automatically is a hard algorithmic problem. The automatic clustering problem can be defined as follows: Let X = {X_1, X_2, …, X_n} be a set of n objects. These objects are clustered into non-overlapping groups C = {C_1, C_2, …, C_k}, where each C_i is non-empty, C_i ∩ C_j = ∅ for i ≠ j, and the union of all C_i is X.
In the GO context, the GO terms are structured as a Directed Acyclic Graph (DAG). Let the GO graph be G = {V, E}, where V is a set of nodes that represent the GO terms and E is a set of directed edges that represent relationships between the GO terms. Clustering the GO graph can be considered as a Graph Partitioning Problem (GPP). The aim of the GPP is to cut a vertex set V into k disjoint and non-empty subsets such that the number of edges connecting nodes in different subsets is minimized and the number of edges connecting nodes in the same subset is maximized. The GPP is a fundamental combinatorial optimization problem with numerous practical applications, including design of VLSI circuits [3], mesh partitioning in parallel processing [4], image segmentation in computer vision [5], and gene expression analysis in bioinformatics [6].
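The GPP objective stated above can be made concrete with a small sketch. It is only an illustration, assuming a hypothetical six-node toy DAG (not GO data) and hypothetical helper names `is_valid_partition` and `count_edges`: the first checks that the subsets are non-empty, disjoint, and cover V, and the second counts the edges inside subsets and the edges cut between subsets.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (source node, target node)


def is_valid_partition(nodes: Set[int], clusters: List[Set[int]]) -> bool:
    """True if the clusters are non-empty, pairwise disjoint, and cover all nodes."""
    if not clusters or any(len(c) == 0 for c in clusters):
        return False
    union = set().union(*clusters)
    # Equal totals mean no node was counted twice, i.e. the clusters are disjoint.
    return sum(len(c) for c in clusters) == len(union) and union == nodes


def count_edges(edges: Iterable[Edge], clusters: List[Set[int]]) -> Tuple[int, int]:
    """Return (internal, cut): edges inside a cluster and edges between clusters."""
    owner = {node: i for i, cluster in enumerate(clusters) for node in cluster}
    internal = cut = 0
    for u, v in edges:
        if owner[u] == owner[v]:
            internal += 1
        else:
            cut += 1
    return internal, cut


# Hypothetical toy DAG, used only for illustration.
NODES = {1, 2, 3, 4, 5, 6}
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
CLUSTERS = [{1, 2, 3}, {4, 5, 6}]

if __name__ == "__main__":
    print(is_valid_partition(NODES, CLUSTERS), count_edges(EDGES, CLUSTERS))
```

A good partition in the GPP sense keeps the second count (cut edges) small and the first (internal edges) large.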
To partition the GO graph, the following questions need to be answered:
1) What is the most suitable clustering algorithm for finding the optimal solution of the GPP within a reasonable execution time, given that the problem is NP-complete?
2) What is the precise criterion for discovering the number k of clusters and for measuring the goodness of the clusters?
In this paper, the first question is answered by integrating a split-and-merge algorithm, which consists of two steps, into the parallel genetic algorithm. First, the entire node set is decomposed into a number of clusters using the split algorithm. These clusters are then automatically combined using the merge algorithm over several iterations until a suitable number k of clusters is obtained. The second question is answered by the cohesion-and-coupling metric.

III. RELATED WORK
The clustering problem is omnipresent in many fields of science and engineering. It has been solved by various techniques such as k-means [7], genetic algorithm [8], self-organizing map [9], fuzzy c-means [10], and particle swarm optimization [11]. Surveys of clustering techniques can be found in [12]-[14]. Recently, the increasing amount of data has made the number k of clusters difficult to guess, and the value supplied by the user based on prior knowledge, presumptions, and practical experience is often inaccurate. Therefore, reasonable ways of identifying the number k of clusters automatically are required to avoid trial-and-error work. Lately, several techniques have been proposed to determine the number k of clusters. Most of the techniques are wrapped around k-means or a genetic algorithm. Split and/or merge rules are the best-known wrapper methods for increasing or decreasing the number k of clusters while the algorithm runs. Among these techniques are:
1) X-means [15]; in this method the splitting decision is performed […]
2) G-means [16]; it starts with a small number of k-means centers and raises the number of centers using a Gaussian distribution.
3) CLUSTERING [17]; it is an automatic clustering method based on a heuristic strategy that uses the nearest neighbor to group data that are situated close to one another. Then, a genetic algorithm is used to group the smaller clusters into larger ones.
4) Genetic Clustering Algorithm (GCA) [18]; it is composed of two steps. First, the data set is divided into a number of clusters using the Cluster Decomposition Algorithm (DCA); second, the Hierarchical Cluster Merging Algorithm (HCMA) is used to combine the clusters automatically.
5) S+G [19]; it is also a two-stage method, which first uses a self-organizing feature map to determine the number k of clusters and then employs genetic-algorithm-based clustering to find the final solution.
In the case of the GPP, extensive studies of the Kernighan-Lin algorithm, simulated annealing, tabu search, watermarking, and normalized cut have been carried out in [20]-[23] and [5], respectively. Reviews of the GPP techniques can be found in [24], [25]. Several studies using genetic algorithms for the GPP have also been done:
1) Bui and Moon [26] introduced a preprocessing phase before the initialization of the population to improve the quality of the chromosomes. Different classes of graphs (random graph, random geometric graph, random regular graph, and caterpillar graph) consisting of 134 to 5,252 nodes were tested with the algorithm.
2) Kaveh and Bondarabady [27] implemented a genetic algorithm for finite element decomposition of 1,640 to 6,720 elements. Sequences of coarsening and uncoarsening are performed to transform the large-scale graph G_0 into a smaller graph G_n and vice versa, so that a graph of suitable size can be partitioned by the genetic algorithm.
3) Kohmoto et al. [28] incorporated simulated annealing into a genetic algorithm to generate feasible solutions. The algorithm was then applied to undirected graphs with 124 to 250 nodes.
Very little work has been done on ontology clustering or semantic web modularization. Stuckenschmidt and Klein [29] have proposed a method for automatic partitioning of large ontologies based on the structure of the class hierarchy. The method consists of three steps:
1) In the first step, a dependency graph is created from the ontology source file using a PROLOG-based tool that reads OWL and RDF schema files. The dependency graph is then displayed using the network analysis tool Pajek.
2) In the second step, the strength of the dependencies between the concepts in the dependency graph is determined by computing the propositional strength network.
3) In the third step, an island algorithm is used to determine the modules existing in the dependency graph.

IV. PROPOSED HYBRID GENETIC ALGORITHM
The hybrid genetic algorithm is initialized with the minimum number of clusters k_min, which needs to be provided by the user, and a DAG with i nodes and j directed edges, where i, j, k_min ∈ {1, 2, …, n}. The algorithm starts by initializing a few parameters that can be modified by the user: the number of generations t_max, the population size ps, the crossover probability p_c, and the mutation probability p_m. The subsequent steps of the algorithm are as follows:
1) Set iteration t = 0. Encode the DAG G = {V, E} using the cluster-number scheme (see the discussion on the chromosome representation) and randomly generate the initial chromosomes $x_1^0, \ldots, x_{ps}^0$ of population P(0), where the gene values lie in [1…k]. Then, evaluate the fitness of each chromosome x^0 ∈ P(0) using the fitness function f(x^0) based on the cohesion-and-coupling metric (see the discussion on the fitness function).
2) If t > t_max, then terminate the process, decode the best chromosome x_max ∈ P, and display the clustering C. Otherwise, go to step 3.
3) Increment t = t + 1. Create a new population by selecting good chromosomes from the old population (iteration t − 1).
4) Perform crossover between two chromosomes $x_a^t, x_b^t$ ∈ P(t) with probability p_c and then mutate each gene in a single chromosome x^t ∈ P(t) with probability p_m.
5) Perform the split function S(x^t) to increase k and then decrease k using the merge function M(x^t) for each chromosome x^t ∈ P(t), such that the cohesion score α is maximized and the coupling score β is minimized (see the discussion on the split-and-merge algorithm).
6) Evaluate the fitness of each chromosome x^t ∈ P(t) using the fitness function f(x^t) and go to step 2.
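The six steps above can be sketched as a short generational loop. This is a minimal illustration rather than the authors' implementation, and several parts are assumptions: binary tournament selection (the text only says that good chromosomes are selected), one-point crossover, a placeholder `toy_fitness` that stands in for the cohesion-and-coupling fitness, and a `repair` hook that stands in for the split and merge functions of step 5. Nodes are numbered from 0 here for convenience.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]
Edge = Tuple[int, int]  # (source node, target node), nodes numbered from 0


def toy_fitness(x: Sequence[int], edges: Sequence[Edge]) -> float:
    """Placeholder score in [-1, 1]: (internal edges - cut edges) / all edges.

    This is NOT the paper's f(x); it only stands in for it.
    """
    internal = sum(1 for u, v in edges if x[u] == x[v])
    return (2 * internal - len(edges)) / len(edges)


def evolve(edges, n, k, fitness=toy_fitness, repair=lambda x: x,
           t_max=40, ps=12, p_c=0.8, p_m=0.05, seed=0):
    """Run the generational loop; return (best chromosome, best-so-far history)."""
    rng = random.Random(seed)

    def score(x: Chromosome) -> float:
        return fitness(x, edges)

    # Step 1: random initial population with genes in 1..k.
    pop = [[rng.randint(1, k) for _ in range(n)] for _ in range(ps)]
    best = max(pop, key=score)
    history = [score(best)]

    def select() -> Chromosome:
        # Assumed selection scheme: binary tournament.
        a, b = rng.sample(pop, 2)
        return list(a if score(a) >= score(b) else b)

    for _ in range(t_max):  # Steps 2-3: stop after t_max generations.
        offspring = []
        while len(offspring) < ps:
            a, b = select(), select()
            if rng.random() < p_c:  # Step 4: one-point crossover (assumed).
                cut = rng.randint(1, n - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):  # Step 4: per-gene mutation.
                child = [rng.randint(1, k) if rng.random() < p_m else g
                         for g in child]
                offspring.append(repair(child))  # Step 5: split/merge hook.
        pop = offspring[:ps]
        champion = max(pop, key=score)  # Step 6: evaluate the new population.
        if score(champion) > score(best):
            best = champion
        history.append(score(best))
    return best, history
```

Keeping the best-so-far chromosome outside the population is a design choice of this sketch; it makes the reported history non-decreasing without changing the population dynamics.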

A. Chromosome Representation
A good chromosome representation is crucial to the convergence speed of the hybrid genetic algorithm and to the quality of the solution obtained. Therefore, the cluster-number scheme is used to ensure that the gene values can be simply assigned and interpreted even for large graphs. In addition, it makes it easier to relate each chromosome to a solution of the GPP. The cluster-number scheme represents a clustering of n objects as an array of n integers, where the value at the ith subscript denotes the number of the cluster that holds the ith object.
To partition the DAG, the graph is represented by a single chromosome using a 1D array of integers as follows:
1) Genes are integer values that represent the cluster number that each particular node belongs to.
2) Loci are mapped to the node number.
Edges between nodes are input to the algorithm as an n × 2 matrix, with n rows corresponding to the number of edges and 2 columns holding a pair of nodes. Fig. 3 shows a chromosome representation of the graph G_1 (see Fig. 2) with 12 nodes and 3 clusters.
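The cluster-number scheme can be illustrated with a small encode/decode pair. This is a sketch under assumptions: the function names are hypothetical, nodes are numbered from 1, and the 12-gene chromosome below is an invented example with 3 clusters, not the chromosome of Fig. 3.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def decode(chromosome: Sequence[int]) -> Dict[int, List[int]]:
    """Map each cluster number to the 1-based node numbers it holds."""
    clusters = defaultdict(list)
    for node, cluster_number in enumerate(chromosome, start=1):
        clusters[cluster_number].append(node)
    return dict(clusters)


def encode(clusters: Dict[int, List[int]], n: int) -> List[int]:
    """Build the length-n gene array from a cluster -> nodes mapping."""
    chromosome = [0] * n  # 0 marks a node that no cluster claimed
    for cluster_number, members in clusters.items():
        for node in members:
            chromosome[node - 1] = cluster_number
    return chromosome


# Hypothetical chromosome: 12 nodes in 3 clusters (locus = node, gene = cluster).
EXAMPLE = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

if __name__ == "__main__":
    print(decode(EXAMPLE))
```

Because the locus is the node number, the chromosome length always equals the number of nodes, and decoding is a single pass over the array.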

B. Reproduction
During the reproduction phase, the two classical and most often used genetic operators are employed, i.e., the crossover and the mutation operators. These operators are chosen for their effectiveness with the 1D array of integers representing a chromosome and with the fitness function based on the cohesion-and-coupling metric. The crossover operator creates new offspring by combining features of their parents. The mutation operator then arbitrarily alters one or more genes produced by the crossover process. The reason for using these operators in the hybrid genetic algorithm is to generate a new population with higher total fitness in each generation.
Although such operators are effective, the resulting solutions are not guaranteed to be feasible. To increase the feasibility and optimality of the solutions, the offspring are altered by the split function S(x) and then the merge function M(x) after every reproduction by the genetic operators. The transformation works cluster by cluster, modifying a single chromosome (S(x), M(x) : x → x′), which is then evaluated by the fitness function f(x′). Even though the purpose of these functions is to determine the best number k of clusters, the solutions are indirectly improved and repaired by shifting to a better neighboring solution until no improvement can be made. The split function S(x) and the merge function M(x) are discussed in detail in the split-and-merge algorithm section.
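The two operators can be sketched on the integer-array chromosome as follows. The text only says that standard crossover and mutation are used, so one-point crossover and uniform per-gene resampling are assumptions of this sketch, as are the function names.

```python
import random
from typing import List, Sequence, Tuple


def one_point_crossover(a: Sequence[int], b: Sequence[int],
                        rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the tails of two equal-length parents at a random cut point."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("parents must have the same length of at least 2")
    cut = rng.randint(1, len(a) - 1)  # both children keep genes from each parent
    return list(a[:cut]) + list(b[cut:]), list(b[:cut]) + list(a[cut:])


def mutate(x: Sequence[int], k: int, p_m: float, rng: random.Random) -> List[int]:
    """Resample each gene uniformly from 1..k with probability p_m."""
    return [rng.randint(1, k) if rng.random() < p_m else g for g in x]


if __name__ == "__main__":
    rng = random.Random(1)
    child_a, child_b = one_point_crossover([1] * 6, [2] * 6, rng)
    print(child_a, child_b, mutate(child_a, k=3, p_m=0.2, rng=rng))
```

Neither operator guarantees that every cluster number in 1..k still appears in the child, which is one reason the offspring are passed through S(x) and M(x) afterwards.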

C. Fitness Function
The optimization of the GPP can be stated as optimizing a function f that partitions the graph into k clusters, where k is the best value, i.e., the one that generates highly cohesive clusters. More precisely, the main objective of partitioning the DAG is to find a feasible and near-optimal solution that maximizes cohesion between nodes within a cluster and minimizes coupling between different clusters.
The cohesion α_i of the cluster i of the DAG is calculated by: […] where N_i is the number of nodes in the cluster i and µ_i is the number of its internal edges.
The coupling β_{i,j} between clusters i and j is given by: […] where N_i and N_j are the numbers of nodes in the clusters i and j, respectively, and ε_ij is the number of edges from cluster i to cluster j.
The initial fitness function f_0(x) of the DAG partitioning is a trade-off between the cohesion score α and the coupling score β. This trade-off is computed by subtracting the average coupling from the average cohesion. The initial fitness function f_0(x) is given as follows: […] The values of f_0(x) lie in [-1…1]. A good-quality clustering has a high value of f_0(x). However, to ensure that the algorithm obtains a balanced clustering, the standard deviation of the dependency index, stdev(γ), is also considered; see (5). Therefore, a feasible and near-optimal solution is searched for by maximizing the result of subtracting the standard deviation of the dependency index stdev(γ) from the initial fitness function f_0(x):
$$f(x) = f_0(x) - \mathrm{stdev}(\gamma) \tag{4}$$
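The structure of the fitness function described above can be sketched as follows. The sketch takes the per-cluster cohesion values α_i, the pairwise coupling values β_ij, and the dependency indices γ_i as already computed inputs, because their defining equations are not reproduced here. Using the population standard deviation (`statistics.pstdev`) is an assumption; the text does not say which form of stdev is meant. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def initial_fitness(alphas: Sequence[float], betas: Sequence[float]) -> float:
    """f0(x): average cohesion minus average coupling."""
    avg_coupling = mean(betas) if betas else 0.0  # a single cluster has no pairs
    return mean(alphas) - avg_coupling


def fitness(alphas: Sequence[float], betas: Sequence[float],
            gammas: Sequence[float]) -> float:
    """f(x) = f0(x) - stdev(gamma); pstdev is an assumed choice of stdev."""
    return initial_fitness(alphas, betas) - pstdev(gammas)


if __name__ == "__main__":
    # Hypothetical values for a two-cluster partition.
    print(fitness(alphas=[0.5, 0.7], betas=[0.1], gammas=[0.1, -0.1]))
```

The penalty term is zero when all clusters have the same dependency index, so between two partitions with equal f_0(x) the more balanced one scores higher.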

V. THE SPLIT-AND-MERGE ALGORITHM
By embedding the split-and-merge algorithm into the genetic algorithm, the k value held by the genes of each chromosome is refined and fixed. Through this method, chromosomes with the best number k of clusters and high fitness are reproduced in each generation. Hence, it avoids producing solutions with an unsuitable number k of clusters and accelerates convergence. The detailed steps of these algorithms are shown in Fig. 4, Fig. 5, and Fig. 6. During the repairing process, any illegal chromosome is adjusted and then evaluated by the fitness function f(x). An illegal chromosome represents a partition in which some clusters are empty. For example, given k = 3, the chromosome x = (1 1 3 1 3 3) is illegal because cluster number two is empty.
Definition 1. Legal and Illegal Chromosome. Given a chromosome x = g_1, g_2, …, g_n, let e(x) be the number of non-empty clusters in x divided by k; e(x) is called the legality ratio. The chromosome x is legal if e(x) = 1 and illegal otherwise.
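Definition 1 translates directly into code. The sketch below uses hypothetical function names and checks the example given earlier in this section, x = (1 1 3 1 3 3) with k = 3.

```python
from typing import Sequence


def legality_ratio(x: Sequence[int], k: int) -> float:
    """e(x): number of non-empty clusters among 1..k, divided by k."""
    nonempty = len({g for g in x if 1 <= g <= k})
    return nonempty / k


def is_legal(x: Sequence[int], k: int) -> bool:
    """A chromosome is legal when every cluster 1..k holds at least one node."""
    # Compare counts as integers to avoid relying on float equality.
    return len({g for g in x if 1 <= g <= k}) == k


if __name__ == "__main__":
    x = [1, 1, 3, 1, 3, 3]
    print(legality_ratio(x, 3), is_legal(x, 3))
```

For the example chromosome, two of the three clusters are non-empty, so e(x) = 2/3 and the chromosome is illegal.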
Unfortunately, in some cases the repairing process can cause clusters to be split or merged further because of strong internal dependencies. This phenomenon creates unbalanced subgraphs and defeats the aim of creating a modular ontology. Therefore, the dependency index γ is introduced to stabilize the split-and-merge algorithm and to prevent it from producing micro or giant clusters during the splitting or merging process. The dependency index γ_i of the cluster i is given by: […] The target value for the dependency index γ_i of a cluster i is 0.

Pseudocode fragment of the split function S(x):

    […]
        for q := 1 to x_split.Length() do
            if x_split.Gene(q) > p then
                x_split.Gene(q) := x_split.Gene(q) + 1;
            end-if
        end-for
        for q := 1 to s do
            x_q := x_split;
            for r := 1 to x_q.Length() do
                if x_q.Gene(r) = p then
                    x_q.Gene(r) := Random(p, p + 1);
                end-if
            end-for
            if x_q.QOC(C_p, C_p+1) > x.QOC(C_p)
               and x_q.DependencyIndex(C_p) > I_min
               and x_q.DependencyIndex(C_p+1) > I_min then
                x := x_q;
                p := p + 1;
            end-if
        end-for
    end-for
    end

A. Dividing of Clusters with Split Algorithm
The main objective of the split function S(x) is to decompose each cluster in chromosome x into reasonably fragmented clusters. The detailed split function S(x) is shown in Fig. 5. This function works by creating clone chromosomes $x_1^c, \ldots, x_n^c$ from the chromosome x ∈ P(t). For each cluster C_1…C_p in the clone chromosome x^c, the cluster C_p is divided into two clusters C_p and C_p+1. The chromosome x is replaced by the best clone chromosome x^c that satisfies the following criteria:
1) The QOC of the clusters C_p and C_p+1 in the clone chromosome x^c is higher than the QOC of the cluster C_p in the chromosome x.
2) The dependency index γ of each of the clusters C_p and C_p+1 in the clone chromosome x^c must be greater than the dependency index threshold for small clusters, I_min.
The QOC of the clusters C_p and C_p+1 in the clone chromosome x^c is computed as follows: […] The QOC of the cluster C_p in the chromosome x is calculated with the following equation: […]
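A single split attempt on one cluster can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the QOC and dependency-index computations are passed in as callables (`qoc_single`, `qoc_pair`, `dep_index`, all hypothetical names) because their equations are not reproduced here, and the sketch returns the best accepted clone out of `s` random trials, following the two acceptance criteria listed above.

```python
import random
from typing import Callable, List, Sequence

Chromosome = List[int]


def try_split(x: Sequence[int], p: int, s: int,
              qoc_single: Callable[[Sequence[int], int], float],
              qoc_pair: Callable[[Sequence[int], int, int], float],
              dep_index: Callable[[Sequence[int], int], float],
              i_min: float, rng: random.Random) -> Chromosome:
    """Try to split cluster p into p and p + 1; return x unchanged on failure."""
    # Shift every label above p up by one so that label p + 1 becomes free.
    x_split = [g + 1 if g > p else g for g in x]
    best, best_qoc = list(x), qoc_single(x, p)
    for _ in range(s):
        # Clone: scatter the members of cluster p over p and p + 1 at random.
        x_q = [rng.randint(p, p + 1) if g == p else g for g in x_split]
        quality = qoc_pair(x_q, p, p + 1)
        # Criterion 1: higher QOC; criterion 2: neither part is too small.
        if (quality > best_qoc
                and dep_index(x_q, p) > i_min
                and dep_index(x_q, p + 1) > i_min):
            best, best_qoc = x_q, quality
    return best
```

Applying `try_split` to each cluster in turn, and moving on to the next label after an accepted split, would give the outer loop suggested by the pseudocode fragment; that loop is omitted here because its header is not recoverable from the fragment.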

B. Combining of Clusters with Merge Algorithm
The merge function M(x) is carried out to merge isolated clusters by repairing genes in the chromosome x when necessary. The goal is to guarantee that all the chromosomes repaired by the split function S(x) are genuinely fit to be feasible and near-optimal solutions. As shown in Fig. 6, the merge function M(x) is invoked to combine clusters C_p and C_q in the chromosome x ∈ P(t). If the trial consolidation fulfills the following conditions, then clusters C_p and C_q are permanently merged:
1) The QOC of the merged clusters C_p and C_q is higher than the QOC of the cluster C_p alone.
2) The dependency index γ of the merged clusters C_p and C_q must be less than the dependency index threshold for large clusters, I_max.
The QOC of the cluster C_p in the chromosome x is computed by: […] (8)
The QOC of the merged clusters C_p and C_q in the chromosome x is calculated as follows: […]

VI. THE PARALLELIZATION PROCESS
When the hybrid genetic algorithm is employed to cluster the GO, it becomes computationally intensive. This is because the GO graph has a large number of nodes and many directed edges. In addition, the algorithm demands a multitude of chromosomes and many generations in order to obtain good solutions. The situation worsens because the population in each generation must go through the reproduction process, in which the crossover, mutation, split, and merge functions shown in Fig. 4 are applied.
To resolve this problem, an efficient and affordable parallel hybrid genetic algorithm is developed by exploiting the advantages of the island model. It is implemented on a low-cost PC cluster using message passing interface libraries. The island model divides the single large population into a number of subpopulations so that each subpopulation can evolve its solutions autonomously. This parallelization model is chosen since it permits each subpopulation to be assigned to one processor of the low-cost PC cluster. Therefore, the computational load is shared among processors, which reduces the computation time. Moreover, inter-processor communication is reduced because interaction happens only when chromosomes migrate from one subpopulation to another. Migration moves a number of emigrants from the source subpopulation to replace the worst chromosomes in the target subpopulation. Each emigrant is randomly selected from among the best chromosomes in the source subpopulation. The parallelization of the hybrid genetic algorithm can be explained as follows:
1) Set global iteration t = 0. Encode the DAG G = {V, E} and generate the initial population P(0) of random chromosomes $x_1^0, \ldots, x_{ps}^0$.

VII. COMPUTATIONAL RESULTS
The parallel hybrid genetic algorithm discussed in the previous section has been tested using GO data in MySQL format as released in November 2005 (available online at www.godatabase.org/dev/database/archive/). The algorithm is implemented by extending the GAlib C++ libraries [30]. The basic information of the GO graph is shown in Table 1. There are 20,069 nodes representing the GO terms and 29,102 directed edges corresponding to the relationships between the terms.
The parameters used to run the parallel hybrid genetic algorithm are shown in Table 2. The computer used is a low-cost PC cluster of HP d530 machines with 25 processors. Each processor is assigned one subpopulation consisting of 4 chromosomes. The low-cost PC cluster is implemented using the MPICH libraries [31] developed by Argonne National Laboratory, under Fedora Core 2 running on 2.8 GHz Pentium 4 processors with 512 MB RAM and 100 Mbps NICs.
The evolution of the 25 subpopulations is shown in Fig. 7. The stability of the parallel hybrid genetic algorithm can be seen in Table 3 and Fig. 8, where the results of 5 separate runs are compared by taking the best individual from the 25 subpopulations in each run. Convergence appeared as early as 230 generations. The optimal value of the fitness function is in the interval 130.5 × 10^-6 to 135.4 × 10^-6. The time taken varied from 152.8 × 10^3 s to 231.9 × 10^3 s. The clustering utilization is depicted in Fig. 9, where the range of the dependency index ℜ(γ_l − γ_s) is between 0.005 and 0.008.
To test the consistency of the number of clusters found by the parallel hybrid genetic algorithm, different minimum numbers of clusters k_min are given to the algorithm, as shown in Table 4. The results show that if the minimum number of clusters k_min provided by the user is greater than the best number k of clusters, then the number of clusters found is bound to k_min.
In order to assess the performance of the parallel hybrid genetic algorithm, its behavior is compared with that of the parallel standard genetic algorithm. The results are shown in Table 5, where k = 5 is examined. The integration of the split-and-merge algorithm into the genetic algorithm produced a higher optimal value and resolved the hard algorithmic problem of estimating the number k of clusters. Because of the additional processing required to filter the chromosomes in the population in order to find the best number k of clusters, the results in Table 5 show an increase in CPU time for the parallel hybrid genetic algorithm. The clustering utilization of the two algorithms is compared in Fig. 10. The results show that the standard deviation of the dependency index stdev(γ) plays an important part in creating a balanced clustering.
Fig. 11 shows an example of GO:0006631 that includes GO:0019752 from the cluster C_0 (line 8) and GO:0044255 from the cluster C_1 (line 9). The figure also shows the inclusion of the amino acid sequence (lines 13-27) of IPR006180 from the InterPro database and the IEA evidence association (lines 30-39) with gene BC4V2_0_00031. The example shows that by modularizing the monolithic GO RDF/XML file, the smaller GO RDF/XML files can be easily maintained and made more complete.

VIII. CONCLUSION
The aim of this work is to automatically partition the very large GO RDF/XML file into smaller files in order to reduce difficulties in maintaining, publishing, validating, and processing it. This study has shown that clustering the GO can be modeled as the GPP. The model is then solved by a parallel hybrid of the genetic algorithm and the split-and-merge algorithm. The genetic algorithm is used to find node-to-cluster assignments, and the split-and-merge algorithm is applied to build a feasible clustering and to automatically search for the most suitable number k of clusters. During the clustering process, the algorithm employs the cohesion-and-coupling metric as the criterion to discover the best number k of clusters and to measure the quality of the clusters. The dependency index γ is introduced to prevent the algorithm from producing problematic clusters with either too few or too many nodes. Since clustering the GO involves a large graph and demands high computing resources, the island model is incorporated into the algorithm. The message passing interface libraries are used as the parallel programming interface, and the algorithm is executed on a low-cost PC cluster.
Unlike other graph partitioning algorithms, the proposed algorithm with the split-and-merge strategy can automatically find the appropriate number k of clusters. Moreover, compared to other automatic clustering algorithms, the proposed algorithm is capable of generating balanced subgraphs and does not rely on distance calculations to measure the strength between a cluster centroid and each object. Furthermore, users are allowed to set the minimum number of clusters they wish to maintain and to supply the dependency index thresholds in order to control the size of the clusters. In fact, the algorithm does not require modifications to the components of the genetic algorithm or to the split-and-merge procedures. Thus, its design is generic and domain independent. Consequently, the algorithm can be fitted to different sorts of problems with minimal changes, as long as the problems can be modeled as the GPP.
The experimental results show that the algorithm is effective and stable, and that it requires a reasonable amount of execution time. Moreover, the parallelization can be implemented with minimal hardware specifications. The proposed algorithm is also capable of finding a near-optimal solution among the feasible solutions. Possible directions for further research include incorporating the functional interactions between GO terms and developing a component-based GO, which may yield more meaningful and reusable clusters. In future, the clustering results will be applied to predicting protein functions according to the GO information. Moreover, the research will continue on developing techniques for retrieval and classification of the GO.

World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, Vol:1, No:7, 2007. publications.waset.org/7125/pdf

3) Chromosome length is the number of nodes in the graph.

Fig. 10
Fig. 8 Evolution of 5 runs

5) If local t > t_max, then terminate the process on the processor Proc_n, decode the best chromosome x_max ∈ SP_n, and display the clustering C of the subpopulation SP_n. Otherwise, go to step 6.
6) Increment local t = t + 1. Create a new subpopulation SP_n(t) […]
7) […] with probability p_c and then mutate each gene in a single chromosome x^t ∈ SP_n(t) with probability p_m.
8) Perform the split function S(x^t) to increase k and then decrease k using the merge function M(x^t) for each chromosome x^t ∈ SP_n(t).
9) Compute the fitness of each chromosome x^t ∈ SP_n(t) by applying the fitness function f(x^t).
10) If t = t_M, where t_M is an isolation time such that the migration operation is performed every M generations, then select a target subpopulation SP_target and replace the i worst chromosomes a_1…a_i in the target subpopulation with the j best chromosomes b_1…b_j from this subpopulation, where a is x_min ∈ SP_target, b is $x_{max}^t$ ∈ SP_n(t), and i = j.
11) Proceed to step 5.
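The migration in step 10 can be sketched without any message passing, treating the source and target subpopulations as plain lists. This is only an illustration: the function name `migrate` and the `elite` parameter (the size of the pool of best chromosomes from which emigrants are drawn at random, as described in Section VI) are assumptions, and how the target subpopulation is chosen is left out.

```python
import random
from typing import Callable, List, Sequence

Chromosome = List[int]


def migrate(source: Sequence[Chromosome], target: Sequence[Chromosome],
            fitness: Callable[[Chromosome], float],
            n_migrants: int, elite: int, rng: random.Random) -> List[Chromosome]:
    """Return a new target subpopulation after one migration.

    Emigrants are drawn at random from the `elite` best chromosomes of the
    source and replace the `n_migrants` worst chromosomes of the target.
    """
    if not 0 < n_migrants <= elite <= len(source) or n_migrants > len(target):
        raise ValueError("need 0 < n_migrants <= elite <= len(source) "
                         "and n_migrants <= len(target)")
    best_of_source = sorted(source, key=fitness, reverse=True)[:elite]
    emigrants = rng.sample(best_of_source, n_migrants)
    # Keep the best of the target; its worst n_migrants are dropped.
    survivors = sorted(target, key=fitness, reverse=True)[:len(target) - n_migrants]
    # Copy emigrants so the two subpopulations never share list objects.
    return [list(c) for c in survivors] + [list(e) for e in emigrants]
```

In an MPI setting the emigrants would be sent to the processor that owns the target subpopulation; the replacement logic on the receiving side would stay the same.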