A novel approach for fuzzy clustering based on neutrosophic association matrix

This paper proposes a fuzzy clustering algorithm based on the neutrosophic association matrix. In the first step, the data are fuzzified into neutrosophic sets to create a neutrosophic association matrix. By deriving a finite sequence of neutrosophic association matrices, the neutrosophic equivalence matrix is generated. Finally, lambda-cutting is performed on the neutrosophic equivalence matrix to derive the final lambda-cutting matrix, which is used to determine the clusters. Experimental results on several benchmark datasets, using different clustering validity criteria, show the advantage of the proposed clustering algorithm over existing algorithms.


Introduction
In practice, data are often uncertain, inconsistent, and incomplete. To handle this problem, the fuzzy set was proposed by Zadeh (1965), in which uncertainty is modeled by the degree to which an element belongs to a set. Fuzzy sets have shown meaningful applications in many fields of study (Nguyen, Son, Ashour, & Dey, 2018; Ye & Du, 2017). One essential aspect that the fuzzy set does not capture well is information such as "non-membership" and "hesitancy". For example, when diagnosing a patient, a doctor often concludes that the patient's condition corresponds to a disease to some degree rather than indicating that the illness is completely present or absent. Several extensions of the traditional fuzzy set have been proposed, such as the intuitionistic fuzzy set (Atanassov, 1986) and the neutrosophic set (Smarandache, 1998). The neutrosophic set is a generalization of the fuzzy set, the intuitionistic fuzzy set, and others. Neutrosophic sets have been studied and applied in various fields such as medical diagnosis (Mondal and Pramanik, 2015), decision support systems (Pramanik and Chackrabarti, 2013), robotics (Smarandache and Vladareanu, 2014), and social and educational information analyses.
Clustering is an important concept closely connected with fuzzy set theory. Several clustering algorithms based on fuzzy sets have been proposed, such as Fuzzy C-Means (FCM) (Bezdek, Ehrlich, & Full, 1984) and the methods of Ye and Fu (2016), Ye and Smarandache (2016), Ye and Zhang (2014), and Ye (2014, 2016, 2017, 2018). Recently, the neutrosophic association matrix has often been utilized as a tool in fuzzy clustering algorithms. For fuzzy clustering algorithms based on a neutrosophic association matrix, the most important step is to evaluate the similarities between elements in order to divide them into clusters. Ye and Smarandache (2016) proposed three types of similarity measures, namely the Jaccard, Dice, and cosine measures, which were then used in multi-criteria decision making with simplified neutrosophic sets. In Ye (2014) and Ye and Zhang (2014), Ye continued to propose new neutrosophic methods for decision-makers by combining the above similarity measures. On the other hand, Ma, Wang, Wang, and Wu (2015) investigated similarity measures based on the tangent function for medical applications. Other studies on neutrosophic fuzzy clustering algorithms can be found in Kuo, Potti, and Zulvia (2018), Wu, Wu, Zhou, Chen, and Guan (2017), Ye and Fu (2016), Ye and Zhang (2014), and Ye (2016).
This article proposes a new fuzzy clustering algorithm using a neutrosophic association matrix. The first step of the algorithm is to construct a neutrosophic association matrix from the data in the dataset. After that, a neutrosophic equivalence matrix is constructed from the neutrosophic association matrix. Finally, the lambda-cutting matrix is built from the neutrosophic equivalence matrix by the lambda-cutting step. The resulting clusters are defined based on the lambda-cutting matrix.
Section 2 presents some background information and proposes the new neutrosophic clustering method through a detailed analysis. Section 3 shows the experimental results of the proposed algorithm in comparison with other relevant methods on real datasets. Conclusions are given in Section 4.

Background of neutrosophic set
Let $\varepsilon > 0$ be an infinitesimal number (Smarandache, 1998), i.e., for every positive integer $n$ one has $\varepsilon < \frac{1}{n}$. Then $1^{+} = 1 + \varepsilon$, where "1" and "$\varepsilon$" are its standard and non-standard parts, respectively. Similarly, ${}^{-}0 = 0 - \varepsilon$, and $]{}^{-}0, 1^{+}[$ is the non-standard unit interval. A neutrosophic set $A$ in the universe $X$ is characterized by a truth membership function $T_A(x)$, an indeterminacy membership function $I_A(x)$, and a falsehood membership function $F_A(x)$, with $T_A(x), I_A(x), F_A(x) : X \rightarrow ]{}^{-}0, 1^{+}[$ and ${}^{-}0 \leq T_A(x) + I_A(x) + F_A(x) \leq 3^{+}$. Let $A$ and $B$ be two neutrosophic sets. We recall some basic relationships between neutrosophic sets (Smarandache, 1998):
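For illustration, the following minimal Python sketch represents single-valued neutrosophic sets (the practical case with $T, I, F \in [0, 1]$) as numeric triples. The helper names `nset`, `contains`, and `complement` are hypothetical, and the containment and complement conventions shown are one common choice in the single-valued neutrosophic literature, assumed here rather than taken from the cited source.

```python
import numpy as np

# A single-valued neutrosophic set over a finite universe X of size n is stored
# as an (n, 3) array whose columns are the truth (T), indeterminacy (I) and
# falsity (F) membership degrees, each in [0, 1].
def nset(T, I, F):
    A = np.column_stack([T, I, F]).astype(float)
    assert np.all((A >= 0) & (A <= 1)), "memberships must lie in [0, 1]"
    return A

def contains(A, B):
    """One common containment convention: B is contained in A iff
    T_B <= T_A, I_B >= I_A and F_B >= F_A pointwise."""
    return (np.all(B[:, 0] <= A[:, 0])
            and np.all(B[:, 1] >= A[:, 1])
            and np.all(B[:, 2] >= A[:, 2]))

def complement(A):
    """One common complement convention: (T, I, F) -> (F, 1 - I, T)."""
    return np.column_stack([A[:, 2], 1.0 - A[:, 1], A[:, 0]])

# Example: two neutrosophic sets over a universe of three elements.
A = nset(T=[0.8, 0.6, 0.9], I=[0.1, 0.2, 0.0], F=[0.1, 0.3, 0.1])
B = nset(T=[0.5, 0.4, 0.7], I=[0.3, 0.4, 0.2], F=[0.4, 0.5, 0.2])
print(contains(A, B))      # True under the convention above
print(complement(A))
```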

Construction of neutrosophic association matrices
Denote by $N(X)$ the set of all neutrosophic sets on $X$. A matrix $M = (m_{ij})_{n \times n}$ with $m_{ij} \in [0, 1]$, $m_{ii} = 1$, and $m_{ij} = m_{ji}$ for all $i, j$, where $m_{ij}$ is the association degree between the neutrosophic sets $B_i$ and $B_j$, is called an association matrix.
From this definition, we propose the following notions and theorems, which will be used in the main clustering algorithm later.

Theorem 1. If $M = (m_{ij})_{n \times n}$ is an association matrix, then the composition $M^{2}$ is also an association matrix.

Proof. For any $i, j = 1, 2, \ldots, n$, the reflexivity and symmetry of $M$ are preserved under the composition, so $M^{2}$ is again an association matrix.

Theorem 2. If $M = (m_{ij})_{n \times n}$ is an association matrix, then for any positive integer $p$, $M^{p}$ is also an association matrix.

Definition. An association matrix $M$ is called an equivalent association matrix if $M^{2} = M$.

Theorem 3. Let $M = (m_{ij})_{n \times n}$ be an association matrix. After a finite number of compositions $M \rightarrow M^{2} \rightarrow M^{4} \rightarrow \cdots$, there exists $p \geq 1$ such that $M^{2p} = M^{p}$, and $M^{p}$ is an equivalent association matrix.
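As an illustration of Theorems 1-3, the following Python sketch composes an association matrix with itself and squares it repeatedly until it becomes idempotent. The max-min composition and the reflexive/symmetric check are assumptions standing in for the paper's Eqs. (7)-(8), so this is a sketch of the general mechanism rather than the authors' implementation.

```python
import numpy as np

def compose(M, N):
    """Max-min composition: (M o N)_ij = max_k min(M_ik, N_kj)."""
    return np.max(np.minimum(M[:, :, None], N[None, :, :]), axis=1)

def is_association(M):
    """Entries in [0, 1], reflexive (unit diagonal) and symmetric."""
    return (np.all((M >= 0) & (M <= 1))
            and np.allclose(np.diag(M), 1.0)
            and np.allclose(M, M.T))

def equivalent_matrix(M, max_iter=64):
    """Square M repeatedly until M^(2p) = M^p (an equivalent association matrix)."""
    for _ in range(max_iter):
        M2 = compose(M, M)
        if np.allclose(M2, M):
            return M
        M = M2
    raise RuntimeError("composition did not converge")

# Example: M^2 stays an association matrix, and iteration reaches idempotency.
M = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
print(is_association(compose(M, M)))                 # True
M_bar = equivalent_matrix(M)
print(np.allclose(compose(M_bar, M_bar), M_bar))     # True
```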

Clustering algorithm based on association matrices of neutrosophic sets
Step 1: Let $Y = \{y_{1}, y_{2}, \ldots, y_{m}\}$ be a universe of discourse, and let $B_{1}, B_{2}, \ldots, B_{n}$ be neutrosophic sets on $Y$, where $T_{B_j}(y_l)$, $I_{B_j}(y_l)$, and $F_{B_j}(y_l)$ are the degrees of truth, uncertainty, and falsehood of $y_{l}$ with respect to $B_{j}$. Step 2: Select a neutrosophic association measure, such as Eq. (10) below, and compute the association matrix $M = (m_{ij})_{n \times n}$ between the sets $B_{i}$ and $B_{j}$.
Note that, by the well-known Cauchy-Schwarz inequality, the measure in Eq. (10) lies in $[0, 1]$. Step 3: Check whether $M = (m_{ij})_{n \times n}$ is an equivalent association matrix using Eq. (8); otherwise, derive an equivalent association matrix $\bar{M}$ by Eq. (7). Construct the $\lambda$-cutting matrix of $M$ (or $\bar{M}$) for a chosen confidence level $\lambda$. Step 4: If the elements of the $i$-th row of the $\lambda$-cutting matrix of $M$ (or $\bar{M}$) are the same as those of the $j$-th row, then $B_{i}$ and $B_{j}$ belong to the same cluster. By this principle, we can classify all the neutrosophic sets $B_{j}$ $(j = 1, 2, \ldots, n)$. The steps of this clustering algorithm are summarized in Fig. 1.
By using the cutting matrix of the equivalent association matrix, the new algorithm classifies neutrosophic sets according to a given confidence level, which is specified by the elements of the equivalent association matrix and the actual situation.
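Putting Steps 1-4 together, the sketch below (reusing `nset`, `compose`, and `equivalent_matrix` from the previous fragments) builds an association matrix from single-valued neutrosophic sets with a cosine-style measure, derives the equivalent matrix, applies a $\lambda$-cut, and groups identical rows into clusters. The cosine-style measure is only a plausible stand-in for Eq. (10), chosen because the Cauchy-Schwarz inequality keeps it in $[0, 1]$; the paper's actual measure and cutting equations may differ.

```python
def association_matrix(sets):
    """sets: list of (m, 3) arrays of (T, I, F) values over the same universe Y."""
    n = len(sets)
    M = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sets[i].ravel(), sets[j].ravel()
            # Cosine-style association degree; Cauchy-Schwarz keeps it in [0, 1].
            M[i, j] = M[j, i] = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return M

def lambda_cut_clusters(M_bar, lam):
    """Threshold the equivalent matrix at lambda and group identical rows (Step 4)."""
    C = (M_bar >= lam).astype(int)
    groups = {}
    for i, row in enumerate(map(tuple, C)):
        groups.setdefault(row, []).append(i)
    return list(groups.values())

# Example: three neutrosophic sets over a two-element universe; B1 and B2 are close.
B1 = nset(T=[0.9, 0.8], I=[0.1, 0.1], F=[0.1, 0.2])
B2 = nset(T=[0.8, 0.7], I=[0.2, 0.1], F=[0.1, 0.3])
B3 = nset(T=[0.2, 0.1], I=[0.3, 0.4], F=[0.8, 0.9])
M = association_matrix([B1, B2, B3])
print(lambda_cut_clusters(equivalent_matrix(M), lam=0.95))   # [[0, 1], [2]]
```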

Experimental environments
The proposed algorithm, together with the methods of Ye (2014), Ye (2016), and Huang (2016), has been implemented in Matlab 2015a on a PC with an Intel(R) Core(TM) i5-2520M CPU @ 2.4 GHz, 4096 MB RAM, and Windows 7 Professional 64-bit.
Two kinds of datasets have been used for the evaluation. The first is the EPPO standard dataset, taken from the EPPO Global Database, which provides large datasets covering agriculture, forestry, and plant protection. The other 10 benchmark datasets (Machine, Ecoli, Pima-indians-diabetes, Student, Transfusion, Voting-records, Climate Model, Adult, Breast-cancer-wisconsin, Seed) have been taken from the UCI Machine Learning Repository (see Tables 1 and 2).
Experimental objectives: The quality of all clustering algorithms is evaluated by five indices, namely DB, SSWC, IFV, VRC, and BH.
(a) Davies-Bouldin (DB) (Davies and Bouldin, 1979): Let $x_{i}$ be an $n$-dimensional feature vector assigned to cluster $C_{i}$, and let $\bar{x}_{i}$ be the centroid of $C_{i}$. Denote by $\bar{d}_{l}$ and $\bar{d}_{m}$ the average distances of the points of clusters $C_{l}$ and $C_{m}$ to their respective centroids, and by $d_{l,m}$ the distance between the two centroids. If $k$ is the number of clusters, the Davies-Bouldin index is
$$DB = \frac{1}{k} \sum_{l=1}^{k} \max_{m \neq l} \frac{\bar{d}_{l} + \bar{d}_{m}}{d_{l,m}}.$$
A lower value of the DB criterion is better.
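A minimal Python sketch of the DB index as described above, assuming Euclidean distances and crisp cluster labels:

```python
import numpy as np

def davies_bouldin(X, labels):
    """X: (N, d) data matrix; labels: cluster index per point; lower is better."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    # Average distance of each cluster's points to its own centroid (d_bar).
    d_bar = np.array([np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
                      for i, c in enumerate(ids)])
    k = len(ids)
    total = 0.0
    for l in range(k):
        # Worst ratio of within-cluster scatter to between-centroid distance.
        total += max((d_bar[l] + d_bar[m]) / np.linalg.norm(centroids[l] - centroids[m])
                     for m in range(k) if m != l)
    return total / k
```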

(b) Simplified Silhouette Width Criterion (SSWC):
Suppose that $x_{j}$ is a point of cluster $A$, $a_{p,j}$ is the average distance from $x_{j}$ to the points in $A$, and $b_{p,j}$ is the minimum average distance from $x_{j}$ to the points of all other clusters. Then the silhouette of $x_{j}$ is
$$s_{p,j} = \frac{b_{p,j} - a_{p,j}}{\max\{a_{p,j}, b_{p,j}\}},$$
and SSWC is the average of $s_{p,j}$ over all points; a higher SSWC value indicates better performance.
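A Python sketch following the description above (average distance to the own cluster versus the minimum average distance to any other cluster); the exact variant used in the paper may differ slightly:

```python
import numpy as np

def sswc(X, labels):
    """Average silhouette s = (b - a) / max(a, b); higher is better."""
    ids = np.unique(labels)
    scores = []
    for j in range(len(X)):
        own = labels[j]
        same = X[labels == own]
        # Average distance to the other points of the own cluster.
        a = np.linalg.norm(same - X[j], axis=1).sum() / max(len(same) - 1, 1)
        # Minimum, over the other clusters, of the average distance to their points.
        b = min(np.linalg.norm(X[labels == c] - X[j], axis=1).mean()
                for c in ids if c != own)
        scores.append((b - a) / max(a, b, 1e-12))
    return float(np.mean(scores))
```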
(c) IFV criterion: Here, $X_{k}$ denotes an element belonging to the $k$-th cluster and $V_{k}$ is the centroid of this cluster. The maximal value of IFV indicates better performance.
(d) Variance Ratio Criterion (VRC): The Calinski-Harabasz criterion is also called the variance ratio criterion (VRC). It is defined as
$$VRC = \frac{SS_{B}}{SS_{W}} \cdot \frac{N - k}{k - 1},$$
where $SS_{B}$ and $SS_{W}$ are the overall between-cluster and within-cluster variances, respectively, and $k$ and $N$ are the numbers of clusters and observations. $SS_{B}$ is defined as
$$SS_{B} = \sum_{i=1}^{k} n_{i} \, \lVert m_{i} - m \rVert^{2},$$
where $k$ is the number of clusters, $n_{i}$ is the number of points in cluster $i$, $m_{i}$ is the centroid of cluster $i$, $m$ is the overall mean of the data, and $\lVert m_{i} - m \rVert$ is the $L_{2}$ norm (Euclidean distance) between the two vectors. $SS_{W}$ is defined as
$$SS_{W} = \sum_{i=1}^{k} \sum_{x \in c_{i}} \lVert x - m_{i} \rVert^{2},$$
where $x$ is a data point, $c_{i}$ is the $i$-th cluster, $m_{i}$ is the centroid of cluster $i$, and $\lVert x - m_{i} \rVert$ is the Euclidean distance between the two vectors. The maximal value of VRC indicates better performance.
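A Python sketch of the VRC as defined above:

```python
import numpy as np

def vrc(X, labels):
    """Calinski-Harabasz: (SS_B / SS_W) * (N - k) / (k - 1); higher is better."""
    ids = np.unique(labels)
    k, N = len(ids), len(X)
    m = X.mean(axis=0)                       # overall mean of the data
    ss_b = ss_w = 0.0
    for c in ids:
        pts = X[labels == c]
        m_i = pts.mean(axis=0)               # centroid of cluster i
        ss_b += len(pts) * np.linalg.norm(m_i - m) ** 2            # between-cluster
        ss_w += np.sum(np.linalg.norm(pts - m_i, axis=1) ** 2)     # within-cluster
    return (ss_b / ss_w) * (N - k) / (k - 1)
```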
(e) Ball-Hall criterion (BH): The Ball-Hall criterion is the mean, over all clusters, of their mean dispersion; in its computation, $n_{i}$ is the number of observations in the $i$-th cluster and $u_{ij}$ is the membership degree of $x_{j}$ in the $i$-th cluster. The maximal value of BH indicates better performance.
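A Python sketch of the crisp form of the Ball-Hall criterion (mean, over clusters, of the mean squared dispersion around the centroid); the membership-weighted variant suggested by $u_{ij}$ above would weight each squared distance by the membership degree and is not shown here.

```python
import numpy as np

def ball_hall(X, labels):
    """Mean over clusters of the mean squared distance to the cluster centroid."""
    ids = np.unique(labels)
    dispersions = []
    for c in ids:
        pts = X[labels == c]
        m_i = pts.mean(axis=0)
        dispersions.append(np.mean(np.linalg.norm(pts - m_i, axis=1) ** 2))
    return float(np.mean(dispersions))
```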
Tables 3 and 4 present some comparative results of the proposed method and existing works on the EPPO and UCI datasets.

The comparison of performance
Tables 5 and 6 present the clustering times of the algorithms on the EPPO and UCI datasets.
In the performance graphs, the clustering results of the proposed algorithm are more clearly separated and less noisy than those of the existing methods. The graphs show the clustering results with nearest-neighbor groups sharing the same color. It is clear that our algorithm generates more distinct clusters in the dataset than the other algorithms, and it also produces fewer noise-intensive elements.


Conclusions
This paper proposed a new fuzzy clustering algorithm based on an association matrix built from neutrosophic sets. After constructing a neutrosophic association matrix from the data, a neutrosophic equivalence matrix is derived from the association matrix. The next step is to construct the lambda-cutting matrix from the neutrosophic equivalence matrix by the lambda-cutting step. Finally, the clusters are defined on the basis of the lambda-cutting matrix. To assess the quality of the clusters, five validity indices (DB, SSWC, IFV, VRC, and BH) were used.

The experimental results on the EPPO and UCI datasets show that the quality of the proposed algorithm is better than that of the comparative clustering algorithms. The clustering results are also well distributed, with few noises and exceptions. However, the runtime of our algorithm is usually longer than that of the other algorithms. Therefore, in future work we will study how to improve the runtime of the fuzzy clustering algorithm on neutrosophic fuzzy sets.

Figs. 2-4 show the results of the clustering algorithms, where each color represents a cluster. The number of clusters depends on each method and its configuration parameters; here we choose the number of clusters in each algorithm to be approximately equal. Through each figure and its sub-figures, it can be seen that the proposed algorithm expresses clusters more clearly than the other algorithms. Figs. 5-8 show the results of the clustering algorithms for each UCI dataset.

Table 1
The descriptions of experimental EPPO datasets.

Table 2
The descriptions of experimental UCI datasets.

Table 3
Comparative result of proposed method with existing works on EPPO dataset (Bold shows the best results in a column).

Table 4
Comparative result of proposed method with existing works on UCI dataset (Bold shows the best results in a column).

Table 5
Comparison of runtime (seconds) between 3 algorithms on EPPO dataset.

Table 6
Comparison of runtime (seconds) between 3 algorithms on UCI dataset.