
CL KM - CLassification - K Means clustering

(8/25/97)

PURPOSE

Performs K-means clustering on factors (or raw data) produced by CORAN or PCA.

SEE ALSO

CL CLA [CLassification - Clusters]
CL HC [CLassification - Hierarchical clustering]
CL HD [CLassification - Hierarchical clustering, calculate classes]
CL HE [CLassification - Hierarchical clustering, create doc files]

USAGE

.OPERATION: CL KM,X21,X22,X23,X24,X25

.ENTER DATA FILENAME TYPE (SEQ(1), IMC(2) or PIX(3)): 2
[Enter an integer (1, 2, or 3) to select the type of file containing your data: the sequential file (SEQ), the image file (IMC), or the pixel file (PIX). Only the type is selected here, without the coordinate code or extension. These files were created by CORAN or PCA using CA SI and/or CA S.]

.ENTER COORDINATES FILE CODE: 101
[Enter the code number of the file chosen above.]

.NUMBER OF CLASSES: 50
[Enter number of classes required.]

.FACTOR NUMBERS: 1,3,4,6
[Enter the factors to be included in the K-means clustering algorithm.]

.FACTOR WEIGHT: 1.5

.FACTOR WEIGHT: 1.0

.FACTOR WEIGHT: 1.0

.FACTOR WEIGHT: 0
[Enter a weight for each selected factor. This question is repeated once for each factor specified, or until a zero is entered. If a weight of zero is given at any point, the weights for that factor and all following factors are set to one.]
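
The zero-terminates rule for factor weights can be illustrated with a short Python sketch. This is not SPIDER code: the function name `resolve_weights` and its arguments are hypothetical, and the sketch only models how the answers to the prompt map to one weight per factor, not how the weights are then applied to the data.

```python
def resolve_weights(answers, n_factors):
    """Map the answers given at the FACTOR WEIGHT prompt to one weight per factor.

    Hypothetical helper for illustration only. A zero answer ends the
    prompting; that factor and every later factor receive a weight of 1.0.
    """
    weights = []
    for answer in answers[:n_factors]:
        if answer == 0:
            break  # zero terminates the sequence of prompts
        weights.append(float(answer))
    # Fill the remaining factors (from the zero answer onwards) with 1.0.
    weights.extend([1.0] * (n_factors - len(weights)))
    return weights


# Answers from the usage example above, for the four factors 1,3,4,6.
print(resolve_weights([1.5, 1.0, 1.0, 0], 4))  # [1.5, 1.0, 1.0, 1.0]
```

Under this reading, answering zero at the first prompt gives every selected factor a weight of one.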

.For random seeds give non-zero starting number: 1457
[The initial partition of objects is either sequential or random. If the answer is zero, the partition is as follows: 1st object to first class, 2nd object to second class, ..., k-th object to k-th class, (k+1)-th object to first class, etc. For a non-zero answer, the number is used as the seed for a random assignment of objects to classes. The purpose is to try different initial partitions for a given number of classes and choose the one with the best value of one of the criteria listed in the notes.]
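
The two kinds of initial partition can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPIDER implementation: the function `initial_partition` is hypothetical, classes are numbered from 0 rather than 1, and Python's `random` module stands in for whatever generator SPIDER uses, so the same seed will not reproduce SPIDER's assignment.

```python
import random


def initial_partition(n_objects, n_classes, seed=0):
    """Return a class index (0-based) for each object.

    seed == 0: cyclic assignment (object 1 to class 1, object 2 to class 2,
               ..., object k+1 back to class 1).
    seed != 0: seeded pseudo-random assignment (stand-in generator).
    """
    if seed == 0:
        return [i % n_classes for i in range(n_objects)]
    rng = random.Random(seed)  # a fixed seed gives a repeatable partition
    return [rng.randrange(n_classes) for _ in range(n_objects)]


print(initial_partition(7, 3))        # [0, 1, 2, 0, 1, 2, 0]
print(initial_partition(7, 3, 1457))  # repeatable for a given seed
```

Running the clustering with several different seeds and comparing the resulting criterion values is the usage the prompt description has in mind.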

.TEMPLATE FILENAME (ex: SEL***): SEL***
[Enter the name (without a coordinate code or extension) of the files where all the objects belonging to the same cluster will be stored.]

.DOCUMENT FILE: MAP001
[Enter the document file name where the cluster membership for each object will be stored.]

NOTES

  1. Registers X21-X25 store values for the following criteria:
    X21: Tr(B), trace of between-groups sum of squares matrix,
    X22: Tr(W), trace of within-groups sum of squares matrix,
    X23: C = Tr(B)*Tr(W), Coleman criterion,
    X24: H = (Tr(B)/(k-1))/(Tr(W)/(nobj-k)), Harabasz criterion (k = number of groups; nobj = number of objects [images]),
    X25: DB, Davies-Bouldin criterion.
    The local maximum on the plot of C or H versus number of groups indicates the 'best' partition.
    A large change in value of Tr(W) also indicates a possible good partition.
    The local minimum on the plot of DB versus number of groups indicates the 'best' partition.
    Davies-Bouldin's is the most highly recommended criterion.

  2. For description of the k-means algorithm and clustering criteria see:

    Cluster Analysis Algorithms for Data Reduction and Classification of Objects. Helmuth Späth. (John Wiley & Sons, Ellis Horwood Ltd., 1980).
    Algorithms for Clustering Data. A.K.Jain, R.C.Dubes. (Prentice Hall, 1988).
    A cluster separation measure. D.L.Davies, D.W.Bouldin. IEEE Trans. Pattern Analysis and Machine Intelligence 1:224-227 (1979).
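
The criteria listed in note 1 can be sketched in plain Python for a given partition. This is a minimal illustration, not the code in SUBKMNS: the function names are hypothetical, Tr(B), Tr(W), C and H follow the formulas in note 1, and the Davies-Bouldin value uses the commonly quoted form of the index (mean Euclidean distance to the class centroid as the scatter), which is an assumption because the formula is not given above. The sketch needs at least two classes, more objects than classes, and distinct class centroids.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_criteria(points, labels):
    """Return (Tr(B), Tr(W), C, H, DB) for a partition given by labels."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k, nobj = len(groups), len(points)

    grand = centroid(points)
    cents = {lab: centroid(g) for lab, g in groups.items()}

    # Traces of the between-groups and within-groups sum-of-squares matrices.
    tr_b = sum(len(g) * sq_dist(cents[lab], grand) for lab, g in groups.items())
    tr_w = sum(sq_dist(p, cents[lab]) for lab, g in groups.items() for p in g)

    coleman = tr_b * tr_w                               # C = Tr(B)*Tr(W)
    harabasz = (tr_b / (k - 1)) / (tr_w / (nobj - k))   # H as in note 1

    # Davies-Bouldin (assumed standard form): for each class take the worst
    # ratio (S_i + S_j) / d(c_i, c_j), then average over the classes.
    scatter = {lab: sum(math.sqrt(sq_dist(p, cents[lab])) for p in g) / len(g)
               for lab, g in groups.items()}
    db = sum(
        max((scatter[i] + scatter[j]) / math.sqrt(sq_dist(cents[i], cents[j]))
            for j in groups if j != i)
        for i in groups
    ) / k
    return tr_b, tr_w, coleman, harabasz, db


# Two well-separated pairs of 2-D points, labelled as two classes.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(clustering_criteria(pts, [0, 0, 1, 1]))
```

Evaluating such a function for partitions with different numbers of classes, and then plotting C, H, or DB against the number of classes, gives the curves whose local maxima or minima note 1 refers to.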

SUBROUTINES

SUBKMNS, SUBKMEANS, NEWKMEANS, PRNTXX

CALLER

UTIL1

© Copyright Notice /       Enquiries: spider@wadsworth.org