Cluster Analysis in Matlab and the Design of Matlab Clustering Programs

Matlab provides a series of functions for cluster analysis. The available methods can be summarized as follows:

Method 1: Direct clustering. The clusterdata function clusters the sample data in a single call. Its drawback is that it leaves the user little room for choice: in the basic call, the distance calculation method cannot be changed. The user does not need to understand the principle and process of clustering, but the clustering quality is limited.

Method 2: Hierarchical clustering. This method is more flexible but requires a detailed understanding of the clustering principle. The process is as follows: (1) use the pdist function to compute the similarity or dissimilarity (distance) between the objects in the data set; (2) use the linkage function to define the links between objects; (3) use the cophenet function to evaluate the clustering information; (4) use the cluster function to create the clusters.

Method 3: Partitioning clustering, including K-means clustering and K-medoids (K-center) clustering. This also requires a series of steps, and users need a clear understanding of the clustering principles and process.
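As an illustration of Method 3, the following is a minimal K-means sketch. It assumes the kmeans function from the Statistics and Machine Learning Toolbox; the data matrix, the choice of two clusters, and the number of replicates are hypothetical and chosen only for demonstration.

```matlab
% Hypothetical data: 6 objects (rows), 2 variables (columns),
% forming two well-separated groups.
X = [1.0 1.0; 1.2 0.9; 0.8 1.1; ...
     5.0 5.0; 5.1 4.8; 4.9 5.2];

rng(1);  % fix the random seed so the run is repeatable

% Partition the objects into k = 2 clusters (k is an assumption here).
% 'Replicates' reruns the algorithm from several starting points
% and keeps the best solution.
k = 2;
[idx, C] = kmeans(X, k, 'Replicates', 5);

% idx(i) is the cluster index of object i; C holds the cluster centers.
disp(idx');
disp(C);
```

The cluster labels themselves are arbitrary: only which objects share a label is meaningful.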

The following sections introduce the related functions and clustering methods in Matlab.

1. Introduction to related functions in Matlab

1.1 pdist function

Call format: Y=pdist(X,'metric')

Description: Calculates the distance between each pair of objects in the data matrix X using the method specified by 'metric'.

X: an m × n matrix representing a data set of m objects, each described by n variables.

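'metric' takes the following values:

'euclidean': Euclidean distance (default);

'seuclidean': standardized Euclidean distance;

'mahalanobis': Mahalanobis distance;

'cityblock': city block (Manhattan) distance;

'minkowski': Minkowski distance;

'cosine': one minus the cosine of the angle between two objects;

'correlation': one minus the sample correlation between two objects;

'hamming': Hamming distance, the percentage of coordinates that differ;

'jaccard': one minus the Jaccard coefficient;

'chebychev': Chebychev distance, the maximum coordinate difference.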
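To make the effect of 'metric' concrete, here is a small sketch that applies three of the metrics above to the same data. The matrix X is hypothetical, chosen so that the distances are easy to check by hand.

```matlab
% Hypothetical data: 3 objects (rows), 2 variables (columns).
X = [0 0; 3 4; 6 8];

% pdist returns a row vector of pairwise distances in the order
% (1,2), (1,3), (2,3).
Yeuc  = pdist(X, 'euclidean');  % straight-line distance
Ycity = pdist(X, 'cityblock');  % sum of absolute coordinate differences
Ycheb = pdist(X, 'chebychev');  % largest absolute coordinate difference

disp(Yeuc);   % distances 5, 10, 5
disp(Ycity);  % distances 7, 14, 7
disp(Ycheb);  % distances 4, 8, 4
```

For m objects the result has m(m-1)/2 elements, which is why three objects give a vector of length 3.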

1.2 squareform function

Call format: Z=squareform(Y,...)

Description: Converts a distance vector (the upper-triangular form returned by pdist) into a square symmetric matrix, or converts a square distance matrix back into vector form.
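The two directions of the conversion can be sketched as follows. The distance vector Y is a hypothetical example for three objects.

```matlab
% Hypothetical distance vector for 3 objects, in pdist order:
% d(1,2), d(1,3), d(2,3).
Y = [5 10 5];

% Vector form to square form: a symmetric 3-by-3 matrix
% with zeros on the diagonal.
D = squareform(Y);
disp(D);

% Square form back to vector form.
Yback = squareform(D);
disp(Yback);
```

The square form is convenient for looking up the distance between objects i and j as D(i,j).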

1.3 linkage function

Call format: Z=linkage(Y,'method')

Description: Computes the hierarchical cluster tree using the algorithm specified by the 'method' parameter.

Y: the distance vector returned by the pdist function;

'method' takes the following values:

'single': shortest distance method (default);

'complete': longest distance method;

'average': unweighted average distance method;

'weighted': weighted average distance method;

'centroid': centroid distance method;

'median': weighted centroid distance method;

'ward': inner squared distance method (minimum variance algorithm).

Returns: Z is an (m-1) × 3 matrix containing the cluster tree information. Each row describes one merge: the first two columns hold the indices of the two clusters being joined, and the third column holds the distance between them.
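A short sketch of how the cluster tree might be built and read is given below. The one-dimensional data is hypothetical and was chosen so that the merge distances can be verified by hand.

```matlab
% Hypothetical data: 5 objects on a line (one variable each).
X = [0; 1; 5; 7; 20];

Y = pdist(X);              % 10 pairwise Euclidean distances
Z = linkage(Y, 'single');  % shortest distance (single linkage) tree

% Z has m-1 = 4 rows. Each row lists the two clusters merged and the
% distance at which they merge. Clusters formed by a merge are
% numbered m+1, m+2, ... in the order they are created.
disp(Z);

% Merge distances for this data: 1 (objects 1,2), 2 (objects 3,4),
% 4 (the two pairs), 13 (object 5 joins last).
disp(Z(:, 3)');
```

Replacing 'single' with another method such as 'complete' or 'average' changes how the distance between clusters is measured, and therefore the merge heights in the third column.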

1.4 dendrogram function

Call format: [H,T,...]=dendrogram(Z,p,...)

Description: Generates a dendrogram (tree diagram) of the cluster tree Z with no more than p leaf nodes.
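A minimal plotting sketch follows, using hypothetical data. Passing p = 0 asks dendrogram to display every leaf node instead of collapsing the lower part of the tree.

```matlab
% Hypothetical data: 5 objects on a line.
X = [0; 1; 5; 7; 20];
Z = linkage(pdist(X), 'single');

% H holds the handles of the plotted lines; T gives the leaf node
% that each original object belongs to in the plot.
[H, T] = dendrogram(Z, 0);   % p = 0 shows all leaf nodes

xlabel('Object index');
ylabel('Merge distance');
```

The height of each link in the plot corresponds to the third column of Z.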

1.5 cophenet function

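Call format: c=cophenet(Z,Y)

Description: Calculates the cophenetic correlation coefficient between Y, the distances generated by the pdist function, and Z, the cluster tree generated by the linkage function. The closer the value is to 1, the better the tree reflects the original distances.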
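The coefficient can be used to compare linkage methods on the same data, as in this sketch. The data is hypothetical, and comparing 'single' with 'average' is only one possible choice.

```matlab
% Hypothetical data: 5 objects on a line.
X = [0; 1; 5; 7; 20];
Y = pdist(X);

% Build two trees from the same distances.
Zsingle  = linkage(Y, 'single');
Zaverage = linkage(Y, 'average');

% Cophenetic correlation: how faithfully each tree preserves
% the original pairwise distances (closer to 1 is better).
cSingle  = cophenet(Zsingle, Y);
cAverage = cophenet(Zaverage, Y);

fprintf('single:  %.4f\n', cSingle);
fprintf('average: %.4f\n', cAverage);
```

A common use is to keep the linkage method with the highest coefficient for the data at hand.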

1.6 cluster function

Call format: T=cluster(Z,...)

Description: Creates a classification based on the output Z of the linkage function.
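As a sketch of one way to call cluster, the example below cuts a tree into a fixed number of clusters with the 'maxclust' option. The data and the choice of three clusters are hypothetical.

```matlab
% Hypothetical data: 5 objects on a line, visibly forming
% the groups {0, 1}, {5, 7} and {20}.
X = [0; 1; 5; 7; 20];
Z = linkage(pdist(X), 'single');

% Cut the tree so that at most 3 clusters remain.
T = cluster(Z, 'maxclust', 3);

% T(i) is the cluster number assigned to object i. The numbers are
% arbitrary labels; objects in the same group share a label.
disp(T');
```

The alternative 'cutoff' option cuts the tree by a threshold instead of a target number of clusters.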

1.7 clusterdata function

Call format: T=clusterdata(X,...)

Description: Create a classification based on the data.

For 0 < cutoff < 2, T=clusterdata(X,cutoff) is equivalent to the following set of commands:

Y=pdist(X,'euclidean');

Z=linkage(Y,'single');

T=cluster(Z,'cutoff',cutoff);


2. Design of the Matlab clustering program

2.1 Method 1: One-step clustering

X=[11978 12.5 93.5 31908;...;57500 67.6 238.0 15900];

T=clusterdata(X,0.9)
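Because the data matrix above is abbreviated, the following self-contained sketch uses a small hypothetical matrix instead. It also uses the 'maxclust' option instead of a cutoff value, and checks that the one-step call gives the same result as the three-step sequence of pdist, linkage and cluster.

```matlab
% Hypothetical data: 5 objects (rows), 2 variables (columns).
X = [1.0 1.0; 1.2 0.9; 5.0 5.0; 5.3 4.8; 9.0 1.0];

% One-step clustering into at most 3 clusters.
T1 = clusterdata(X, 'maxclust', 3);

% The same result built step by step with the default settings
% (Euclidean distance, single linkage).
Y  = pdist(X, 'euclidean');
Z  = linkage(Y, 'single');
T2 = cluster(Z, 'maxclust', 3);

disp([T1 T2]);          % the two label columns should match
disp(isequal(T1, T2));
```

The step-by-step form is what makes it possible to swap in a different distance or linkage method.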

2.2 Method 2 and Method 3 design flow: step-by-step clustering

Step 1: Compute the similarity between objects

Use the pdist function to compute the distances between objects; several distance measures are available. It is best to standardize the data with the zscore function before computing distances, so that variables with large numeric ranges do not dominate the result.

X2=zscore(X); % standardize the data

Y2=pdist(X2); % compute the distances
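The effect of standardization can be seen in a small sketch. The matrix is hypothetical: its second column is ten times the scale of the first, and zscore maps both columns to the same values.

```matlab
% Hypothetical data: the second variable has a much larger scale.
X = [1 10; 2 20; 3 30];

% zscore centers each column to mean 0 and scales it to
% standard deviation 1.
X2 = zscore(X);
disp(X2);        % both columns become -1, 0, 1

% Distances before and after standardization.
Yraw = pdist(X);   % dominated by the second column
Ystd = pdist(X2);  % both columns contribute equally
disp(Yraw);
disp(Ystd);
```

Without this step, the Euclidean distance would be driven almost entirely by the variable with the largest numeric range.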

Step 2: Define the linkage between objects

Z2=linkage(Y2);

Step 3: Evaluate the clustering information

C2=cophenet(Z2,Y2); % 0.94698

Step 4: Create the clusters and draw the dendrogram
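The code for this step is not shown in the text, so the following is only a sketch of how it might look, repeating Steps 1 to 3 on a small hypothetical data set to stay self-contained. The data values and the choice of three clusters are assumptions, not the original author's settings.

```matlab
% Hypothetical data: 6 objects (rows), 2 variables (columns),
% forming three pairs of similar objects.
X = [ 1 100;  2 110; ...
     10 500; 11 520; ...
     30  50; 31  60];

X2 = zscore(X);          % Step 1: standardize the data
Y2 = pdist(X2);          %         and compute the distances
Z2 = linkage(Y2);        % Step 2: build the cluster tree
C2 = cophenet(Z2, Y2);   % Step 3: evaluate the tree

% Step 4: cut the tree into at most 3 clusters (assumed number)
% and draw the dendrogram.
T = cluster(Z2, 'maxclust', 3);
H = dendrogram(Z2);

disp(T');
```

In practice the number of clusters would be chosen by inspecting the dendrogram for the data being analyzed.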
