Clustering Measurements

1. Measures of Cluster Validity
  • Numerical measures applied to judge various aspects of cluster validity are classified into the following three types:
    • External Index: Used to measure the extent to which cluster labels match externally supplied class labels
      • entropy
    • Internal Index: Used to measure the goodness of a clustering structure without reference to external information
      • sum of squared error (SSE)
    • Relative Index: Used to compare two different clusterings or clusters
      • often an external or internal index is used for this function, e.g., SSE or entropy
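
As a rough illustration of the two example indices named above, the sketch below computes SSE (internal) and entropy (external) for a toy clustering. The function names `sse` and `entropy`, the sample points, and the choice of a size-weighted average of per-cluster entropies in bits are assumptions made for this example, not definitions taken from the notes.

```python
import math
from collections import Counter


def sse(points, labels):
    """Internal index: sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def entropy(cluster_labels, class_labels):
    """External index: size-weighted average entropy (in bits) of the class labels in each cluster."""
    n = len(cluster_labels)
    by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        by_cluster.setdefault(cluster, []).append(cls)
    total = 0.0
    for members in by_cluster.values():
        counts = Counter(members)
        h = -sum((m / len(members)) * math.log2(m / len(members)) for m in counts.values())
        total += (len(members) / n) * h
    return total


# Toy data (hypothetical): two tight groups, cluster labels agree with class labels.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
clusters = [0, 0, 1, 1]
classes = ["a", "a", "b", "b"]

sse_value = sse(points, clusters)              # lower means tighter clusters
entropy_value = entropy(clusters, classes)     # 0 means every cluster is pure
```

Lower is better for both values. Used as a relative index, either function can be evaluated on two candidate clusterings of the same data and the results compared.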
2. Measuring Cluster Validity via Correlation

  • Two matrices
    • Proximity Matrix
    • Incidence Matrix
      • one row and one column for each data point
      • an entry is 1 if the associated pair of points belong to the same cluster, else 0
  • Compute the correlation between the two matrices
    • since the matrices are symmetric, only the correlation between n(n-1)/2 entries needs to be calculated 
  • High correlation indicates that points that belong to the same cluster are close to each other
  • Not a good measure for some density-based clusters
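
The procedure above can be sketched as follows. This is a minimal example that assumes Euclidean distance as the proximity measure and Pearson correlation over the n(n-1)/2 upper-triangle entries; the helper names `pearson` and `validity_correlation` and the sample points are made up for illustration. Because distance is a dissimilarity, a good clustering gives a strongly negative value here; with a similarity matrix the sign flips and a good clustering gives a high positive correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def validity_correlation(points, labels):
    """Correlate incidence and distance entries over the upper triangle only."""
    incidence, distance = [], []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):  # both matrices are symmetric, so n(n-1)/2 pairs suffice
            incidence.append(1.0 if labels[i] == labels[j] else 0.0)
            distance.append(math.dist(points[i], points[j]))
    return pearson(incidence, distance)


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = validity_correlation(points, [0, 0, 1, 1])  # groups nearby points together
bad = validity_correlation(points, [0, 1, 0, 1])   # groups distant points together
```

In this toy case `good` is close to -1, while `bad` is positive, which reflects a labeling that puts distant points in the same cluster.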
