Hierarchical Clustering

1. Hierarchical Clustering

  • Produces a set of nested clusters
    • organized as a hierarchical tree
  • Can be visualized as a dendrogram
    • a tree-like diagram that records the sequences of merges or splits
2. Strengths of Hierarchical Clustering
  • Do not have to assume any particular number of clusters
    • any desired number of clusters can be obtained by “cutting” the dendrogram at the proper level (see the sketch after this list)
  • They may correspond to meaningful taxonomies
    • example in biological sciences (e.g., animal kingdom, phylogeny reconstruction, …)
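The following is a minimal sketch, assuming SciPy and a small synthetic dataset, of how a single hierarchy can be “cut” at different levels to obtain any desired number of clusters; the data, random seed, and cluster counts are purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in ([0, 0], [3, 3], [0, 3])])

# Build the nested hierarchy once (group-average linkage here).
Z = linkage(X, method="average")

# "Cut" the same dendrogram at different levels: no fixed number of
# clusters has to be assumed up front.
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, np.bincount(labels)[1:])   # cluster sizes for each cut

# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree-like diagram
# of merges (it needs matplotlib for the plot).
```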
3. Two main types of Hierarchical Clustering
  • Agglomerative
    • Start with the points as individual clusters
    • At each step, merge the closest pair of clusters until only one cluster (or k clusters) is left (a from-scratch sketch follows this list)
  • Divisive
    • Start with one, all-inclusive cluster
    • At each step, split a cluster until each cluster contains a single point (or there are k clusters)
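As a rough sketch of the agglomerative procedure, assuming single-link (min) distance between clusters and a naive O(n³) search for the closest pair; the function and variable names are illustrative, not from any particular library.

```python
import numpy as np

def agglomerative(X, k):
    """Naive agglomerative clustering: repeatedly merge the closest pair of
    clusters (single-link / min distance) until only k clusters are left."""
    clusters = [[i] for i in range(len(X))]               # start: every point is its own cluster
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # pairwise point distances
    while len(clusters) > k:
        # find the closest pair of clusters
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: D[np.ix_(clusters[ij[0]], clusters[ij[1]])].min(),
        )
        clusters[a] = clusters[a] + clusters[b]            # merge them
        del clusters[b]
    return clusters                                        # lists of point indices
```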
4. Inter-Cluster Similarity
  • Min
    • strength: can handle non-elliptical shapes
    • weakness: sensitive to noise and outliers
  • Max
    • strength: less susceptible to noise and outliers
    • weakness: tends to break large clusters; biased towards globular clusters
  • Group Average
    • proximity of two clusters is the average of pairwise proximity between points in the two clusters
$$\text{Proximity}(Cluster_i, Cluster_j) = \frac{\sum_{p_i \in Cluster_i,\; p_j \in Cluster_j} \text{Proximity}(p_i, p_j)}{|Cluster_i|\,|Cluster_j|}$$
    • Strength: less susceptible to noise and outliers
    • Weakness: biased towards globular clusters
  • Distance between centroids
  • Other methods driven by an objective function
    • Ward’s method uses squared error
      • Similarity of two clusters is based on the increase in squared error when two clusters are merged
        • Similar to group average if distance between points is distance squared.
      • Strength: less susceptible to noise and outliers
      • Weakness: biased towards globular clusters
      • Hierarchical analogue of K-means; can be used to initialize K-means
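As a hedged illustration, assuming SciPy, the sketch below contrasts these inter-cluster similarity definitions through the corresponding linkage methods: min → 'single', max → 'complete', group average → 'average', distance between centroids → 'centroid', and Ward’s method → 'ward'; the data and cluster count are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(25, 2))
               for c in ([0, 0], [4, 0], [2, 3])])

# Same data, different definitions of inter-cluster proximity.
for method in ("single", "complete", "average", "centroid", "ward"):
    Z = linkage(X, method=method)   # 'centroid' and 'ward' assume Euclidean distances
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(f"{method:>8}: cluster sizes {np.bincount(labels)[1:]}")
```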
5. Hierarchical Clustering: Problems and Limitations

  • Once a decision is made to combine two clusters, it cannot be undone
  • No objective function is directly minimized
  • Different schemes have problems with one or more of the following
    • sensitivity to noise and outliers
    • difficulty handling different sized clusters and convex shapes
    • breaking large clusters
6. MST: Divisive Hierarchical Clustering
  • Build MST (Minimum Spanning Tree)
    • start with a tree that consists of any point
    • in successive steps, look for the closest pair of points (p, q) such that one point (p) is in the current tree but the other (q) is not
    • add q to the tree, connected to p by the edge (p, q), and repeat until every point is in the tree
  • Clusters are then formed divisively by breaking the longest remaining MST edge at each step (a sketch follows below)
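Below is a small sketch of this idea, assuming Euclidean distances: the MST is grown with a Prim-style loop exactly as described, and k clusters are then obtained by deleting the k-1 longest edges; all names are illustrative.

```python
import numpy as np

def mst_divisive(X, k):
    """Build an MST with a Prim-style loop, then break the k-1 longest
    edges; the remaining connected components are the k clusters."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # pairwise distances
    in_tree = {0}                      # start the tree from an arbitrary point
    edges = []
    while len(in_tree) < n:
        # closest pair (p, q) with p in the current tree and q outside it
        p, q = min(((p, q) for p in in_tree for q in range(n) if q not in in_tree),
                   key=lambda pq: D[pq])
        edges.append((p, q))
        in_tree.add(q)

    # Divisive step: keep only the n-k shortest MST edges,
    # i.e. break the k-1 longest ones.
    edges.sort(key=lambda pq: D[pq])
    kept = edges[: n - k]

    # Label each point by the connected component it ends up in (union-find).
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for p, q in kept:
        parent[find(p)] = find(q)
    return [find(i) for i in range(n)]
```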
