Post

K-Means

K-Means

🌳 Hierarchical Clustering & 📦 K-Means

Unsupervised Learning – Clustering Algorithms (Lecture Notes)


🌳 Hierarchical Clustering

🔁 Agglomerative Clustering (Bottom-Up)

  • Level 0: Start with singleton clusters
    (each data point is its own cluster)

  • For level ℓ = 1 to L:

    • Set a distance threshold \(\tau_\ell\) (monotonically increasing)
    • Merge clusters whose distance is within \(\tau_\ell\)

➡️ This process builds a hierarchical tree (dendrogram)


📏 Distance Between Clusters

At level ≥ 1, we need distance between clusters, not points.

Common Linkage Methods

Let clusters be \(C_p, C_q\)

\(d_{\text{single}}(C_p, C_q) = \min_{x \in C_p, y \in C_q} \|x - y\|\)

  • Can cause chaining problem
  • Distant points may be grouped together indirectly

\(d_{\text{complete}}(C_p, C_q) = \max_{x \in C_p, y \in C_q} \|x - y\|\)

  • Produces compact clusters
  • Sensitive to outliers

\(d_{\text{avg}}(C_p, C_q) = \frac{1}{|C_p||C_q|} \sum_{x \in C_p}\sum_{y \in C_q} \|x-y\|\)


4️⃣ Centroid Distance

\(d(C_p, C_q) = \|\mu_p - \mu_q\|\)

where \(\mu_p = \frac{1}{|C_p|}\sum_{x \in C_p} x\)


⚠️ Important Note

Different linkage choices → different clustering behavior
Clustering results are not unique.


📦 K-Means Clustering

🔄 Overview

  • Iterative
  • Partitional clustering algorithm
  • Requires number of clusters K

🎯 Objective Function

Minimize within-cluster sum of squares:

\[J = \sum_{i=1}^{N}\sum_{k=1}^{K} r_{ik} \|x_i - \mu_k\|^2\]

Where:

  • $x_i \in \mathbb{R}^d$ : data point
  • $\mu_k \in \mathbb{R}^d$ : centroid of cluster k
  • $r_{ik} \in {0,1}$ : \(r_{ik} = \begin{cases} 1 & \text{if } x_i \text{ assigned to cluster } k \\ 0 & \text{otherwise} \end{cases}\)

⚙️ K-Means Algorithm

Step 1️⃣ Initialization

Pick K random points as initial centroids: \(\mu_1, \mu_2, \dots, \mu_K\)


Step 2️⃣ Assignment Step (a)

For each data point: \(r_{ik} = \begin{cases} 1 & k = \arg\min_j \|x_i - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}\)


Step 3️⃣ Update Step (b)

Update centroids: \(\mu_k = \frac{\sum_{i=1}^{N} r_{ik} x_i} {\sum_{i=1}^{N} r_{ik}}\)


Step 4️⃣ Repeat

Repeat (a) and (b) until assignments do not change.


✅ Does K-Means Always Terminate?

✔️ Yes

Reasons:

  • Assignment step does not increase \(J\)
  • Update step minimizes \(J\) w.r.t. \(\mu_k\)
  • Finite number of possible assignments

➡️ Converges to a local minimum


❌ Does It Always Converge to Same Result?

No

  • Depends on random initialization
  • Different runs → different local optima

➡️ Use multiple restarts or K-means++


⏱️ Time Complexity

Let:

  • N = number of points
  • K = number of clusters
  • T = iterations

Assignment Step:

\(O(KN)\)

Update Step:

\(O(N)\)

Total:

\(O(TKN)\)

Worst case is NP-hard, but in practice T is small.


❓ How to Choose K

Problem

Objective \(J\) always decreases as K increases.


🦴 Elbow Method

  • Plot \(J\) vs K
  • Choose K where decrease slows (elbow point)

⚠️ Sometimes elbow is unclear.


🧠 Summary

  • Hierarchical clustering builds a dendrogram
  • K-means optimizes a clear objective
  • Both depend on distance definitions
  • No single “correct” clustering

✨ Clustering is exploratory, not definitive

This post is licensed under CC BY 4.0 by the author.