K-Means
🌳 Hierarchical Clustering & 📦 K-Means
Unsupervised Learning – Clustering Algorithms (Lecture Notes)
🌳 Hierarchical Clustering
🔁 Agglomerative Clustering (Bottom-Up)
Level 0: Start with singleton clusters
(each data point is its own cluster)For level ℓ = 1 to L:
- Set a distance threshold \(\tau_\ell\) (monotonically increasing)
- Merge clusters whose distance is within \(\tau_\ell\)
➡️ This process builds a hierarchical tree (dendrogram)
📏 Distance Between Clusters
At level ≥ 1, we need distance between clusters, not points.
Common Linkage Methods
Let clusters be \(C_p, C_q\)
1️⃣ Single-Link (Closest Pair)
\(d_{\text{single}}(C_p, C_q) = \min_{x \in C_p, y \in C_q} \|x - y\|\)
- Can cause chaining problem
- Distant points may be grouped together indirectly
2️⃣ Complete-Link (Farthest Pair)
\(d_{\text{complete}}(C_p, C_q) = \max_{x \in C_p, y \in C_q} \|x - y\|\)
- Produces compact clusters
- Sensitive to outliers
3️⃣ Average-Link
\(d_{\text{avg}}(C_p, C_q) = \frac{1}{|C_p||C_q|} \sum_{x \in C_p}\sum_{y \in C_q} \|x-y\|\)
4️⃣ Centroid Distance
\(d(C_p, C_q) = \|\mu_p - \mu_q\|\)
where \(\mu_p = \frac{1}{|C_p|}\sum_{x \in C_p} x\)
⚠️ Important Note
Different linkage choices → different clustering behavior
Clustering results are not unique.
📦 K-Means Clustering
🔄 Overview
- Iterative
- Partitional clustering algorithm
- Requires number of clusters K
🎯 Objective Function
Minimize within-cluster sum of squares:
\[J = \sum_{i=1}^{N}\sum_{k=1}^{K} r_{ik} \|x_i - \mu_k\|^2\]Where:
- $x_i \in \mathbb{R}^d$ : data point
- $\mu_k \in \mathbb{R}^d$ : centroid of cluster k
- $r_{ik} \in {0,1}$ : \(r_{ik} = \begin{cases} 1 & \text{if } x_i \text{ assigned to cluster } k \\ 0 & \text{otherwise} \end{cases}\)
⚙️ K-Means Algorithm
Step 1️⃣ Initialization
Pick K random points as initial centroids: \(\mu_1, \mu_2, \dots, \mu_K\)
Step 2️⃣ Assignment Step (a)
For each data point: \(r_{ik} = \begin{cases} 1 & k = \arg\min_j \|x_i - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}\)
Step 3️⃣ Update Step (b)
Update centroids: \(\mu_k = \frac{\sum_{i=1}^{N} r_{ik} x_i} {\sum_{i=1}^{N} r_{ik}}\)
Step 4️⃣ Repeat
Repeat (a) and (b) until assignments do not change.
✅ Does K-Means Always Terminate?
✔️ Yes
Reasons:
- Assignment step does not increase \(J\)
- Update step minimizes \(J\) w.r.t. \(\mu_k\)
- Finite number of possible assignments
➡️ Converges to a local minimum
❌ Does It Always Converge to Same Result?
❌ No
- Depends on random initialization
- Different runs → different local optima
➡️ Use multiple restarts or K-means++
⏱️ Time Complexity
Let:
- N = number of points
- K = number of clusters
- T = iterations
Assignment Step:
\(O(KN)\)
Update Step:
\(O(N)\)
Total:
\(O(TKN)\)
Worst case is NP-hard, but in practice T is small.
❓ How to Choose K
Problem
Objective \(J\) always decreases as K increases.
🦴 Elbow Method
- Plot \(J\) vs K
- Choose K where decrease slows (elbow point)
⚠️ Sometimes elbow is unclear.
🧠 Summary
- Hierarchical clustering builds a dendrogram
- K-means optimizes a clear objective
- Both depend on distance definitions
- No single “correct” clustering
✨ Clustering is exploratory, not definitive