K-Means

Posted Feb 9, 2026 Updated May 4, 2026

2 min read

K-Means

🌳 Hierarchical Clustering & 📦 K-Means

Unsupervised Learning – Clustering Algorithms (Lecture Notes)

🌳 Hierarchical Clustering

🔁 Agglomerative Clustering (Bottom-Up)

Level 0: Start with singleton clusters
(each data point is its own cluster)
For level ℓ = 1 to L:
- Set a distance threshold $\tau_\ell$ (monotonically increasing)
- Merge clusters whose distance is within $\tau_\ell$

➡️ This process builds a hierarchical tree (dendrogram)

📏 Distance Between Clusters

At level ≥ 1, we need distance between clusters, not points.

Common Linkage Methods

Let clusters be $C_p, C_q$

1️⃣ Single-Link (Closest Pair)

$d_{\text{single}}(C_p, C_q) = \min_{x \in C_p, y \in C_q} \|x - y\|$

Can cause chaining problem
Distant points may be grouped together indirectly

2️⃣ Complete-Link (Farthest Pair)

$d_{\text{complete}}(C_p, C_q) = \max_{x \in C_p, y \in C_q} \|x - y\|$

Produces compact clusters
Sensitive to outliers

3️⃣ Average-Link

$d_{\text{avg}}(C_p, C_q) = \frac{1}{|C_p||C_q|} \sum_{x \in C_p}\sum_{y \in C_q} \|x-y\|$

4️⃣ Centroid Distance

$d(C_p, C_q) = \|\mu_p - \mu_q\|$

where $\mu_p = \frac{1}{|C_p|}\sum_{x \in C_p} x$

⚠️ Important Note

Different linkage choices → different clustering behavior
Clustering results are not unique.

📦 K-Means Clustering

🔄 Overview

Iterative
Partitional clustering algorithm
Requires number of clusters K

🎯 Objective Function

Minimize within-cluster sum of squares:

\[J = \sum_{i=1}^{N}\sum_{k=1}^{K} r_{ik} \|x_i - \mu_k\|^2\]

Where:

$x_i \in \mathbb{R}^d$ : data point
$\mu_k \in \mathbb{R}^d$ : centroid of cluster k
$r_{ik} \in {0,1}$ : $r_{ik} = \begin{cases} 1 & \text{if } x_i \text{ assigned to cluster } k \\ 0 & \text{otherwise} \end{cases}$

⚙️ K-Means Algorithm

Step 1️⃣ Initialization

Pick K random points as initial centroids: $\mu_1, \mu_2, \dots, \mu_K$

Step 2️⃣ Assignment Step (a)

For each data point: $r_{ik} = \begin{cases} 1 & k = \arg\min_j \|x_i - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}$

Step 3️⃣ Update Step (b)

Update centroids: $\mu_k = \frac{\sum_{i=1}^{N} r_{ik} x_i} {\sum_{i=1}^{N} r_{ik}}$

Step 4️⃣ Repeat

Repeat (a) and (b) until assignments do not change.

✅ Does K-Means Always Terminate?

✔️ Yes

Reasons:

Assignment step does not increase $J$
Update step minimizes $J$ w.r.t. $\mu_k$
Finite number of possible assignments

➡️ Converges to a local minimum

❌ Does It Always Converge to Same Result?

❌ No

Depends on random initialization
Different runs → different local optima

➡️ Use multiple restarts or K-means++

⏱️ Time Complexity

Let:

N = number of points
K = number of clusters
T = iterations

Assignment Step:

$O(KN)$

Update Step:

$O(N)$

Total:

$O(TKN)$

Worst case is NP-hard, but in practice T is small.

❓ How to Choose K

Problem

Objective $J$ always decreases as K increases.

🦴 Elbow Method

Plot $J$ vs K
Choose K where decrease slows (elbow point)

⚠️ Sometimes elbow is unclear.

🧠 Summary

Hierarchical clustering builds a dendrogram
K-means optimizes a clear objective
Both depend on distance definitions
No single “correct” clustering

✨ Clustering is exploratory, not definitive

Machince Learning, Machince Learning - Foundation

This post is licensed under CC BY 4.0 by the author.