01Agglomerative Clustering (bottom-up)

Every point starts as its own cluster. Each step merges the two closest clusters (centroid linkage). Lines show the merge history — read like a compressed dendrogram in space.

Key idea: Inertia — the within-cluster sum of squares — strictly decreases with every merge in K-Means, but for agglomerative clustering the cluster count $k$ goes down by one each step, from $n$ all the way to $1$.

02K-Means — assign & update, one step at a time

Each iteration alternates: assign every point to its nearest centroid, then update each centroid to the mean of its assigned points. Repeat until centroids stop moving.

k = 3 phase: init iter: 0 inertia:
Watch for: centroids (★) move toward the centers of their colored clusters each "update" step, and points may switch colors on the next "assign" step. Convergence happens when no point changes color.

03Initialization matters — random vs K-Means++

K-Means is sensitive to where the centroids start. K-Means++ spreads initial centroids out using a weighted probability — usually finding a better local optimum than uniform random init. Click "Run new trial" a few times: random init occasionally lands in a noticeably worse solution.

k = 4

Random initialization

inertia = · iters =

K-Means++ initialization

inertia = · iters =
Standard practice: run K-Means many times with different random inits and pick the run with lowest inertia (sklearn's n_init argument), or just use K-Means++ which usually needs fewer restarts.

04Agglomerative — single, complete, average, centroid linkage

The choice of linkage criterion changes which clusters get merged first. Same data, four different linkages, all stopped at 3 clusters.

data shape:

Single linkage (min)

Complete linkage (max)

Average linkage

Centroid linkage

Single linkage can chain elongated clusters (good for moons/rings, bad for noisy blobs). Complete linkage tends to produce compact, spherical clusters. Average and centroid sit between the two extremes.

05Choosing $k$ — the elbow method

Run K-Means for $k = 1, 2, \dots, 10$ and plot the inertia of each. Inertia always decreases as $k$ grows (at $k = n$ it hits $0$), so we look for the "elbow" — the largest drop, where adding more clusters stops paying off.

Data & clustering at chosen $k$

currently showing $k$ =

Inertia vs $k$ (click a point)

elbow heuristic suggests $k^* = $
Caveat: the elbow is heuristic — different datasets have clearer or fuzzier elbows. Methods like the silhouette score or gap statistic are more principled (and beyond Data 100's scope).

06Where K-Means breaks down

K-Means assumes clusters are roughly spherical and equally sized in feature space. It does poorly on non-convex shapes, very different cluster densities, or when k is wrong. Switch the data shape to see when it succeeds and when it fails.

data shape: k = 2
Takeaway: K-Means optimizes mean squared distance to centroid. Any structure that doesn't fit "compact ball around a mean" — moons, rings, long stripes — gets sliced strangely. For those, prefer DBSCAN, spectral clustering, or single-linkage agglomerative.