UC Berkeley · Data 100 SP26 · Final-prep companion
Step-through demos for the clustering material in Data 100 (Course Note 25). Each section is a small interactive canvas — click the buttons to see the algorithms run.
01Agglomerative Clustering (bottom-up)
Every point starts as its own cluster. Each step merges the two closest clusters (centroid linkage).
Lines show the merge history — read like a compressed dendrogram in space.
Key idea: Inertia — the within-cluster sum of squares — strictly decreases with every merge in K-Means, but for agglomerative clustering the cluster count $k$ goes down by one each step, from $n$ all the way to $1$.
02K-Means — assign & update, one step at a time
Each iteration alternates: assign every point to its nearest centroid, then
update each centroid to the mean of its assigned points. Repeat until centroids stop moving.
k = 3phase: inititer: 0inertia: —
Watch for: centroids (★) move toward the centers of their colored clusters each "update" step, and points may switch colors on the next "assign" step. Convergence happens when no point changes color.
03Initialization matters — random vs K-Means++
K-Means is sensitive to where the centroids start. K-Means++ spreads initial centroids out using a weighted probability — usually finding a better local optimum than uniform random init.
Click "Run new trial" a few times: random init occasionally lands in a noticeably worse solution.
k = 4
Random initialization
inertia = — · iters = —
K-Means++ initialization
inertia = — · iters = —
Standard practice: run K-Means many times with different random inits and pick the run with lowest inertia (sklearn's n_init argument), or just use K-Means++ which usually needs fewer restarts.
04Agglomerative — single, complete, average, centroid linkage
The choice of linkage criterion changes which clusters get merged first. Same data, four different linkages, all stopped at 3 clusters.
data shape:
Single linkage (min)
Complete linkage (max)
Average linkage
Centroid linkage
Single linkage can chain elongated clusters (good for moons/rings, bad for noisy blobs).
Complete linkage tends to produce compact, spherical clusters.
Average and centroid sit between the two extremes.
05Choosing $k$ — the elbow method
Run K-Means for $k = 1, 2, \dots, 10$ and plot the inertia of each. Inertia always decreases as $k$ grows
(at $k = n$ it hits $0$), so we look for the "elbow" — the largest drop, where adding more clusters stops paying off.
Data & clustering at chosen $k$
currently showing $k$ = —
Inertia vs $k$ (click a point)
elbow heuristic suggests $k^* = $ —
Caveat: the elbow is heuristic — different datasets have clearer or fuzzier elbows. Methods like the silhouette score or gap statistic are more principled (and beyond Data 100's scope).
06Where K-Means breaks down
K-Means assumes clusters are roughly spherical and equally sized in feature space.
It does poorly on non-convex shapes, very different cluster densities, or when k is wrong.
Switch the data shape to see when it succeeds and when it fails.
data shape: k = 2
Takeaway: K-Means optimizes mean squared distance to centroid. Any structure that doesn't fit "compact ball around a mean" — moons, rings, long stripes — gets sliced strangely. For those, prefer DBSCAN, spectral clustering, or single-linkage agglomerative.