Learning Goals
3 min
- Define overfitting vs underfitting via the bias-variance idea.
- Spot overfitting from the train-vs-validation gap.
- Name the cures: more data, simpler model, regularisation, early stopping.
- Demonstrate overfitting and fix it.
Warm-Up · The Student Who Memorised
5 min
One student memorises every past exam answer word-for-word: perfect on practice, lost on a new question. Another learns the underlying concepts: slightly worse on practice, great on anything new. The second "generalises". ML models can do either.
- underfit: too simple; bad on train AND test
- good fit: learns the real pattern; good on both
- overfit: memorised noise; great on train, bad on test
The gap between training score and validation/test score is the overfitting signal. Train high + val low = overfit. Both low = underfit. Close and decent = good. You ALREADY have the diagnostic — the train-vs-val curve.
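The three-way rule above can be written down as a tiny helper. This is a minimal sketch: the function name `diagnose` and the thresholds `min_score=0.8` and `max_gap=0.05` are illustrative assumptions, not values from this lesson, and sensible cut-offs depend on the task.

```python
def diagnose(train_score, val_score, min_score=0.8, max_gap=0.05):
    """Label a model from its train and validation scores.

    min_score and max_gap are hypothetical thresholds chosen for illustration.
    """
    # Both scores low: the model has not even learned the training data.
    if train_score < min_score and val_score < min_score:
        return "underfit"
    # Train clearly ahead of validation: the model is memorising.
    if train_score - val_score > max_gap:
        return "overfit"
    # Scores are close and at least one is decent.
    return "good fit"


print(diagnose(0.62, 0.60))  # both low
print(diagnose(1.00, 0.85))  # large gap
print(diagnose(0.95, 0.93))  # close and decent
```

The order of the checks matters: underfitting is tested first, so two low but slightly different scores are not mislabelled as overfitting.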
New Concept · Spot It, Fix It
14 min
The tell-tale curve
accuracy
  │  train ────────────────      ← keeps rising toward 100%
  │  val   ──────╮
  │              ╰────╮          ← peaks, then FALLS
  │                              ← overfitting starts here
  └───────────────────── epochs
When validation accuracy peaks and then declines while training keeps climbing, the model has started memorising. The peak is where you'd stop.
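To locate that peak after a run, you can scan the recorded validation accuracies. The sketch below assumes a dict shaped like the `history` attribute Keras returns from `fit`; the numbers are made up to mimic the curve sketched above, and `find_val_peak` is a hypothetical helper name.

```python
def find_val_peak(val_acc):
    """Return (epoch_index, value) of the highest validation accuracy."""
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return best_epoch, val_acc[best_epoch]


# Made-up curve: train keeps rising, val peaks and then falls.
history = {
    "accuracy":     [0.70, 0.82, 0.90, 0.95, 0.98, 0.99, 1.00],
    "val_accuracy": [0.68, 0.80, 0.88, 0.91, 0.90, 0.87, 0.85],
}

peak_epoch, peak_val = find_val_peak(history["val_accuracy"])
gap_at_end = history["accuracy"][-1] - history["val_accuracy"][-1]
print(f"val peaks at epoch {peak_epoch} ({peak_val:.2f}); final gap {gap_at_end:.2f}")
```

Epoch indices here are zero-based, matching the list positions in the history.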
The cures (in order of preference)
1. MORE DATA: the best fix; more examples are harder to memorise
2. SIMPLER MODEL: fewer layers/neurons, smaller max_depth
3. REGULARISATION: penalise complexity (L2, dropout; next lesson)
4. EARLY STOPPING: stop at the val peak
5. DATA AUGMENTATION (images): flip/rotate to fake more data
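As one concrete illustration of the last cure, here is a minimal NumPy sketch of flip augmentation. It assumes images are stored as an array of shape (N, H, W) or (N, H, W, C) and that mirroring does not change the label (true for many photos, false for digits or text); `augment_with_flips` is a hypothetical helper, not a library function.

```python
import numpy as np


def augment_with_flips(images, labels):
    """Return the originals plus a left-right mirrored copy of each image."""
    flipped = images[:, :, ::-1]  # reverse the width axis (axis 2)
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])


# Two tiny 2x3 "images" so the effect is easy to inspect.
images = np.arange(2 * 2 * 3).reshape(2, 2, 3)
labels = np.array([0, 1])

aug_images, aug_labels = augment_with_flips(images, labels)
print(aug_images.shape, aug_labels.shape)
print(images[0, 0], "->", aug_images[2, 0])  # first row of image 0, then mirrored
```

The dataset doubles in size, but the new samples are correlated with the originals, so this helps less than collecting genuinely new data.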
Underfitting — the opposite
Both train and val are low and flat.
Cure: a MORE powerful model (more capacity), more features, longer training, or a higher learning rate.
The sweet spot is a balance
Increase model capacity until validation stops improving; that's the edge of overfitting. Too little capacity = underfit; too much = overfit. The right amount depends on how much data you have.
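The same capacity trade-off shows up outside neural nets. The sketch below is an illustration under stated assumptions: a made-up quadratic signal with noise, polynomial degree standing in for capacity, and the degrees 1, 2 and 12 picked by hand. Degree 1 underfits (high error on both sets), degree 2 matches the signal, and degree 12 has enough freedom to chase noise in the 20 training points, which typically raises validation error again.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    # Hypothetical toy problem: quadratic signal plus Gaussian noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = x**2 + rng.normal(0.0, 0.1, size=n)
    return x, y


x_train, y_train = make_data(20)   # small training set
x_val, y_val = make_data(200)      # larger validation set


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in [1, 2, 12]:          # too little, about right, too much capacity
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))
    train_mse, val_mse = results[degree]
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, val MSE {val_mse:.4f}")
```

Training error can only go down as the degree grows, because each larger polynomial contains the smaller ones; only the validation column tells you when to stop adding capacity.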
Worked Example · Make a Net Overfit, Then Fix It
12 min
```python
# overfit_demo.py: too-big net on too-little data
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

X, y = load_breast_cancer(return_X_y=True)

# use only 60 samples to make overfitting easy to trigger
Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=60, random_state=0)

# fit the scaler on the training split only, so no held-out statistics leak in
scaler = StandardScaler().fit(Xtr)
Xtr, Xte = scaler.transform(Xtr), scaler.transform(Xte)

def big_net():
    return keras.Sequential([
        layers.Input(shape=(30,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])

m = big_net()
m.compile("adam", "binary_crossentropy", metrics=["accuracy"])
h = m.fit(Xtr, ytr, validation_data=(Xte, yte),
          epochs=200, batch_size=8, verbose=0)

plt.plot(h.history["accuracy"], label="train")
plt.plot(h.history["val_accuracy"], label="val")
plt.xlabel("epoch"); plt.ylabel("accuracy"); plt.legend()
plt.title("Overfitting: train ↑, val ↓")
plt.savefig("overfit.png", dpi=150)

# three fixed decimals, so the printout matches the sample output format
print(f"final train acc: {h.history['accuracy'][-1]:.3f}")
print(f"final val acc: {h.history['val_accuracy'][-1]:.3f}")
```
Sample output
final train acc: 1.000
final val acc: 0.912
Train hits 100% (memorised the 60 samples); val lags and wobbles. Now the fix — smaller net + early stopping:
```python
from tensorflow.keras.callbacks import EarlyStopping

small = keras.Sequential([
    layers.Input(shape=(30,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
small.compile("adam", "binary_crossentropy", metrics=["accuracy"])

es = EarlyStopping(monitor="val_loss", patience=15, restore_best_weights=True)
small.fit(Xtr, ytr, validation_data=(Xte, yte),
          epochs=200, batch_size=8, callbacks=[es], verbose=0)

print("fixed val acc:", round(small.evaluate(Xte, yte, verbose=0)[1], 3))
```
Read the diff
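The smaller net can't memorise as easily, and early stopping halts shortly after the validation peak; with restore_best_weights=True the model keeps the weights from the epoch with the lowest val_loss. The train-val gap shrinks. Note that Xte doubles as the validation set here, so the printed number is a validation score; an unbiased test score needs a third split that training never touches. Two of the cheapest cures, less capacity plus early stopping, are often enough to bring overfitting under control.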
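To make the role of `patience` concrete, here is a simplified pure-Python sketch of the stopping rule. It is an approximation written for illustration, not Keras's actual implementation: it ignores options such as `min_delta`, and `early_stop_epoch` is a hypothetical helper name.

```python
def early_stop_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; best_epoch is where the lowest loss was seen.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improve for three epochs, then get worse.
losses = [0.90, 0.70, 0.60, 0.65, 0.70, 0.80, 0.90]
stop_epoch, best_epoch = early_stop_epoch(losses, patience=2)
print(f"stopped at epoch {stop_epoch}, best weights from epoch {best_epoch}")
```

A larger `patience` tolerates noisy validation curves at the cost of a few wasted epochs; restoring the best weights is what makes those extra epochs harmless.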
Try It Yourself
13 min
1. Given three train/val curve descriptions, label each underfit / good / overfit.
2. Re-run the overfit demo with train_size=400 instead of 60. Does the train-val gap shrink?
3. Recreate the overfitting curve for a decision tree by plotting train vs CV accuracy as max_depth grows. Find the depth where they diverge.
Mini-Challenge · Capacity Sweep
8 min
Train nets with hidden sizes [4, 16, 64, 256] on a small dataset. Plot final train accuracy AND val accuracy vs hidden size. Identify where val stops improving (the sweet spot) and where overfitting begins.
Show the structure
```python
# reuses the imports and Xtr, Xte, ytr, yte from overfit_demo.py
sizes = [4, 16, 64, 256]
train_acc, val_acc = [], []

for s in sizes:
    m = keras.Sequential([
        layers.Input(shape=(30,)),
        layers.Dense(s, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    m.compile("adam", "binary_crossentropy", metrics=["accuracy"])
    h = m.fit(Xtr, ytr, validation_data=(Xte, yte),
              epochs=100, batch_size=8, verbose=0)
    # keep only the final-epoch scores for each hidden size
    train_acc.append(h.history["accuracy"][-1])
    val_acc.append(h.history["val_accuracy"][-1])

plt.plot(sizes, train_acc, "o-", label="train")
plt.plot(sizes, val_acc, "o-", label="val")
plt.xscale("log"); plt.legend(); plt.xlabel("hidden size")
```
Non-negotiables: the train line keeps rising; the val line peaks then plateaus/falls — the gap IS overfitting.
Recap
3 min
Overfitting = memorising training data; the tell is a train-vs-val gap. Underfitting = too simple, both scores low. Cures in order: more data, simpler model, regularisation, early stopping, augmentation. Always watch the validation curve. Next: the two regularisation tricks, dropout and weight penalties.
Vocabulary Card
- overfitting: High train score, low test score; the model memorised noise and didn't generalise.
- underfitting: Low scores everywhere; the model is too simple to capture the pattern.
- generalisation: Performing well on data the model never saw, which is the actual goal.
- early stopping: Halting training at the validation peak to avoid memorising.
Homework
4 min
Deliberately overfit a model on a small dataset, capture the curve, then apply two cures (e.g., less capacity + early stopping, or more data). Show before/after curves and the improved test score. One paragraph: which cure helped most and why.
Reuse overfit_demo.py. Usually "more data" helps most when available; otherwise "simpler model + early stopping" wins.