Project Goals
3 min
- Load Fashion-MNIST and map class indices to names.
- Build a CNN with conv + pool + dropout.
- Evaluate with a confusion matrix; find the hard pairs.
- Compare CNN vs dense net on a harder problem.
Warm-Up · The Ten Classes
5 min
    from tensorflow.keras.datasets import fashion_mnist

    (Xtr, ytr), (Xte, yte) = fashion_mnist.load_data()
    CLASSES = ["T-shirt", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
    print(Xtr.shape, "labels 0-9 →", CLASSES)
Fashion-MNIST looks like digits (28×28 gray) but is much harder — "Shirt" vs "T-shirt" vs "Pullover" vs "Coat" are visually close. Expect ~90% (vs 98% on digits). This gap teaches that accuracy is relative to problem difficulty.
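Since the labels are plain integers, it can help to see the index-to-name mapping in action before training anything. The sketch below uses a small made-up label array in place of ytr; the helper names label_names and class_counts are illustrative, not part of the lesson's fashion.py.

```python
import numpy as np

CLASSES = ["T-shirt", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def label_names(labels):
    """Turn integer labels (0-9) into their class names."""
    return [CLASSES[int(i)] for i in labels]

def class_counts(labels):
    """Count how many samples fall into each class, keyed by class name."""
    counts = np.bincount(np.asarray(labels), minlength=len(CLASSES))
    return dict(zip(CLASSES, counts.tolist()))

# Hypothetical labels standing in for a slice of ytr
demo = np.array([9, 0, 0, 3, 6], dtype=np.uint8)
print(label_names(demo))
print(class_counts(demo))
```

Running class_counts on the real ytr is a quick way to check whether the classes are balanced before reading too much into per-class scores.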
Plan · CNN for Clothes
14 min
Preprocess (add a channel dim for Conv2D)
    Xtr = (Xtr.astype("float32") / 255.0)[..., None]  # (60000, 28, 28, 1)
    Xte = (Xte.astype("float32") / 255.0)[..., None]
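To check what the scaling and the [..., None] indexing do without downloading the dataset, the same expression can be run on a small random array. This is a minimal sketch; the preprocess helper and the fake array are stand-ins introduced only for illustration.

```python
import numpy as np

def preprocess(images):
    """Scale uint8 pixels to [0, 1] and append a trailing channel axis."""
    return (images.astype("float32") / 255.0)[..., None]

# Hypothetical stand-in for Xtr: 4 random 28x28 grayscale images
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

out = preprocess(fake)
print(fake.shape, "->", out.shape)   # the channel axis is added at the end
print(out.dtype, float(out.min()), float(out.max()))
```

Conv2D expects a channel axis even for grayscale input, which is why the trailing dimension of size 1 is needed.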
A solid CNN
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
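As a sanity check on this architecture, the feature-map sizes and trainable parameter counts can be worked out by hand. The sketch below assumes the Keras defaults the model relies on (padding="same" keeps height and width, MaxPooling2D halves them with its default 2×2 pool) and uses helper names that are purely illustrative; model.summary() gives the authoritative numbers.

```python
def conv2d_params(in_channels, filters, kernel):
    """Weights (kernel x kernel x in_channels per filter) plus one bias per filter."""
    return filters * (kernel * kernel * in_channels + 1)

def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output."""
    return n_out * (n_in + 1)

h = w = 28          # input image size
channels = 1        # grayscale
total = 0

total += conv2d_params(channels, 32, 3)   # Conv2D(32, 3), same padding keeps 28x28
channels = 32
h, w = h // 2, w // 2                     # MaxPooling2D: 28x28 -> 14x14

total += conv2d_params(channels, 64, 3)   # Conv2D(64, 3), same padding keeps 14x14
channels = 64
h, w = h // 2, w // 2                     # MaxPooling2D: 14x14 -> 7x7

flat = h * w * channels                   # Flatten: 7 * 7 * 64
total += dense_params(flat, 128)          # Dense(128)
total += dense_params(128, 10)            # Dense(10); Dropout adds no parameters

print("flattened features:", flat)        # 3136
print("trainable parameters:", total)     # 421642
```

Most of the parameters sit in the first Dense layer rather than in the conv layers, which is typical for small CNNs that flatten before classifying.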
What the conv layers learn
Conv2D(32, 3): 32 filters, each 3×3, slide across the image. Early layers learn edges; later layers learn textures/parts.
MaxPooling2D: shrinks the map, keeps the strongest signals (translation tolerance).
Build · fashion.py
12 min
    # fashion.py: CNN clothing classifier
    import numpy as np
    import matplotlib.pyplot as plt
    from tensorflow import keras
    from tensorflow.keras import layers
    from tensorflow.keras.datasets import fashion_mnist
    from tensorflow.keras.callbacks import EarlyStopping
    from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

    CLASSES = ["T-shirt", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

    (Xtr, ytr), (Xte, yte) = fashion_mnist.load_data()
    Xtr = (Xtr.astype("float32") / 255.0)[..., None]
    Xte = (Xte.astype("float32") / 255.0)[..., None]

    model = keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])

    es = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
    model.fit(Xtr, ytr, validation_split=0.1, epochs=20, batch_size=128,
              callbacks=[es], verbose=2)

    acc = model.evaluate(Xte, yte, verbose=0)[1]
    print(f"\ntest accuracy: {acc:.2%}")

    preds = model.predict(Xte).argmax(axis=1)
    print(classification_report(yte, preds, target_names=CLASSES))

    ConfusionMatrixDisplay(confusion_matrix(yte, preds),
                           display_labels=CLASSES).plot(xticks_rotation=45)
    plt.tight_layout()
    plt.savefig("fashion_cm.png", dpi=130)
Sample output
    test accuracy: 91.3%
               precision  recall  f1
    T-shirt    0.84       0.86    0.85
    ...
    Shirt      0.74       0.66    0.70   ← hardest class
    Trouser    0.99       0.98    0.99   ← easiest
Read the diff
The per-class report is gold: "Trouser" is nearly perfect (distinctive shape), "Shirt" is the worst (it overlaps with T-shirt, Pullover, and Coat). The confusion matrix shows exactly which items bleed into each other. This is far more useful than the single 91% — it tells you where the model struggles.
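One way to pull the worst confusions out of the matrix programmatically is to zero the diagonal and sort what remains. The sketch below assumes rows are true labels and columns are predictions (the layout sklearn's confusion_matrix returns); the top_confusions helper and the toy 3-class matrix are made up for illustration and are not results from the model.

```python
import numpy as np

def top_confusions(cm, classes, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    cm = np.asarray(cm).copy()
    np.fill_diagonal(cm, 0)                        # ignore correct predictions
    order = np.argsort(cm, axis=None)[::-1][:k]    # largest counts first
    rows, cols = np.unravel_index(order, cm.shape)
    return [(classes[r], classes[c], int(cm[r, c])) for r, c in zip(rows, cols)]

# Toy matrix with invented counts, only to show the mechanics
toy_classes = ["T-shirt", "Trouser", "Shirt"]
toy_cm = [[50,  1,  9],
          [ 2, 58,  0],
          [14,  3, 43]]

for true_name, pred_name, n in top_confusions(toy_cm, toy_classes):
    print(f"{true_name} predicted as {pred_name}: {n}")
```

With the real model, passing confusion_matrix(yte, preds) and CLASSES should list the pairs that bleed into each other most often.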
Extensions
13 min
- Train the dense net from Lesson 27 on Fashion-MNIST. How much does the CNN beat it by?
- Plot the 32 first-layer Conv2D filters as small images. Can you see edge/texture detectors?
- From the confusion matrix, list the top 3 off-diagonal pairs. Display real misclassified examples for each. Are they genuinely ambiguous?
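For the misclassified-examples extension, a possible starting point is a helper that finds the test indices where a given true class was predicted as a given wrong class. This is a sketch with made-up label arrays; the name mistake_indices is hypothetical, and with the real model the inputs would be yte and preds from fashion.py.

```python
import numpy as np

def mistake_indices(y_true, y_pred, true_cls, pred_cls, limit=4):
    """Indices of samples whose true class is true_cls but were predicted as pred_cls."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    hits = np.flatnonzero((y_true == true_cls) & (y_pred == pred_cls))
    return hits[:limit]

# Invented labels: class 6 is "Shirt", class 0 is "T-shirt"
y_true_demo = [6, 6, 0, 6, 2, 6]
y_pred_demo = [0, 6, 0, 0, 2, 0]

idx = mistake_indices(y_true_demo, y_pred_demo, true_cls=6, pred_cls=0)
print(idx)   # positions of shirts that were called T-shirts
```

Each returned index i could then be shown with something like plt.imshow(Xte[i, ..., 0], cmap="gray") to judge whether the item is genuinely ambiguous.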
Stretch · Augment to 93%+
8 min
Add Keras data augmentation (random flips, small rotations/zooms) and a third conv block. Train longer with early stopping. Aim for 93%+. Report what helped most.
Show the augmentation layer
    augment = keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.05),
        layers.RandomZoom(0.1),
    ])
    # put 'augment' as the first layer after Input
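To build intuition for what a horizontal flip does to a batch shaped (N, 28, 28, 1), here is a NumPy stand-in. It is only an illustration of the idea, not the Keras implementation; the random_hflip function and its p parameter are assumptions made for this sketch.

```python
import numpy as np

def random_hflip(batch, rng, p=0.5):
    """Mirror each image left-to-right with probability p.

    batch has shape (N, height, width, channels); a new array is returned.
    """
    flip = rng.random(len(batch)) < p        # one coin toss per image
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1, :]     # reverse the width axis
    return out

rng = np.random.default_rng(0)
# Hypothetical batch of 8 images standing in for a slice of Xtr
images = rng.random((8, 28, 28, 1)).astype("float32")

augmented = random_hflip(images, rng)
print(augmented.shape)   # same shape as the input batch
```

A mirrored sneaker or coat is still the same class, which is why flips are a reasonable augmentation here; the same would not hold for digits such as 2 or 5.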
Recap
3 min
Fashion-MNIST is the same format as digits but harder, so accuracy lands ~90%; accuracy is always relative to difficulty. A CNN (conv + pool) beats a dense net on images by respecting 2-D structure. The per-class report + confusion matrix reveal which classes overlap. Next: skip training entirely and use pre-trained models.
Homework
4 min
Train the fashion CNN and report test accuracy, the per-class report, and the confusion matrix. Identify the hardest class and show 4 of its mistakes. Write one paragraph on why those classes are hard for both AI and humans.
The deliverable is fashion.py + figures. "Shirt" is reliably the hardest — it overlaps T-shirt/Pullover/Coat, which even people confuse from a 28×28 thumbnail.