PY-L5-28 · Project — Fashion Item Classifier

Project Goals

3 min

Load Fashion-MNIST and map class indices to names.
Build a CNN with conv + pool + dropout.
Evaluate with a confusion matrix; find the hard pairs.
Compare CNN vs dense net on a harder problem.

Warm-Up · The Ten Classes

5 min

from tensorflow.keras.datasets import fashion_mnist
(Xtr, ytr), (Xte, yte) = fashion_mnist.load_data()

CLASSES = ["T-shirt", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
print(Xtr.shape, "labels 0-9 →", CLASSES)
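The labels in ytr are plain integers from 0 to 9, so mapping one to its name is just list indexing. A minimal sketch, using a small made-up label array (sample_labels is hypothetical, standing in for the first few entries of ytr) so that it runs without downloading the dataset:

```python
import numpy as np

CLASSES = ["T-shirt", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Hypothetical labels standing in for a slice of ytr (uint8, like the real labels).
sample_labels = np.array([9, 0, 0, 3, 6], dtype=np.uint8)

# Index the class list with each integer label to get a readable name.
sample_names = [CLASSES[int(i)] for i in sample_labels]
print(sample_names)
```

With the real data, the same expression works on any slice, for example ytr[:5].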

Today's big idea

Fashion-MNIST has the same format as the digits dataset (28×28 grayscale, 10 classes) but is much harder: "Shirt", "T-shirt", "Pullover", and "Coat" are visually close. Expect roughly 90% test accuracy (vs about 98% on digits). The gap shows that accuracy is always relative to problem difficulty.

Plan · CNN for Clothes

14 min

Preprocess (add a channel dim for Conv2D)

Xtr = (Xtr.astype("float32") / 255.0)[..., None]   # (60000, 28, 28, 1)
Xte = (Xte.astype("float32") / 255.0)[..., None]
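Conv2D expects input shaped (batch, height, width, channels), while the raw images arrive as (batch, 28, 28). The sketch below applies the same scaling and `[..., None]` indexing to a small random array (fake_images is a stand-in, not real data) to show what happens to the dtype, value range, and shape:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a few raw images: uint8 pixels in 0..255, no channel axis.
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Same two steps as the preprocessing above.
scaled = fake_images.astype("float32") / 255.0   # pixel values now in [0, 1]
with_channel = scaled[..., None]                 # append a trailing axis of size 1

print(fake_images.shape, fake_images.dtype)
print(with_channel.shape, with_channel.dtype)
```

`[..., None]` is shorthand for np.expand_dims(x, axis=-1); either form gives the single gray channel that Conv2D needs.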

A solid CNN

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(10, activation="softmax"),
])
model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
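To see where the model's size comes from, the shapes and parameter counts can be traced by hand. The sketch below is plain arithmetic, not Keras code; the helper names conv_params and dense_params are made up for illustration, and it assumes the Keras defaults for MaxPooling2D (2×2 windows, stride 2). The total should agree with what model.summary() reports.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights (kernel*kernel*in_ch per filter) plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

h, w, c = 28, 28, 1          # input image: 28x28, one gray channel
params = {}

params["conv1"] = conv_params(3, c, 32)
c = 32                       # padding="same" keeps 28x28; channels become 32
h, w = h // 2, w // 2        # 2x2 max pooling halves each side: 14x14

params["conv2"] = conv_params(3, c, 64)
c = 64                       # still 14x14; channels become 64
h, w = h // 2, w // 2        # pooling again: 7x7

flat = h * w * c             # Flatten: 7 * 7 * 64 values per image
params["dense1"] = dense_params(flat, 128)
params["dense2"] = dense_params(128, 10)   # Dropout adds no parameters

total = sum(params.values())
for name, n in params.items():
    print(f"{name:7s} {n:>8,d}")
print(f"{'total':7s} {total:>8,d}")
```

Most of the parameters sit in the first Dense layer, not in the conv layers, which is why the dropout is placed right after it.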

What the conv layers learn

Conv2D(32, 3)   32 filters, each 3×3, slide across the image
                early layers learn edges; later layers learn textures/parts
MaxPooling2D    shrink the map, keep the strongest signals (translation tolerance)
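The two operations above can be sketched in NumPy on a toy image. This is an illustration only: the vertical-edge kernel is hand-written here, whereas a trained network learns its filter values, and the helper names conv2d_same and max_pool_2x2 are made up for this example.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide a small kernel over a 2-D image with zero padding (output keeps the input size)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image, dtype=np.float32)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            # Multiply the patch under the kernel element-wise and sum.
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy 6x6 image: dark left half, bright right half, so one vertical edge.
image = np.zeros((6, 6), dtype=np.float32)
image[:, 3:] = 1.0

# Hand-written filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

fmap = np.maximum(conv2d_same(image, kernel), 0.0)   # ReLU after the convolution
pooled = max_pool_2x2(fmap)

print(fmap)
print(pooled)
```

The feature map is non-zero only along the edge, and the pooled map is half the size in each direction while still carrying that strong response.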

Build · fashion.py

12 min

# fashion.py — CNN clothing classifier
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

CLASSES = ["T-shirt", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

(Xtr, ytr), (Xte, yte) = fashion_mnist.load_data()
Xtr = (Xtr.astype("float32") / 255.0)[..., None]
Xte = (Xte.astype("float32") / 255.0)[..., None]

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(10, activation="softmax"),
])
model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])

es = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
model.fit(Xtr, ytr, validation_split=0.1, epochs=20,
          batch_size=128, callbacks=[es], verbose=2)

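acc = model.evaluate(Xte, yte, verbose=0)[1]   # evaluate returns [loss, accuracy]
print(f"\ntest accuracy: {acc:.1%}")

# argmax over the 10 softmax outputs gives the predicted class index per image
preds = model.predict(Xte, verbose=0).argmax(axis=1)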
print(classification_report(yte, preds, target_names=CLASSES))

ConfusionMatrixDisplay(confusion_matrix(yte, preds),
                       display_labels=CLASSES).plot(xticks_rotation=45)
plt.tight_layout(); plt.savefig("fashion_cm.png", dpi=130)

Sample output

test accuracy: 91.3%

              precision  recall  f1
T-shirt          0.84    0.86  0.85
...
Shirt            0.74    0.66  0.70   ← hardest class
Trouser          0.99    0.98  0.99   ← easiest

Read the report

The per-class report is the most useful output: "Trouser" is nearly perfect (distinctive shape), while "Shirt" is the worst (it overlaps with T-shirt, Pullover, and Coat). The confusion matrix shows exactly which classes get mistaken for each other. This is far more useful than the single 91% figure because it tells you where the model struggles.
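Precision and recall in the per-class report come straight from the confusion matrix. The sketch below uses a made-up 3-class matrix (the numbers are invented for illustration, not results from this model) and follows the scikit-learn convention that rows are true classes and columns are predicted classes:

```python
import numpy as np

# Made-up confusion matrix: rows = true class, columns = predicted class.
names = ["Trouser", "T-shirt", "Shirt"]
cm = np.array([[98,  1,  1],
               [ 2, 80, 18],
               [ 1, 29, 70]])

correct = np.diag(cm)                   # diagonal = correctly classified items
recall = correct / cm.sum(axis=1)       # share of each true class that was found
precision = correct / cm.sum(axis=0)    # share of each predicted class that was right
accuracy = correct.sum() / cm.sum()     # the single headline number

for name, p, r in zip(names, precision, recall):
    print(f"{name:8s} precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```

In this toy matrix the overall accuracy hides the fact that nearly all the errors are Shirt and T-shirt being swapped, which is the same kind of pattern to look for in the real report.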

Extensions

13 min

01 🟢 Dense baseline

Train the dense net from Lesson 27 on Fashion-MNIST. How much does the CNN beat it by?

02 🟡 Visualise filters

Plot the 32 first-layer Conv2D filters as small images. Can you see edge/texture detectors?

03 🔴 Hardest confusions

From the confusion matrix, list the top 3 off-diagonal pairs. Display real misclassified examples for each. Are they genuinely ambiguous?

Stretch · Augment to 93%+

8 min

Add Keras data augmentation (random flips, small rotations/zooms) and a third conv block. Train longer with early stopping. Aim for 93%+. Report what helped most.

Show the augmentation layer

augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
])
# put 'augment' as the first layer after Input

Recap

3 min

Fashion-MNIST is the same format as digits but harder, so accuracy lands ~90% — accuracy is always relative to difficulty. A CNN (conv + pool) beats a dense net on images by respecting 2-D structure. The per-class report + confusion matrix reveal which classes overlap. Next: skip training entirely and use pre-trained models.

Homework

4 min

Train the fashion CNN, report test accuracy + per-class report + confusion matrix. Identify the hardest class and show 4 of its mistakes. One paragraph on why those classes are hard for both AI and humans.
