PY-L5-23 · Build a Tiny Neural Net with Keras

Learning Goals

3 min

Install TensorFlow/Keras; build a Sequential model.
Add Dense layers with the right activations.
Compile with a loss and an optimiser, then fit.
Evaluate and read the training history.

Warm-Up · Install & Imports

5 min

pip install tensorflow

# (If install is heavy on your machine, run this lesson in
#  Google Colab — TensorFlow is pre-installed there, free GPU too.)

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
print(tf.__version__)

Today's big idea

Keras turns the architecture you drew in Lesson 21 into code: each Dense(n, activation) is a layer of n neurons. compile picks how to learn; fit runs the training loop. You stopped doing the matrix maths by hand the moment you imported Keras.
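To make "the matrix maths" concrete, here is a minimal NumPy sketch of what a single Dense layer computes: a matrix multiply, a bias add, then the activation. The data, the random weights, and the helper names dense_forward and relu are illustrative assumptions, not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """ReLU activation: negative values become 0."""
    return np.maximum(z, 0.0)

def dense_forward(x, W, b, activation):
    """One fully-connected layer: activation(x @ W + b)."""
    return activation(x @ W + b)

x = rng.normal(size=(5, 4))    # 5 samples, 4 features (made-up data)
W = rng.normal(size=(4, 16))   # one weight per (input, neuron) pair
b = np.zeros(16)               # one bias per neuron

h = dense_forward(x, W, b, relu)
print(h.shape)                 # (5, 16): 16 activations per sample
```

Stacking layers just means feeding h into the next dense_forward call. Keras does this for you and also learns W and b during fit.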

New Concept · Sequential, Dense, compile, fit

14 min

Build the architecture

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(4,)),         # 4 input features (iris)
    layers.Dense(16, activation="relu"),
    layers.Dense(8,  activation="relu"),
    layers.Dense(3,  activation="softmax"),  # 3 classes
])
model.summary()

Sequential = a straight stack of layers. Dense(n) = a fully-connected layer of n neurons. The last layer's size and activation must match the task (Lesson 22): here, 3 softmax outputs for 3 classes.
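The parameter counts that model.summary() prints can be checked by hand. The sketch below assumes the standard Dense layout of one weight per input-neuron pair plus one bias per neuron; dense_params is a hypothetical helper name.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [4, 16, 8, 3]   # input width, then the three Dense layers above
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(counts)
print(counts, total_params)   # [80, 136, 27] 243
```

If the total in model.summary() differs from your own count, the usual cause is a wrong input shape.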

Compile — how to learn

model.compile(
    optimizer="adam",                          # how to update weights
    loss="sparse_categorical_crossentropy",    # what to minimise (integer labels)
    metrics=["accuracy"],
)

Loss cheat sheet:
  binary:              binary_crossentropy              (sigmoid output)
  multiclass (ints):   sparse_categorical_crossentropy  (softmax output)
  multiclass (1-hot):  categorical_crossentropy         (softmax output)
  regression:          mse / mae                        (linear output)
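The two multiclass losses compute the same number and differ only in the label format they expect. A minimal NumPy sketch with made-up probabilities shows this. It is the textbook cross-entropy formula; Keras applies extra numerical safeguards on top.

```python
import numpy as np

# Made-up softmax outputs for 2 samples over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

# Integer labels: the "sparse" form
y_int = np.array([0, 2])
sparse_loss = -np.log(probs[np.arange(len(y_int)), y_int]).mean()

# The same labels one-hot encoded: the "categorical" form
y_onehot = np.eye(3)[y_int]
onehot_loss = -(y_onehot * np.log(probs)).sum(axis=1).mean()

print(np.isclose(sparse_loss, onehot_loss))   # True
```

Both forms take minus the log of the probability given to the true class, so pick whichever matches how your labels are stored.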

Fit — run the training loop

history = model.fit(
    X_train, y_train,
    validation_split=0.2,   # watch generalisation as it trains
    epochs=50,              # passes over the data
    batch_size=16,
    verbose=0,
)
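How many weight updates does that fit call perform? A rough sketch of the arithmetic, assuming a hypothetical training set of 120 rows (Keras may round the validation split slightly differently; updates_per_fit is an illustrative helper, not a Keras function):

```python
import math

def updates_per_fit(n_rows, validation_split, batch_size, epochs):
    """Return (training rows, batches per epoch, total weight updates)."""
    n_val = round(n_rows * validation_split)   # rows held back for validation
    n_train = n_rows - n_val                   # rows actually trained on
    steps = math.ceil(n_train / batch_size)    # a final partial batch still counts
    return n_train, steps, steps * epochs

n_train, steps, total = updates_per_fit(120, 0.2, 16, 50)
print(n_train, steps, total)   # 96 6 300
```

A smaller batch_size gives more, noisier updates per epoch; a larger one gives fewer, smoother updates.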

Evaluate & predict

loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f"test accuracy: {acc:.1%}")

probs = model.predict(X_test[:3])     # softmax probabilities
preds = probs.argmax(axis=1)          # pick the top class
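What predict returns can be reproduced by hand. This sketch applies softmax to made-up raw scores for three samples and then picks the top class, mirroring the two lines above; the logits are hypothetical.

```python
import numpy as np

def softmax(z):
    """Turn each row of raw scores into probabilities that sum to 1."""
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3],
                   [0.1, 0.2, 3.0]])       # made-up scores: 3 samples, 3 classes

probs = softmax(logits)
preds = probs.argmax(axis=1)               # index of the largest probability per row
print(probs.sum(axis=1))                   # each row sums to 1
print(preds)                               # [0 1 2]
```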

Worked Example · Iris in Keras

12 min

# iris_keras.py — a 3-layer net on iris
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2,
                                      stratify=y, random_state=0)
scaler = StandardScaler().fit(Xtr)           # fit on training rows only (no test leakage)
Xtr, Xte = scaler.transform(Xtr), scaler.transform(Xte)  # nets train better on scaled data

model = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(8,  activation="relu"),
    layers.Dense(3,  activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(Xtr, ytr, validation_split=0.2,
                    epochs=80, batch_size=8, verbose=0)

loss, acc = model.evaluate(Xte, yte, verbose=0)
print(f"test accuracy: {acc:.1%}")

# plot learning curves
import matplotlib.pyplot as plt
plt.plot(history.history["accuracy"], label="train")
plt.plot(history.history["val_accuracy"], label="val")
plt.xlabel("epoch"); plt.ylabel("accuracy"); plt.legend()
plt.savefig("learning_curve.png", dpi=150)

Sample output

test accuracy: 96.7%

Read the diff

About ten lines of Keras built and trained a real neural net. Your accuracy may differ slightly from the sample because the weights start from random values. Two habits to keep: scale the features (nets are sensitive to scale) and plot train vs val accuracy. If val stops rising while train keeps climbing, you're overfitting (Lesson 25). For iris, a random forest does just as well in less time; neural nets shine on images and text, which is where we're headed.
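The same check can be made in numbers as well as by eye. The sketch below works on a dictionary shaped like history.history; the values are invented to show an overfitting pattern, and summarise_history is a hypothetical helper.

```python
def summarise_history(hist):
    """Return (epoch with best val accuracy, final train-minus-val gap)."""
    train, val = hist["accuracy"], hist["val_accuracy"]
    best_epoch = max(range(len(val)), key=val.__getitem__)   # first epoch at the peak
    final_gap = train[-1] - val[-1]
    return best_epoch, final_gap

# Invented curves: train keeps climbing, val peaks and then slips
fake_history = {
    "accuracy":     [0.60, 0.80, 0.90, 0.95, 0.98, 0.99],
    "val_accuracy": [0.58, 0.78, 0.88, 0.90, 0.89, 0.87],
}

best_epoch, final_gap = summarise_history(fake_history)
print(best_epoch, round(final_gap, 2))   # 3 0.12
```

A best epoch well before the last one, plus a widening gap, is the numeric signature of the overfitting pattern described above.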

Try It Yourself

13 min

01 🟢 Build & summarise

Build a 2-hidden-layer net for a binary task (sigmoid output, binary_crossentropy). Print model.summary() and read the parameter count.

02 🟡 Architecture sweep

Try 1 hidden layer vs 3, and 8 neurons vs 64. Which generalises best on a held-out set?

03 🔴 Regression net

Build a net for the property data (Lesson 20): linear output (1 node), loss="mse". Compare its RMSE to the random forest.

Hint

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),                      # no activation = regression
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
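For the RMSE comparison, one common pitfall is shape: a Keras model with one output node returns predictions of shape (n, 1), while targets are usually shape (n,). Subtracting them directly broadcasts to an (n, n) grid and gives a wrong score. A small sketch with made-up prices (rmse is an illustrative helper):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error; flattens both inputs so (n, 1) and (n,) agree."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = np.array([200.0, 310.0, 150.0])            # made-up target prices
net_pred = np.array([[210.0], [300.0], [160.0]])    # shape (n, 1), like model.predict output

print(rmse(y_true, net_pred))   # 10.0
```

Use the same function on the random forest's predictions so both models are scored identically.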

Mini-Challenge · Early Stopping

8 min

Add an EarlyStopping callback that halts training when validation loss stops improving and restores the best weights. This avoids wasted epochs and limits overfitting.

Show one possible solution

from tensorflow.keras.callbacks import EarlyStopping

es = EarlyStopping(monitor="val_loss", patience=10,
                   restore_best_weights=True)

history = model.fit(Xtr, ytr, validation_split=0.2,
                    epochs=500, batch_size=8,
                    callbacks=[es], verbose=0)
print("epochs run:", len(history.history["loss"]))

Non-negotiables: monitor val_loss, a patience window, restore_best_weights. Now you can set epochs high and let the callback decide when to stop.
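To see how a patience window behaves, here is a simplified pure-Python simulation of the idea. It is a sketch of the logic only, not Keras's actual implementation, and the validation losses are invented.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (epoch where training stops, epoch with the best val loss)."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0                                   # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch       # patience exhausted: stop here
    return len(val_losses) - 1, best_epoch     # ran every epoch without stopping

val_losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.66, 0.70]   # invented values
stopped_epoch, best_epoch = simulate_early_stopping(val_losses, patience=3)
print(stopped_epoch, best_epoch)   # 5 2
```

Training stops at epoch 5, but the best weights came from epoch 2. That gap is why restore_best_weights matters.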

Recap

3 min

Keras: Sequential stacks Dense layers; compile sets optimiser + loss + metrics; fit trains; evaluate scores. Scale your features, pick the loss by task, and always watch train-vs-val curves. EarlyStopping saves time and curbs overfitting. Next: what the optimiser and loss are actually doing.

Vocabulary Card

Sequential: A linear stack of layers — the simplest Keras model.
Dense: A fully-connected layer; every input connects to every neuron.
epoch / batch: One full pass over the data / a small chunk processed at once.
loss function: The number training tries to minimise; chosen by task type.

Homework

4 min

Build, compile, fit and evaluate a Keras net on any tabular dataset. Use EarlyStopping, plot the learning curves, and report test accuracy/RMSE. Compare it honestly to a scikit-learn model on the same data — which won, and was the net worth the extra complexity?
