Learning Goals · 3 min
- Describe a neuron: inputs × weights + bias → activation.
- Understand layers: input, hidden, output.
- See why a single neuron is just logistic regression — and why depth adds power.
- Code one neuron's forward pass in NumPy.
Warm-Up · What a Neuron Does · 5 min
```
inputs    x1   x2   x3
weights   w1   w2   w3   (learned)
bias      b              (learned)

z      = x1·w1 + x2·w2 + x3·w3 + b   ← weighted sum
output = activation(z)               ← non-linearity (e.g. sigmoid, ReLU)
```
That's the whole neuron. The weights say "how much does each input matter"; the bias shifts the threshold; the activation adds non-linearity. Learning = adjusting the weights and bias to reduce error.
One neuron with a sigmoid IS logistic regression. The magic of deep learning is stacking: hidden layers let later neurons combine earlier features into more abstract ones — edges → shapes → objects.
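Here is a minimal sketch of that equivalence. The weights are hand-picked for illustration and are not learned. A sigmoid neuron computes p = sigmoid(w·x + b), which is exactly the logistic-regression model. Since sigmoid(z) > 0.5 happens exactly when z > 0, the neuron's yes/no decision is a straight line (a hyperplane in more dimensions). The check below runs on 1,000 random 2-D points.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 2))   # 1000 random 2-D points
w = np.array([1.5, -2.0])                # illustrative weights (not learned)
b = 0.5                                  # illustrative bias

p = sigmoid(X @ w + b)                   # the neuron's "probability"
pred = p > 0.5                           # class decision at the 0.5 threshold

# The same decision, read straight off the line w·x + b = 0
side_of_line = X @ w + b > 0

print("decisions agree:", np.array_equal(pred, side_of_line))
```

The one-neuron decision boundary is always a line. That is the limitation hidden layers remove.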
New Concept · Neurons & Layers · 14 min
A network is layers of neurons
```
INPUT layer    one node per feature (e.g. 4 for iris)
HIDDEN layer   neurons that combine inputs (you choose how many)
OUTPUT layer   one node per class (or 1 for regression / binary)

[x1]\
[x2]──[h1]\
[x3]──[h2]──[out]
[x4]──[h3]/
```
One neuron in NumPy
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.9, 0.1])    # inputs
w = np.array([0.4, -0.2, 0.7])   # weights (would be learned)
b = 0.1                          # bias

z = x @ w + b                    # weighted sum
out = sigmoid(z)
print(round(out, 3))             # one neuron's output
```
A whole layer = a matrix multiply
```python
# 3 inputs → a hidden layer of 4 neurons
W = np.random.randn(3, 4)      # weight matrix (inputs × neurons)
b = np.random.randn(4)         # one bias per neuron
hidden = sigmoid(x @ W + b)    # shape (4,): four neuron outputs at once
```
Layers are just matrix maths — which is why NumPy (Lesson 4) was the prerequisite, and why GPUs (great at matrix multiply) made deep learning fast.
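You can check that "a whole layer = a matrix multiply" is literally true. The sketch below computes the same 3 → 4 layer twice: once neuron by neuron with the single-neuron formula, and once with a single `x @ W + b`. The seed and shapes are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(42)
x = np.array([0.5, 0.9, 0.1])      # one sample, 3 features
W = rng.standard_normal((3, 4))    # column j holds neuron j's weights
b = rng.standard_normal(4)         # one bias per neuron

# Neuron by neuron: each column of W is one neuron's weight vector
one_at_a_time = np.array([sigmoid(x @ W[:, j] + b[j]) for j in range(4)])

# Whole layer in one matrix multiply
all_at_once = sigmoid(x @ W + b)

print(np.allclose(one_at_a_time, all_at_once))   # True
```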
Why non-linearity matters
Without activations between layers, stacking layers collapses to one big linear function. That is no more powerful than a single layer (and with a sigmoid on the output, it is just logistic regression again). The non-linear activation (sigmoid, ReLU) is what lets the network bend, curve, and carve complex shapes.
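The collapse is just algebra: (x·W1 + b1)·W2 + b2 = x·(W1·W2) + (b1·W2 + b2). That is a single layer with weights W1·W2 and bias b1·W2 + b2. Here is a quick numerical check; the shapes and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4)                                  # one sample, 4 features
W1, b1 = rng.standard_normal((4, 5)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((5, 3)), rng.standard_normal(3)

# Two layers with NO activation in between
two_layers = (x @ W1 + b1) @ W2 + b2

# ...give the same result as one layer with merged weights and bias
W = W1 @ W2            # shape (4, 3)
b = b1 @ W2 + b2       # shape (3,)
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))   # True
```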
Depth vs width
- wider layer = more neurons per layer → captures more patterns at that level
- deeper net = more layers → builds more abstract features

Both add capacity, and both can overfit (Lesson 25).
Worked Example · A 2-Layer Forward Pass · 12 min
```python
# tiny_net.py: forward pass of a 2-layer net, by hand
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

np.random.seed(0)
x = np.array([0.6, 0.1, 0.9, 0.4])    # 4 features (one sample)

# layer 1: 4 inputs → 5 hidden neurons (ReLU)
W1 = np.random.randn(4, 5) * 0.5
b1 = np.zeros(5)
h = relu(x @ W1 + b1)

# layer 2: 5 hidden → 1 output (sigmoid = probability)
W2 = np.random.randn(5, 1) * 0.5
b2 = np.zeros(1)
out = sigmoid(h @ W2 + b2)

print("hidden activations:", h.round(2))
print("output (probability):", out.round(3))
```
Sample output
```
hidden activations: [0.61 1.12 0.59 0.78 0.61]
output (probability): [0.56]
```
Read the output
Data flows forward: features → hidden (ReLU zeros out negatives) → output (sigmoid → probability). The weights here are random, so the answer is meaningless — training (Lesson 24) is the process of nudging W1, b1, W2, b2 until the output matches reality. You just saw what a network computes; next we let Keras do the bookkeeping.
Try It Yourself · 13 min
Code a single sigmoid neuron with 2 inputs. Try inputs/weights that make it output close to 1, and close to 0.
Run a hidden layer on a BATCH of 10 samples at once (shape 10×4). Confirm the output is shape 10×5.
Hint
```python
X = np.random.randn(10, 4)
H = relu(X @ W1 + b1)    # broadcasting handles the batch
print(H.shape)           # (10, 5)
```
For a net with layers 4 → 8 → 8 → 3, count the total learnable parameters (weights + biases). Write a formula.
Hint
```python
def params(layers):
    total = 0
    for a, b in zip(layers[:-1], layers[1:]):
        total += a * b + b    # weights + biases
    return total

print(params([4, 8, 8, 3]))   # 4*8+8 + 8*8+8 + 8*3+3 = 139
```
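Here is the same count written as a formula. Take layer sizes n_0 → n_1 → ... → n_L, where n_0 is the number of inputs. Each layer contributes an n_{l-1} × n_l weight matrix plus n_l biases:

$$
P = \sum_{l=1}^{L} \left( n_{l-1}\, n_l + n_l \right) = \sum_{l=1}^{L} n_l \,(n_{l-1} + 1)
$$

For 4 → 8 → 8 → 3 this gives 8·5 + 8·9 + 3·9 = 40 + 72 + 27 = 139.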
Mini-Challenge · The XOR Problem · 8 min
XOR (exclusive or) can't be solved by a single neuron / straight line, but a 2-neuron hidden layer can. Hand-pick weights for a tiny net that computes XOR for inputs (0,0), (0,1), (1,0), (1,1) → (0, 1, 1, 0). It shows why hidden layers matter.
One possible solution
```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

# Hidden neuron 1 = OR, hidden neuron 2 = AND, output = OR and not AND
W1 = np.array([[1, 1], [1, 1]]); b1 = np.array([-0.5, -1.5])   # OR, AND
W2 = np.array([[1], [-2]]);      b2 = np.array([-0.5])

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    h = step(np.array(x) @ W1 + b1)
    out = step(h @ W2 + b2)
    print(x, "->", int(out[0]))
```
```
[0, 0] -> 0
[0, 1] -> 1
[1, 0] -> 1
[1, 1] -> 0
```
Non-negotiables: a hidden layer of 2 neurons and the correct XOR truth table. XOR is the classic historical example of what a single layer cannot do. No single neuron can separate its 1s from its 0s with one straight line, but one hidden layer can.
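The single-neuron limit can be illustrated empirically (this is not a proof) with a brute-force sketch. It tries every step-activation neuron on a grid of weights and biases and records the best score on the four XOR cases. The grid range and resolution are arbitrary assumptions. No grid can reach 4/4, because XOR's 1s and 0s are not separable by one line; the best possible is 3/4 (for example, a plain OR neuron).

```python
import itertools
import numpy as np

def step(z):
    return (z > 0).astype(float)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)     # XOR truth table

grid = np.linspace(-2, 2, 41)               # candidate values for w1, w2, b (steps of 0.1)
best = 0
for w1, w2, b in itertools.product(grid, repeat=3):
    pred = step(X @ np.array([w1, w2]) + b)
    best = max(best, int((pred == y).sum()))

print("best single-neuron score:", best, "/ 4")   # 3 / 4
```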
Recap · 3 min
A neuron = weighted sum + bias + activation. A layer = a matrix multiply over many neurons. A network stacks layers (input, hidden, output). Non-linear activations let depth build abstract features no line could. One sigmoid neuron equals logistic regression; hidden layers are what make networks powerful. Next: which activations to use.
Vocabulary Card
- neuron: computes activation(inputs · weights + bias).
- weight / bias: the learnable numbers; weights scale inputs, bias shifts the threshold.
- hidden layer: a layer between input and output that builds intermediate features.
- forward pass: computing the output by flowing data layer by layer.
Homework · 4 min
Build a NumPy forward(X, layers) that runs a forward pass for any list of (W, b) layer tuples with ReLU hidden layers and a sigmoid output. Test it on a 4→5→1 net with random weights and a batch of inputs. Print the output shape and a couple of values.
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def forward(X, layers):
    a = X
    for i, (W, b) in enumerate(layers):
        z = a @ W + b
        # ReLU on hidden layers, sigmoid on the last (output) layer
        a = sigmoid(z) if i == len(layers) - 1 else relu(z)
    return a

X = np.random.randn(6, 4)                               # batch of 6 samples, 4 features
layers = [(np.random.randn(4, 5) * .3, np.zeros(5)),    # 4 → 5 hidden (ReLU)
          (np.random.randn(5, 1) * .3, np.zeros(1))]    # 5 → 1 output (sigmoid)
out = forward(X, layers)
print(out.shape, out[:2].round(3))                      # (6, 1) plus the first two outputs
```