PY-L5-21 · Neural Networks 101 — Neurons & Layers

Learning Goals

3 min

Describe a neuron: inputs × weights + bias → activation.
Understand layers: input, hidden, output.
See why a single sigmoid neuron is just logistic regression, and why depth adds power.
Code one neuron's forward pass in NumPy.

Warm-Up · What a Neuron Does

5 min

inputs    x1  x2  x3
weights   w1  w2  w3      (learned)
bias      b               (learned)

z = x1·w1 + x2·w2 + x3·w3 + b     ← weighted sum
output = activation(z)            ← squash (e.g. sigmoid, ReLU)

That's the whole neuron. The weights say "how much does each input matter"; the bias shifts the threshold; the activation adds non-linearity. Learning = adjusting the weights and bias to reduce error.
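To make the bias's role concrete, here is a small sketch that holds the inputs and weights fixed and varies only the bias. The input and weight values are illustrative assumptions, not taken from any dataset. The sigmoid output crosses 0.5 exactly where z crosses 0, so raising b makes the neuron "fire" more easily.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative values (assumptions for this sketch)
x = np.array([0.5, 0.9, 0.1])    # inputs
w = np.array([0.4, -0.2, 0.7])   # weights

outputs = []
for b in [-1.0, 0.0, 1.0]:       # same inputs, three different biases
    z = x @ w + b                # weighted sum
    out = sigmoid(z)             # squashed into (0, 1)
    outputs.append(out)
    print(f"b={b:+.1f}  z={z:+.2f}  output={out:.3f}")
```

The output rises with the bias even though the inputs never change: the bias moves the point at which the neuron tips past 0.5.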

Today's big idea

One neuron with a sigmoid IS logistic regression. The magic of deep learning is stacking: hidden layers let later neurons combine earlier features into more abstract ones — edges → shapes → objects.
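One way to check the logistic-regression claim: logistic regression models the log-odds of the positive class as a linear function of the inputs. The sketch below, using made-up weights, confirms that a sigmoid neuron's output p satisfies log(p / (1 - p)) = z, the same weighted sum the neuron computed.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up inputs and parameters (assumptions for this sketch)
x = np.array([0.5, 0.9, 0.1])
w = np.array([0.4, -0.2, 0.7])
b = 0.1

z = x @ w + b                    # the neuron's weighted sum
p = sigmoid(z)                   # the neuron's output, read as a probability
log_odds = np.log(p / (1 - p))   # invert the sigmoid

print(z, log_odds)               # the two values agree up to rounding
```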

New Concept · Neurons & Layers

14 min

A network is layers of neurons

INPUT layer    one node per feature (e.g. 4 for iris)
HIDDEN layer   neurons that combine inputs (you choose how many)
OUTPUT layer   one node per class (or 1 for regression / binary)

[x1]\
[x2]──[h1]\
[x3]──[h2]──[out]
[x4]──[h3]/

One neuron in NumPy

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.9, 0.1])      # inputs
w = np.array([0.4, -0.2, 0.7])     # weights (would be learned)
b = 0.1                            # bias

z = x @ w + b                      # weighted sum
out = sigmoid(z)
print(round(out, 3))               # one neuron's output
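As a check on the code above, this sketch writes the weighted sum out term by term, exactly as in the warm-up formula, and compares it with the `@` version. It reuses the same example values.

```python
import numpy as np

x = np.array([0.5, 0.9, 0.1])
w = np.array([0.4, -0.2, 0.7])
b = 0.1

# Written out: 0.5*0.4 + 0.9*(-0.2) + 0.1*0.7 + 0.1 = 0.20 - 0.18 + 0.07 + 0.10
z_by_hand = x[0] * w[0] + x[1] * w[1] + x[2] * w[2] + b
z_dot = x @ w + b                # the same sum as a dot product

print(round(z_by_hand, 2), round(z_dot, 2))   # both 0.19
```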

A whole layer = a matrix multiply

# 3 inputs → a hidden layer of 4 neurons
W = np.random.randn(3, 4)   # weight matrix (inputs × neurons)
b = np.random.randn(4)      # one bias per neuron

hidden = sigmoid(x @ W + b) # shape (4,) — four neuron outputs at once

Layers are just matrix maths — which is why NumPy (Lesson 4) was the prerequisite, and why GPUs (great at matrix multiply) made deep learning fast.
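To see that the matrix multiply really is just many neurons side by side, the sketch below computes the same layer twice: once with a loop over neurons (each column of W holds one neuron's weights) and once with a single `x @ W`. The seeded generator and the variable names are choices made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)     # seeded so the run is repeatable
x = np.array([0.5, 0.9, 0.1])      # 3 inputs
W = rng.standard_normal((3, 4))    # column j = weights of neuron j
b = rng.standard_normal(4)         # one bias per neuron

# One neuron at a time
one_by_one = np.array([sigmoid(x @ W[:, j] + b[j]) for j in range(4)])

# All four neurons in one matrix multiply
all_at_once = sigmoid(x @ W + b)

print(np.allclose(one_by_one, all_at_once))   # True
```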

Why non-linearity matters

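Without a non-linear activation between them, stacked layers collapse into one big linear function: (x @ W1 + b1) @ W2 + b2 equals x @ (W1 @ W2) + (b1 @ W2 + b2), which is a single layer with merged weights. Such a net is no more powerful than one linear layer (or, with a sigmoid on the output, logistic regression). The squash (sigmoid, ReLU) is what lets the network bend, curve, and carve complex shapes.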
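The collapse can be checked numerically. In this sketch (random weights, names chosen for illustration), two layers with no activation between them give the same result as one merged layer, while putting a ReLU between them generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
W1 = rng.standard_normal((3, 4))
b1 = rng.standard_normal(4)
W2 = rng.standard_normal((4, 2))
b2 = rng.standard_normal(2)

# Two layers, no activation in between
two_linear = (x @ W1 + b1) @ W2 + b2

# The same computation folded into ONE layer
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
one_linear = x @ W_merged + b_merged

# With a ReLU between the layers, the merged layer no longer has to match
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2

print(np.allclose(two_linear, one_linear))   # True
print(np.allclose(with_relu, one_linear))    # differs whenever ReLU zeroed a hidden value
```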

Depth vs width

wider layer  = more neurons per layer → captures more patterns at that level
deeper net   = more layers → builds more abstract features
Both add capacity — and both can overfit (Lesson 25).

Worked Example · A 2-Layer Forward Pass

12 min

# tiny_net.py — forward pass of a 2-layer net, by hand
import numpy as np

def sigmoid(z): return 1 / (1 + np.exp(-z))
def relu(z):    return np.maximum(0, z)

np.random.seed(0)
x = np.array([0.6, 0.1, 0.9, 0.4])     # 4 features (one sample)

# layer 1: 4 inputs → 5 hidden neurons (ReLU)
W1 = np.random.randn(4, 5) * 0.5
b1 = np.zeros(5)
h  = relu(x @ W1 + b1)

# layer 2: 5 hidden → 1 output (sigmoid = probability)
W2 = np.random.randn(5, 1) * 0.5
b2 = np.zeros(1)
out = sigmoid(h @ W2 + b2)

print("hidden activations:", h.round(2))
print("output (probability):", out.round(3))

Sample output (illustrative format; your exact numbers may differ)

hidden activations: [0.   0.41 0.   0.28 0.67]
output (probability): [0.508]

Read the output

Data flows forward: features → hidden (ReLU zeros out negatives) → output (sigmoid → probability). The weights here are random, so the answer is meaningless — training (Lesson 24) is the process of nudging W1, b1, W2, b2 until the output matches reality. You just saw what a network computes; next we let Keras do the bookkeeping.

Try It Yourself

13 min

01 🟢 One neuron

Code a single sigmoid neuron with 2 inputs. Try inputs/weights that make it output close to 1, and close to 0.

02 🟡 A layer for a batch

Run a hidden layer on a BATCH of 10 samples at once (shape 10×4). Confirm the output is shape 10×5.

Hint

X = np.random.randn(10, 4)
H = relu(X @ W1 + b1)   # matmul handles the batch; b1 broadcasts across the rows
print(H.shape)          # (10, 5)

03 🔴 Count the parameters

For a net with layers 4 → 8 → 8 → 3, count the total learnable parameters (weights + biases). Write a formula.

Hint

def params(layers):
    total = 0
    for a, b in zip(layers[:-1], layers[1:]):
        total += a * b + b      # weights + biases
    return total
print(params([4, 8, 8, 3]))     # 4*8+8 + 8*8+8 + 8*3+3 = 139

Mini-Challenge · The XOR Problem

8 min

XOR (exclusive or) can't be solved by a single neuron / straight line — but a 2-neuron hidden layer can. Hand-pick weights for a tiny net that computes XOR for inputs (0,0)(0,1)(1,0)(1,1) → (0,1,1,0). It proves why hidden layers matter.

Show one possible solution

import numpy as np
def step(z): return (z > 0).astype(float)

# Hidden neuron 1 = OR, hidden neuron 2 = AND, output = OR and not AND
W1 = np.array([[1, 1], [1, 1]]); b1 = np.array([-0.5, -1.5])  # OR, AND
W2 = np.array([[1], [-2]]);      b2 = np.array([-0.5])

for x in [[0,0],[0,1],[1,0],[1,1]]:
    h = step(np.array(x) @ W1 + b1)
    out = step(h @ W2 + b2)
    print(x, "->", int(out[0]))

[0, 0] -> 0
[0, 1] -> 1
[1, 0] -> 1
[1, 1] -> 0

Non-negotiables: a hidden layer of 2 neurons, correct XOR truth table. This is the classic historical example: Minsky and Papert's 1969 book Perceptrons showed that a single-layer perceptron cannot compute XOR, while a net with a hidden layer can.
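The claim that a single neuron cannot compute XOR can be probed by brute force. The sketch below tries every combination of two weights and a bias on a coarse grid (the grid range and spacing are arbitrary choices for this sketch) and counts how many settings of a single step neuron reproduce each truth table. A grid search is not a proof, but it matches the geometric argument: no straight line separates XOR's 1s from its 0s.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "OR":  [0, 1, 1, 1],
    "AND": [0, 0, 0, 1],
    "XOR": [0, 1, 1, 0],
}

# Every (w1, w2, b) combination on a grid from -2 to 2 in steps of 0.1
grid = np.linspace(-2, 2, 41)
combos = np.array(list(itertools.product(grid, repeat=3)))   # shape (68921, 3)
W, b = combos[:, :2], combos[:, 2]

# Step-neuron output for all 4 inputs under every candidate setting
preds = (X @ W.T + b > 0).astype(int)                        # shape (4, 68921)

solutions = {}
for name, t in targets.items():
    matches = (preds == np.array(t)[:, None]).all(axis=0)    # all 4 rows correct?
    solutions[name] = int(matches.sum())
    print(f"{name}: {solutions[name]} single-neuron settings work")
```

OR and AND each have many working settings on the grid; XOR has none.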

Recap

3 min

A neuron = weighted sum + bias + activation. A layer = a matrix multiply over many neurons. A network stacks layers (input, hidden, output). Non-linear activations let depth build abstract features no line could. One sigmoid neuron equals logistic regression; hidden layers are what make networks powerful. Next: which activations to use.

Vocabulary Card

neuron: Computes activation(inputs · weights + bias).
weight / bias: The learnable numbers — weights scale inputs, bias shifts the threshold.
hidden layer: A layer between input and output that builds intermediate features.
forward pass: Computing the output by flowing data layer by layer.

Homework

4 min

Build a NumPy forward(X, layers) that runs a forward pass for any list of (W, b) layer tuples with ReLU hidden + sigmoid output. Test it on a 4→5→1 net with random weights and a batch of inputs. Print output shape and a couple of values.

import numpy as np
def sigmoid(z): return 1/(1+np.exp(-z))
def relu(z):    return np.maximum(0, z)

def forward(X, layers):
    a = X
    for i, (W, b) in enumerate(layers):
        z = a @ W + b
        a = sigmoid(z) if i == len(layers)-1 else relu(z)
    return a

X = np.random.randn(6, 4)
layers = [(np.random.randn(4,5)*.3, np.zeros(5)),
          (np.random.randn(5,1)*.3, np.zeros(1))]
out = forward(X, layers)
print(out.shape, out[:2].round(3))
