Learning Goals
3 min
- Create arrays with np.array, np.zeros, np.arange, np.linspace.
- Do vectorised maths: operate on the whole array, no loops.
- Use broadcasting, slicing, boolean masks, and reshape.
- Compute axis-wise stats (sum, mean along rows / columns).
Warm-Up · Loop vs Vector
5 min
# The Python way: a loop
nums = [1, 2, 3, 4]
doubled = [n * 2 for n in nums]   # [2, 4, 6, 8]

# The NumPy way: vectorised, no loop
import numpy as np
arr = np.array([1, 2, 3, 4])
doubled = arr * 2                 # array([2, 4, 6, 8])
Same values, though one is a list and the other an ndarray. NumPy runs the multiplication in optimised C under the hood, so for a million numbers it is typically many times faster, and the code is shorter.
Stop writing loops over numbers. Express the operation on the whole array. This is "vectorisation" — the mental shift that makes ML code fast and readable.
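As a small illustration of that shift, the sketch below converts a few temperatures from Celsius to Fahrenheit both ways. The variable names and sample values are made up for this example.

```python
import numpy as np

# Hypothetical sample data: temperatures in Celsius
celsius = [0.0, 21.5, 37.0, 100.0]

# Loop version: the formula is applied one element at a time
fahrenheit_loop = [c * 9 / 5 + 32 for c in celsius]

# Vectorised version: the formula is written once for the whole array
celsius_arr = np.array(celsius)
fahrenheit_vec = celsius_arr * 9 / 5 + 32

# Both approaches agree (up to floating-point rounding)
print(np.allclose(fahrenheit_loop, fahrenheit_vec))  # True
```

The vectorised line reads like the maths formula itself, which is a large part of why it is easier to check.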
New Concept · The ndarray
14 min
Creating arrays
import numpy as np

np.array([1, 2, 3])       # from a list
np.zeros((2, 3))          # 2×3 of zeros
np.ones(4)                # [1. 1. 1. 1.]
np.arange(0, 10, 2)       # [0 2 4 6 8]
np.linspace(0, 1, 5)      # [0.   0.25 0.5  0.75 1.  ]
np.random.rand(3)         # 3 random floats in [0, 1)
Shape, dtype, ndim
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)   # (2, 3)
print(a.ndim)    # 2
print(a.dtype)   # int64 on most platforms
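The dtype matters as soon as you assign values or mix types. A minimal sketch, with illustrative values, of how an integer array drops the fractional part of an assigned float, and how astype gives you a float copy to work with instead:

```python
import numpy as np

ints = np.array([1, 2, 3])        # integer dtype inferred from the list
floats = ints.astype(float)       # astype returns a converted copy

ints[0] = 9.7                     # integer array: the fraction is dropped
floats[0] = 9.7                   # float array keeps the value

print(ints)                       # [9 2 3]
print(floats[0])                  # 9.7
print(np.array([1, 2.5]).dtype)   # float64: mixed ints and floats upcast
```

Checking dtype before in-place updates is a cheap way to avoid this kind of silent truncation.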
Vectorised maths & broadcasting
a = np.array([1, 2, 3, 4])
print(a + 10)    # [11 12 13 14]  scalar broadcasts
print(a ** 2)    # [ 1  4  9 16]
print(a + a)     # [2 4 6 8]      element-wise

b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(b + np.array([10, 20, 30]))   # row broadcasts to every row
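Broadcasting also works down a column, and it refuses shapes that do not line up. The sketch below reuses b from above; the col array and the mismatched example are additions for illustration.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])            # shape (2, 3)

row = np.array([10, 20, 30])         # shape (3,): matches the last axis of b
col = np.array([[100], [200]])       # shape (2, 1): stretches across columns

print(b + row)   # adds 10, 20, 30 to each row
print(b + col)   # adds 100 to row 0 and 200 to row 1

# Shapes that do not line up raise an error instead of guessing
try:
    b + np.array([1, 2])             # (2, 3) vs (2,): last axes are 3 and 2
except ValueError as err:
    print("cannot broadcast:", err)
```

The rule of thumb: compare shapes from the right; each pair of dimensions must be equal, or one of them must be 1.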
Slicing & boolean masks
a = np.array([10, 20, 30, 40, 50])
print(a[1:4])       # [20 30 40]
print(a[a > 25])    # [30 40 50]  boolean mask
a[a > 25] = 0       # set matching elements
print(a)            # [10 20  0  0  0]
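Masks can be combined, and np.where offers a non-destructive alternative to assigning through a mask. A short sketch on the same starting array (the variable names middle, edges and clipped are just for this example):

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and), | (or), ~ (not).
# Each comparison needs its own parentheses.
middle = a[(a > 15) & (a < 45)]     # values strictly between 15 and 45
edges = a[(a < 15) | (a > 45)]      # values outside that range

# np.where builds a new array and leaves `a` untouched
clipped = np.where(a > 25, 0, a)

print(middle)    # [20 30 40]
print(edges)     # [10 50]
print(clipped)   # [10 20  0  0  0]
print(a)         # [10 20 30 40 50]
```

Python's plain `and` / `or` do not work element-wise on arrays, which is why the bitwise operators are used here.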
Reshape & axes
a = np.arange(6)           # [0 1 2 3 4 5]
m = a.reshape(2, 3)        # [[0 1 2]
                           #  [3 4 5]]
print(m.sum())             # 15       (everything)
print(m.sum(axis=0))       # [3 5 7]  down columns
print(m.sum(axis=1))       # [ 3 12]  across rows
print(m.mean(axis=0))      # column means
axis=0 collapses rows (gives a per-column result); axis=1 collapses columns (per-row). Memorise this — it confuses everyone at first.
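One way to make the axis rule stick is to watch what happens to the shape. The sketch below uses the same 2×3 matrix; keepdims and the row_totals / share names are additions for illustration.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)               # shape (2, 3)

print(m.sum(axis=0).shape)                   # (3,)  the row axis is gone
print(m.sum(axis=1).shape)                   # (2,)  the column axis is gone

# keepdims=True keeps the collapsed axis with length 1,
# so the result still broadcasts against m
row_totals = m.sum(axis=1, keepdims=True)    # shape (2, 1)
share = m / row_totals                       # each row now sums to 1

print(row_totals.shape)                      # (2, 1)
print(share.sum(axis=1))                     # [1. 1.]
```

The axis you name is the one that disappears from the shape.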
Worked Example · Normalise Features
12 min
A core ML chore: scale each feature column to mean 0, std 1. With NumPy it takes only a few lines.
import numpy as np

# 5 samples, 3 features
X = np.array([
    [180, 80, 25],
    [165, 60, 30],
    [175, 72, 22],
    [190, 95, 40],
    [160, 55, 28],
], dtype=float)

# Per-column mean and std (axis=0 = down the rows)
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_scaled = (X - mu) / sigma   # broadcasting does it all

print("means after scaling :", X_scaled.mean(axis=0).round(2))
print("stds after scaling :", X_scaled.std(axis=0).round(2))
print(X_scaled.round(2))
Sample output
means after scaling : [ 0. -0.  0.]
stds after scaling : [1. 1. 1.]
[[ 0.56  0.53 -0.65]
 [-0.84 -0.87  0.16]
 [ 0.09 -0.03 -1.14]
 [ 1.5   1.58  1.79]
 [-1.31 -1.22 -0.16]]
The sign on each rounded zero can vary with floating-point rounding.
Read the diff
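No loops. X - mu subtracts each column's mean from every value in that column (broadcasting); dividing by sigma scales the column to unit standard deviation. scikit-learn's StandardScaler performs the same calculation (it also uses the population standard deviation, which is the np.std default), so you just wrote its core by hand.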
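One practical wrinkle the example does not hit: a column whose values are all identical has a standard deviation of 0, so the division produces nan. A common workaround, sketched here with a made-up matrix and a hypothetical safe_sigma name, is to swap zero deviations for 1 before dividing.

```python
import numpy as np

# Made-up data: the last column is constant, so its std is 0
X = np.array([
    [180, 80, 1],
    [165, 60, 1],
    [175, 72, 1],
], dtype=float)

mu = X.mean(axis=0)
sigma = X.std(axis=0)

# Replace zero deviations with 1 so constant columns become all zeros
# instead of nan (an assumption about the behaviour you want)
safe_sigma = np.where(sigma == 0, 1.0, sigma)
X_scaled = (X - mu) / safe_sigma

print(X_scaled[:, 2])              # [0. 0. 0.]
print(np.isnan(X_scaled).any())    # False
```

Whether a constant feature should become zeros or be dropped entirely depends on the model, so treat this as one option rather than the rule.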
Try It Yourself
13 min
Make an array of the numbers 1-10. Print: the array squared, the sum, the mean, and only the even numbers.
Hint
a = np.arange(1, 11)
print(a ** 2, a.sum(), a.mean(), a[a % 2 == 0])
Make a 4×3 array of random integers 0-9. Print each row's sum and each column's max.
Hint
m = np.random.randint(0, 10, size=(4, 3))
print(m)
print("row sums :", m.sum(axis=1))
print("col maxes:", m.max(axis=0))
Scale each column of a feature matrix to the range [0, 1] using (X - X.min(0)) / (X.max(0) - X.min(0)). Verify every column's min is 0 and max is 1.
Hint
lo, hi = X.min(axis=0), X.max(axis=0)   # X: the float matrix from the worked example
Xn = (X - lo) / (hi - lo)
print(Xn.min(axis=0), Xn.max(axis=0))   # [0. 0. 0.] [1. 1. 1.]
Mini-Challenge · Distance Without Loops
8 min
Given one point and an array of other points, compute the Euclidean distance from the point to every other, vectorised, with no Python loop. (This is the heart of K-Nearest Neighbours, Lesson 12.)
Show one possible solution
import numpy as np

point = np.array([3.0, 4.0])
others = np.array([[0, 0], [3, 0], [6, 8], [3, 5]], dtype=float)

# (others - point) broadcasts; square, sum across columns, sqrt
diffs = others - point
dists = np.sqrt((diffs ** 2).sum(axis=1))
print(dists.round(2))              # [5. 4. 5. 1.]

nearest = others[dists.argmin()]
print("nearest point:", nearest)   # [3. 5.]
Non-negotiables: no Python loop, use broadcasting + axis=1 sum, find the closest with argmin. You just wrote the core of KNN.
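To go from "the nearest" to "the k nearest", argsort can replace argmin. This is a sketch on the same data, with k = 2 chosen arbitrarily; it is not the full KNN algorithm, only the neighbour lookup.

```python
import numpy as np

point = np.array([3.0, 4.0])
others = np.array([[0, 0], [3, 0], [6, 8], [3, 5]], dtype=float)

dists = np.sqrt(((others - point) ** 2).sum(axis=1))

k = 2                              # hypothetical choice of k
order = np.argsort(dists)          # indices from nearest to farthest
nearest_k = others[order[:k]]      # index with an array to pick those rows

print(order[:k])                   # [3 1]
print(nearest_k)                   # rows [3. 5.] and [3. 0.]
```

np.linalg.norm(others - point, axis=1) computes the same distances in a single call, if you prefer not to spell out the square, sum and square root.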
Recap
3 min
NumPy's ndarray does maths on whole arrays at once (vectorisation), which is fast and concise. Broadcasting lets a scalar or row apply across a whole array. Slicing and boolean masks select elements; reshape changes dimensions; axis=0/axis=1 control whether stats run down columns or across rows. Most Python ML libraries accept or return NumPy arrays.
Vocabulary Card
- ndarray
- NumPy's N-dimensional array — the core data type for numeric computing.
- vectorisation
- Applying an operation to a whole array at once, instead of looping element by element.
- broadcasting
- NumPy stretching a smaller array to match a larger one's shape during maths.
- axis
- Which dimension to operate along. axis=0 = down rows (per-column), axis=1 = across columns (per-row).
Homework
4 min
Write numpy_drills.py covering: array creation 3 ways, vectorised maths, a boolean-mask filter, a reshape, and axis-wise stats. Add a timing comparison: square a million numbers with a Python loop vs NumPy, print both times.
Sample · the timing part
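import time
import numpy as np

n = 1_000_000
pylist = list(range(n))

# perf_counter is the higher-resolution clock intended for timing
t0 = time.perf_counter()
sq_loop = [x * x for x in pylist]
t1 = time.perf_counter()

# int64 avoids overflow on platforms where the default integer is 32-bit
arr = np.arange(n, dtype=np.int64)
t2 = time.perf_counter()
sq_np = arr ** 2
t3 = time.perf_counter()

print(f"python loop: {t1 - t0:.3f}s")
print(f"numpy      : {t3 - t2:.4f}s")
print(f"speedup    : {(t1 - t0) / (t3 - t2):.0f}x")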
Non-negotiables: all five drills + a real timing comparison showing NumPy's speedup.