Learning Goals
3 minBy the end of this lesson you can:
- Implement insertion sort with an outer loop + inner shifting loop.
- Trace through the "take a card, slot it in" mental model.
- Recognise that insertion sort is O(n²) worst case but O(n) best case (nearly-sorted input).
- Know that Python's Timsort uses insertion sort internally for small chunks — proving its worth.
Warm-Up · The Card-Hand Mental Model
5 minStart: [5, 3, 8, 1, 4]
↑ first card is "sorted" by itself
Take 3: compare to 5; smaller; slot before → [3, 5, 8, 1, 4]
Take 8: compare to 5; bigger; stays → [3, 5, 8, 1, 4]
Take 1: compare to 8 (smaller), 5 (smaller),
3 (smaller), goes at the start → [1, 3, 5, 8, 4]
Take 4: compare to 8 (smaller), 5 (smaller),
3 (bigger), slot after 3 → [1, 3, 4, 5, 8]You've done this with a real deck of cards. The left side of your hand stays sorted as you go.
Insertion sort builds the sorted portion left-to-right. Each new item is slotted into its correct position by shifting bigger items right. Natural for humans; efficient on small or nearly-sorted lists.
New Concept · The Shift-and-Slot
14 minThe shape
def insertion_sort(lst): for i in range(1, len(lst)): current = lst[i] # the card we're slotting j = i - 1 # Shift larger elements right while j >= 0 and lst[j] > current: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = current # slot the card in data = [5, 3, 8, 1, 4] insertion_sort(data) print(data) # [1, 3, 4, 5, 8]
Five lines inside the for-loop. current holds the card we're placing. The inner while shifts bigger neighbours right. lst[j + 1] = current drops the card into its slot.
Why it's faster than bubble in practice
Bubble does a swap (3 operations) every time it finds an out-of-order pair. Insertion does only one comparison and a shift (1 operation) per misplaced item. Both are O(n²) — but the constant factor for insertion is much smaller.
The best case · already sorted
If the list is already sorted, the inner while never fires (because lst[j] < current immediately). The outer loop runs N-1 times with constant work each. O(n) — linear time.
# Already sorted — insertion sort is O(n) data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] insertion_sort(data) # nearly instant — one comparison per item
Bubble sort, even with the early-exit optimisation, would still do one full pass (O(n) comparisons) before knowing. Insertion is faster in this case.
The worst case · reverse-sorted
Every new item has to slide all the way to the start. O(n²).
Stable sort
Insertion sort is stable — equal elements keep their original relative order. We only shift when lst[j] > current (strict). On equal, we stop. This matters when sorting by a secondary key and you want to preserve a primary order.
students = [("Aisyah", "A"), ("Wei Jie", "B"), ("Priya", "A"), ("Iman", "B")] # Sort by grade — stable sort keeps name order within each grade insertion_sort_key(students, key=lambda s: s[1]) # Stable result: [("Aisyah", "A"), ("Priya", "A"), ("Wei Jie", "B"), ("Iman", "B")]
Where insertion sort still wins today
✓ Small lists (≤ ~20 items) fastest in practice ✓ Almost-sorted lists near O(n) ✓ Streaming data — items arrive one insert as they come at a time ✓ As a sub-routine inside Timsort Python uses it for runs ≤ 64
Timsort (Python's built-in) actually uses insertion sort internally. When it spots a small run (≤ 64 items), it calls insertion sort because it's genuinely faster than mergesort or quicksort at that scale.
Worked Example · Visualised Insertion
12 minSave as insertion_visual.py:
# insertion_visual.py — print every step def insertion_sort_visual(lst): print(f"Start: {lst}") for i in range(1, len(lst)): current = lst[i] j = i - 1 while j >= 0 and lst[j] > current: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = current sorted_part = lst[:i + 1] unsorted_part = lst[i + 1:] # Visual separator between sorted (left) and unsorted (right) joined = " ".join(str(x) for x in sorted_part) + " | " + \ " ".join(str(x) for x in unsorted_part) print(f"Step {i}: take {current}, place → [{joined}]") insertion_sort_visual([5, 1, 4, 2, 8, 3])
Output
Start: [5, 1, 4, 2, 8, 3] Step 1: take 1, place → [1 5 | 4 2 8 3] Step 2: take 4, place → [1 4 5 | 2 8 3] Step 3: take 2, place → [1 2 4 5 | 8 3] Step 4: take 8, place → [1 2 4 5 8 | 3] Step 5: take 3, place → [1 2 3 4 5 8 | ]
Read the diff
After each step, the left of | is sorted; the right is still raw. The boundary moves right by one each pass. Five passes for six items.
Try It Yourself
13 minImplement the algorithm. Test on three lists: random, already-sorted, reverse-sorted. Run each through and time them. Confirm:
- Already sorted = fast.
- Random = slower.
- Reverse sorted = slowest.
Generalise to accept a key function. Test on a list of tuples.
Hint
def insertion_sort_key(lst, key=lambda x: x): for i in range(1, len(lst)): current = lst[i] cur_key = key(current) j = i - 1 while j >= 0 and key(lst[j]) > cur_key: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = current data = [("apple", 3), ("cherry", 1), ("banana", 2)] insertion_sort_key(data, key=lambda p: p[1]) print(data) # [('cherry', 1), ('banana', 2), ('apple', 3)]
Same shape as PY-L3-39's key-based bubble. Cache the key for current to avoid recomputing.
Implement insertion sort recursively. The recursive case sorts lst[:n-1], then inserts lst[n-1] in its place.
Hint
def insertion_sort_rec(lst, n=None): if n is None: n = len(lst) if n <= 1: return insertion_sort_rec(lst, n - 1) # sort first n-1 items # Insert lst[n-1] into the sorted prefix last = lst[n - 1] j = n - 2 while j >= 0 and lst[j] > last: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = last data = [5, 1, 4, 2, 8, 3] insertion_sort_rec(data) print(data) # [1, 2, 3, 4, 5, 8]
Same complexity as iterative; great proof-of-concept for the "recursive trust" idea. Beware Python's recursion limit on big lists.
Mini-Challenge · Best vs Worst Case
8 minTime insertion sort on three 5000-element lists:
- Already sorted:
list(range(5000)) - Random:
random.sample(range(5000), 5000) - Reverse sorted:
list(range(5000, 0, -1))
Confirm best-case ratio is dramatically faster than worst-case.
Show one possible solution
# insertion_cases.py — time the three cases import random, time def insertion_sort(lst): for i in range(1, len(lst)): cur = lst[i] j = i - 1 while j >= 0 and lst[j] > cur: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = cur for label, data in [ ("Sorted", list(range(5000))), ("Random", random.sample(range(5000), 5000)), ("Reversed", list(range(5000, 0, -1))), ]: work = list(data) start = time.perf_counter() insertion_sort(work) elapsed = (time.perf_counter() - start) * 1000 print(f" {label:<10} {elapsed:>8.2f} ms")
Non-negotiables: three test cases, each 5000 items, timing reported. Best-case (already sorted) should be 50-100× faster than worst-case (reverse sorted). Empirically confirms O(n) vs O(n²).
Recap
3 minInsertion sort takes each item one at a time and shifts it left into the sorted portion. O(n²) worst case, O(n) best case (already sorted). Stable — equal elements keep order. Faster in practice than bubble sort. Still used inside Python's Timsort for small runs. For real sorting work in production: never write your own — use sorted(). We'll meet it tomorrow.
Vocabulary Card
- insertion sort
- Build sorted prefix left-to-right. Each new item shifts into place.
- sorted prefix
- The left portion of the list that's already in order.
- stable sort
- Preserves the relative order of equal elements. Important for multi-key sorts.
- best case O(n)
- When data is already sorted, insertion sort just makes one pass and exits.
Homework
4 minBuild sort_compare.py. Compare bubble, insertion, and Python's built-in sorted on the same 10000-element random list. Time each. Print results.
Sample · sort_compare.py
import random, time def bubble_sort(lst): n = len(lst) for i in range(n - 1): swapped = False for j in range(n - 1 - i): if lst[j] > lst[j + 1]: lst[j], lst[j + 1] = lst[j + 1], lst[j] swapped = True if not swapped: return def insertion_sort(lst): for i in range(1, len(lst)): cur = lst[i]; j = i - 1 while j >= 0 and lst[j] > cur: lst[j + 1] = lst[j]; j -= 1 lst[j + 1] = cur N = 10_000 random.seed(0) master = [random.randint(0, 100_000) for _ in range(N)] for name, fn in [("bubble", bubble_sort), ("insertion", insertion_sort)]: work = list(master) start = time.perf_counter() fn(work) elapsed = (time.perf_counter() - start) * 1000 print(f" {name:<10} {elapsed:>10.2f} ms") # Built-in work = list(master) start = time.perf_counter() work.sort() elapsed = (time.perf_counter() - start) * 1000 print(f" {'builtin':<10} {elapsed:>10.2f} ms")
Non-negotiables: 10k random items, three sorts on identical copies, time each. Expect: bubble ~6-10s, insertion ~3-5s, built-in 5-10 ms. The built-in is 500-1000× faster — that's tomorrow's lesson.