PY-L3-40 · Sorting — Insertion Sort

Learning Goals

3 min

By the end of this lesson you can:

Implement insertion sort with an outer loop + inner shifting loop.
Trace through the "take a card, slot it in" mental model.
Recognise that insertion sort is O(n²) worst case but O(n) best case (nearly-sorted input).
Know that Python's Timsort uses insertion sort internally for small chunks — proving its worth.

Warm-Up · The Card-Hand Mental Model

5 min

Start:    [5, 3, 8, 1, 4]
           ↑ first card is "sorted" by itself

Take 3:   compare to 5; smaller; slot before → [3, 5, 8, 1, 4]

Take 8:   compare to 5; bigger; stays         → [3, 5, 8, 1, 4]

Take 1:   compare to 8 (smaller), 5 (smaller),
          3 (smaller), goes at the start       → [1, 3, 5, 8, 4]

Take 4:   compare to 8 (smaller), 5 (smaller),
          3 (bigger), slot after 3             → [1, 3, 4, 5, 8]

You've done this with a real deck of cards. The left side of your hand stays sorted as you go.

Today's big idea

Insertion sort builds the sorted portion left-to-right. Each new item is slotted into its correct position by shifting bigger items right. Natural for humans; efficient on small or nearly-sorted lists.

New Concept · The Shift-and-Slot

14 min

The shape

def insertion_sort(lst):
    for i in range(1, len(lst)):
        current = lst[i]                  # the card we're slotting
        j = i - 1
        # Shift larger elements right
        while j >= 0 and lst[j] > current:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = current               # slot the card in

data = [5, 3, 8, 1, 4]
insertion_sort(data)
print(data)        # [1, 3, 4, 5, 8]

Five lines inside the for-loop. current holds the card we're placing. The inner while shifts bigger neighbours right. lst[j + 1] = current drops the card into its slot.

Why it's faster than bubble in practice

Bubble does a swap (3 operations) every time it finds an out-of-order pair. Insertion does only one comparison and a shift (1 operation) per misplaced item. Both are O(n²) — but the constant factor for insertion is much smaller.

The best case · already sorted

If the list is already sorted, the inner while never fires (because lst[j] < current immediately). The outer loop runs N-1 times with constant work each. O(n) — linear time.

# Already sorted — insertion sort is O(n)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
insertion_sort(data)     # nearly instant — one comparison per item

Bubble sort, even with the early-exit optimisation, would still do one full pass (O(n) comparisons) before knowing. Insertion is faster in this case.

The worst case · reverse-sorted

Every new item has to slide all the way to the start. O(n²).

Stable sort

Insertion sort is stable — equal elements keep their original relative order. We only shift when lst[j] > current (strict). On equal, we stop. This matters when sorting by a secondary key and you want to preserve a primary order.

students = [("Aisyah", "A"), ("Wei Jie", "B"), ("Priya", "A"), ("Iman", "B")]
# Sort by grade — stable sort keeps name order within each grade
insertion_sort_key(students, key=lambda s: s[1])
# Stable result: [("Aisyah", "A"), ("Priya", "A"), ("Wei Jie", "B"), ("Iman", "B")]

Where insertion sort still wins today

✓ Small lists (≤ ~20 items)              fastest in practice
✓ Almost-sorted lists                     near O(n)
✓ Streaming data — items arrive one       insert as they come
   at a time
✓ As a sub-routine inside Timsort         Python uses it for runs ≤ 64

Timsort (Python's built-in) actually uses insertion sort internally. When it spots a small run (≤ 64 items), it calls insertion sort because it's genuinely faster than mergesort or quicksort at that scale.

Worked Example · Visualised Insertion

12 min

Save as insertion_visual.py:

# insertion_visual.py — print every step

def insertion_sort_visual(lst):
    print(f"Start: {lst}")
    for i in range(1, len(lst)):
        current = lst[i]
        j = i - 1
        while j >= 0 and lst[j] > current:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = current
        sorted_part = lst[:i + 1]
        unsorted_part = lst[i + 1:]
        # Visual separator between sorted (left) and unsorted (right)
        joined = " ".join(str(x) for x in sorted_part) + " | " + \
                 " ".join(str(x) for x in unsorted_part)
        print(f"Step {i}: take {current}, place →  [{joined}]")


insertion_sort_visual([5, 1, 4, 2, 8, 3])

Output

Start: [5, 1, 4, 2, 8, 3]
Step 1: take 1, place →  [1 5 | 4 2 8 3]
Step 2: take 4, place →  [1 4 5 | 2 8 3]
Step 3: take 2, place →  [1 2 4 5 | 8 3]
Step 4: take 8, place →  [1 2 4 5 8 | 3]
Step 5: take 3, place →  [1 2 3 4 5 8 | ]

Read the diff

After each step, the left of | is sorted; the right is still raw. The boundary moves right by one each pass. Five passes for six items.

Try It Yourself

13 min

01 🟢 Standard insertion sort

Implement the algorithm. Test on three lists: random, already-sorted, reverse-sorted. Run each through and time them. Confirm:

Already sorted = fast.
Random = slower.
Reverse sorted = slowest.

02 🟡 Key-based

Generalise to accept a key function. Test on a list of tuples.

Hint

def insertion_sort_key(lst, key=lambda x: x):
    for i in range(1, len(lst)):
        current = lst[i]
        cur_key = key(current)
        j = i - 1
        while j >= 0 and key(lst[j]) > cur_key:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = current

data = [("apple", 3), ("cherry", 1), ("banana", 2)]
insertion_sort_key(data, key=lambda p: p[1])
print(data)
# [('cherry', 1), ('banana', 2), ('apple', 3)]

Same shape as PY-L3-39's key-based bubble. Cache the key for current to avoid recomputing.

03 🔴 Recursive insertion sort (stretch)

Implement insertion sort recursively. The recursive case sorts lst[:n-1], then inserts lst[n-1] in its place.

Hint

def insertion_sort_rec(lst, n=None):
    if n is None:
        n = len(lst)
    if n <= 1:
        return
    insertion_sort_rec(lst, n - 1)        # sort first n-1 items
    # Insert lst[n-1] into the sorted prefix
    last = lst[n - 1]
    j = n - 2
    while j >= 0 and lst[j] > last:
        lst[j + 1] = lst[j]
        j -= 1
    lst[j + 1] = last

data = [5, 1, 4, 2, 8, 3]
insertion_sort_rec(data)
print(data)        # [1, 2, 3, 4, 5, 8]

Same complexity as iterative; great proof-of-concept for the "recursive trust" idea. Beware Python's recursion limit on big lists.

Mini-Challenge · Best vs Worst Case

8 min

Time insertion sort on three 5000-element lists:

Already sorted: list(range(5000))
Random: random.sample(range(5000), 5000)
Reverse sorted: list(range(5000, 0, -1))

Confirm best-case ratio is dramatically faster than worst-case.

Show one possible solution

# insertion_cases.py — time the three cases

import random, time

def insertion_sort(lst):
    for i in range(1, len(lst)):
        cur = lst[i]
        j = i - 1
        while j >= 0 and lst[j] > cur:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = cur


for label, data in [
    ("Sorted",     list(range(5000))),
    ("Random",     random.sample(range(5000), 5000)),
    ("Reversed",   list(range(5000, 0, -1))),
]:
    work = list(data)
    start = time.perf_counter()
    insertion_sort(work)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"  {label:<10} {elapsed:>8.2f} ms")

Non-negotiables: three test cases, each 5000 items, timing reported. Best-case (already sorted) should be 50-100× faster than worst-case (reverse sorted). Empirically confirms O(n) vs O(n²).

Recap

3 min

Insertion sort takes each item one at a time and shifts it left into the sorted portion. O(n²) worst case, O(n) best case (already sorted). Stable — equal elements keep order. Faster in practice than bubble sort. Still used inside Python's Timsort for small runs. For real sorting work in production: never write your own — use sorted(). We'll meet it tomorrow.

Vocabulary Card

insertion sort: Build sorted prefix left-to-right. Each new item shifts into place.
sorted prefix: The left portion of the list that's already in order.
stable sort: Preserves the relative order of equal elements. Important for multi-key sorts.
best case O(n): When data is already sorted, insertion sort just makes one pass and exits.

Homework

4 min

Build sort_compare.py. Compare bubble, insertion, and Python's built-in sorted on the same 10000-element random list. Time each. Print results.

Sample · sort_compare.py

import random, time

def bubble_sort(lst):
    n = len(lst)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swapped = True
        if not swapped: return

def insertion_sort(lst):
    for i in range(1, len(lst)):
        cur = lst[i]; j = i - 1
        while j >= 0 and lst[j] > cur:
            lst[j + 1] = lst[j]; j -= 1
        lst[j + 1] = cur


N = 10_000
random.seed(0)
master = [random.randint(0, 100_000) for _ in range(N)]

for name, fn in [("bubble", bubble_sort),
                 ("insertion", insertion_sort)]:
    work = list(master)
    start = time.perf_counter()
    fn(work)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"  {name:<10} {elapsed:>10.2f} ms")

# Built-in
work = list(master)
start = time.perf_counter()
work.sort()
elapsed = (time.perf_counter() - start) * 1000
print(f"  {'builtin':<10} {elapsed:>10.2f} ms")

Non-negotiables: 10k random items, three sorts on identical copies, time each. Expect: bubble ~6-10s, insertion ~3-5s, built-in 5-10 ms. The built-in is 500-1000× faster — that's tomorrow's lesson.

Start: [5, 3, 8, 1, 4] ↑ first card is "sorted" by itself Take 3: compare to 5; smaller; slot before → [3, 5, 8, 1, 4] Take 8: compare to 5; bigger; stays → [3, 5, 8, 1, 4] Take 1: compare to 8 (smaller), 5 (smaller), 3 (smaller), goes at the start → [1, 3, 5, 8, 4] Take 4: compare to 8 (smaller), 5 (smaller), 3 (bigger), slot after 3 → [1, 3, 4, 5, 8]

def insertion_sort(lst): for i in range(1, len(lst)): current = lst[i] # the card we're slotting j = i - 1 # Shift larger elements right while j >= 0 and lst[j] > current: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = current # slot the card in data = [5, 3, 8, 1, 4] insertion_sort(data) print(data) # [1, 3, 4, 5, 8]

students = [("Aisyah", "A"), ("Wei Jie", "B"), ("Priya", "A"), ("Iman", "B")] # Sort by grade — stable sort keeps name order within each grade insertion_sort_key(students, key=lambda s: s[1]) # Stable result: [("Aisyah", "A"), ("Priya", "A"), ("Wei Jie", "B"), ("Iman", "B")]

✓ Small lists (≤ ~20 items) fastest in practice ✓ Almost-sorted lists near O(n) ✓ Streaming data — items arrive one insert as they come at a time ✓ As a sub-routine inside Timsort Python uses it for runs ≤ 64

# insertion_visual.py — print every step def insertion_sort_visual(lst): print(f"Start: {lst}") for i in range(1, len(lst)): current = lst[i] j = i - 1 while j >= 0 and lst[j] > current: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = current sorted_part = lst[:i + 1] unsorted_part = lst[i + 1:] # Visual separator between sorted (left) and unsorted (right) joined = " ".join(str(x) for x in sorted_part) + " | " + \ " ".join(str(x) for x in unsorted_part) print(f"Step {i}: take {current}, place → [{joined}]") insertion_sort_visual([5, 1, 4, 2, 8, 3])

def insertion_sort_key(lst, key=lambda x: x): for i in range(1, len(lst)): current = lst[i] cur_key = key(current) j = i - 1 while j >= 0 and key(lst[j]) > cur_key: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = current data = [("apple", 3), ("cherry", 1), ("banana", 2)] insertion_sort_key(data, key=lambda p: p[1]) print(data) # [('cherry', 1), ('banana', 2), ('apple', 3)]

def insertion_sort_rec(lst, n=None): if n is None: n = len(lst) if n <= 1: return insertion_sort_rec(lst, n - 1) # sort first n-1 items # Insert lst[n-1] into the sorted prefix last = lst[n - 1] j = n - 2 while j >= 0 and lst[j] > last: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = last data = [5, 1, 4, 2, 8, 3] insertion_sort_rec(data) print(data) # [1, 2, 3, 4, 5, 8]

# insertion_cases.py — time the three cases import random, time def insertion_sort(lst): for i in range(1, len(lst)): cur = lst[i] j = i - 1 while j >= 0 and lst[j] > cur: lst[j + 1] = lst[j] j -= 1 lst[j + 1] = cur for label, data in [ ("Sorted", list(range(5000))), ("Random", random.sample(range(5000), 5000)), ("Reversed", list(range(5000, 0, -1))), ]: work = list(data) start = time.perf_counter() insertion_sort(work) elapsed = (time.perf_counter() - start) * 1000 print(f" {label:<10} {elapsed:>8.2f} ms")