PY-L3-42 · Algorithm Complexity — Big-O Intuition

Learning Goals

3 min

By the end of this lesson you can:

Name the common complexity classes: O(1), O(log n), O(n), O(n log n), O(n²), O(2ⁿ).
Look at a function and guess its Big-O.
Predict roughly how big an input each class can handle in one second.
Pick a faster algorithm when the slow one would blow up.

Warm-Up · The Doubling Test

5 min

The fastest way to understand a function's Big-O: double the input. See what happens to the time.

When N doubles, the runtime…

stays the same           →  O(1)         constant
goes up by 1             →  O(log n)     logarithmic
doubles                  →  O(n)         linear
slightly more than 2x    →  O(n log n)   linearithmic
quadruples               →  O(n²)        quadratic
goes up 8x               →  O(n³)        cubic
SQUARES                  →  O(2ⁿ)        exponential — bad

Today's big idea

Big-O is "how the runtime grows". The constants don't matter; the shape of the curve does. An O(n) algorithm with terrible constants will still beat O(n²) once n is big enough.

New Concept · The Common Classes

14 min

O(1) · Constant

Takes the same time regardless of input size. Read a known dict key, access a list by index, push onto a stack.

def first(lst):
    return lst[0]            # O(1) — single index

def get_age(person):
    return person["age"]     # O(1) — dict lookup

O(log n) · Logarithmic

Time grows slowly. Doubling N adds just one comparison. Binary search is the classic.

def binary_search(lst, target):
    # ... halve the search space each step ...
    # O(log n) — 20 steps for 1 million items

O(n) · Linear

One pass over the data. Linear search, sum, max, the in operator.

def total(lst):
    s = 0
    for x in lst:            # one pass — O(n)
        s += x
    return s

# Or: sum(lst), max(lst), len([x for x in lst if ...])

O(n log n) · Linearithmic

Slightly more than linear. Good sorts (Timsort, mergesort, heapsort) live here.

def average_then_sort(lst):
    avg = sum(lst) / len(lst)        # O(n)
    return sorted(lst)               # O(n log n) — dominates

O(n²) · Quadratic

Nested loop. Bubble sort, insertion sort, naïve pair-finding.

def has_duplicate_naive(lst):
    for i in range(len(lst)):
        for j in range(i + 1, len(lst)):    # nested loop
            if lst[i] == lst[j]:
                return True
    return False
# O(n²) — N=10000 takes seconds

Better:

def has_duplicate_fast(lst):
    return len(set(lst)) != len(lst)        # O(n)

Same problem, two complexity classes. Algorithm choice matters more than micro-optimisation.

O(2ⁿ) · Exponential

Naïve recursive Fibonacci, brute-force subset enumeration. The danger zone.

def fib(n):
    if n < 2: return n
    return fib(n - 1) + fib(n - 2)
# O(2ⁿ) — fib(40) is already slow.

How big a list can each handle in 1 second?

Class           N you can handle in 1 second on a laptop
                (very rough — depends on the constant)

O(1)            unlimited
O(log n)        ~1 quintillion (10¹⁸)
O(n)            ~100 million
O(n log n)      ~10 million
O(n²)           ~5,000-10,000
O(n³)           ~500
O(2ⁿ)           ~30
O(n!)           ~10

The pattern: each step down the table caps the manageable N at a much smaller value. If your data could have a million items, you need at most O(n log n).

Worst case vs average case vs amortised

Big-O usually refers to worst case. Some operations have a worse worst case but better amortised cost — like appending to a Python list (occasionally has to resize, but on average O(1)). For now: think worst-case unless told otherwise.

Reading a function

Quick rules:

One loop over the input → O(n)
Two nested loops → O(n²)
Halving the input each step → O(log n)
Calling a sort inside → at least O(n log n)
Recursive doubling branch (no memo) → O(2ⁿ)
Indexing/dict/set membership → O(1)

Worked Example · Same Problem, Three Speeds

12 min

The "find a pair that sums to k" problem, solved three ways. Save as two_sum.py:

# two_sum.py — same answer, three complexities

import random, time


def two_sum_naive(lst, target):
    """O(n²) — check every pair."""
    for i in range(len(lst)):
        for j in range(i + 1, len(lst)):
            if lst[i] + lst[j] == target:
                return (i, j)
    return None


def two_sum_sorted(lst, target):
    """O(n log n) — sort then two-pointer."""
    indexed = sorted(enumerate(lst), key=lambda p: p[1])
    lo, hi = 0, len(indexed) - 1
    while lo < hi:
        s = indexed[lo][1] + indexed[hi][1]
        if s == target:
            i, j = sorted([indexed[lo][0], indexed[hi][0]])
            return (i, j)
        if s < target:
            lo += 1
        else:
            hi -= 1
    return None


def two_sum_hash(lst, target):
    """O(n) — hash table single pass."""
    seen = {}
    for i, x in enumerate(lst):
        if target - x in seen:
            return (seen[target - x], i)
        seen[x] = i
    return None


# Race
N = 5000
random.seed(0)
data = random.sample(range(N * 10), N)
target = data[10] + data[200]                   # guaranteed to exist

for name, fn in [
    ("naive O(n²)",   two_sum_naive),
    ("sorted O(n log n)", two_sum_sorted),
    ("hash O(n)",     two_sum_hash),
]:
    start = time.perf_counter()
    fn(data, target)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"  {name:<20} {elapsed:>8.2f} ms")

Sample output

  naive O(n²)         1450.00 ms
  sorted O(n log n)     2.50 ms
  hash O(n)             1.10 ms

Read the diff

Same answer. Three orders of magnitude difference. The naïve version is ~1300× slower than the hash version. At N=50,000 the naïve takes 2+ minutes; the hash version is still under 100ms. This is why Big-O matters.

Try It Yourself · Name That Big-O

13 min

For each function below, decide its Big-O. Cover the answers and try first.

01 🟢 Simple cases

def f1(lst):
    return lst[0]                            # ?

def f2(lst):
    return sum(lst)                          # ?

def f3(lst):
    return sorted(lst)                       # ?

def f4(lst):
    for x in lst:
        for y in lst:
            print(x, y)                       # ?

Answers

f1: O(1). f2: O(n). f3: O(n log n). f4: O(n²).

02 🟡 Trickier

def f5(lst):
    for x in lst:
        if x in lst:                          # ← in on a list is O(n)
            print(x)
# Big-O?

def f6(lst):
    s = set(lst)
    for x in lst:
        if x in s:                            # ← in on a set is O(1)
            print(x)
# Big-O?

def f7(n):
    while n > 0:
        n //= 2
# Big-O?

Answers

f5: O(n²) — for each item, scan the list. f6: O(n) — set lookup is O(1). f7: O(log n) — halving until 0.

03 🔴 The classics (stretch)

def f8(n):                    # naive fib
    if n < 2: return n
    return f8(n - 1) + f8(n - 2)

def f9(lst):
    while len(lst) > 1:
        lst = lst[::2]              # take every 2nd element
    return lst[0]

def f10(lst):
    while len(lst) > 1:
        mid = len(lst) // 2
        f10(lst[:mid])
        lst = lst[mid:]

Answers

f8: O(2ⁿ). f9: O(n log n) (each step is O(n) slicing, log n steps). f10: O(n²) average — recursion + slicing.

Mini-Challenge · Empirical Big-O

8 min

For each of these functions, time them on N = 100, 1000, 10000. Compute the ratio between consecutive timings. Identify the Big-O class from the ratio pattern.

def sum_loop(lst):
    s = 0
    for x in lst:
        s += x
    return s

def find_duplicates_naive(lst):
    found = []
    for i in range(len(lst)):
        for j in range(i + 1, len(lst)):
            if lst[i] == lst[j]:
                found.append(lst[i])
    return found

def is_palindrome_arr(lst):
    return lst == lst[::-1]

Show one possible solution

# empirical_big_o.py — measure ratios, infer Big-O

import random, time

def sum_loop(lst):
    s = 0
    for x in lst: s += x
    return s

def find_duplicates_naive(lst):
    found = []
    for i in range(len(lst)):
        for j in range(i + 1, len(lst)):
            if lst[i] == lst[j]:
                found.append(lst[i])
    return found

def is_palindrome_arr(lst):
    return lst == lst[::-1]


def measure(fn, ns):
    prev = None
    print(f"\n{fn.__name__}:")
    print(f"  {'N':<8}{'time ms':>10}{'ratio':>8}")
    for n in ns:
        data = random.sample(range(n * 10), n)
        start = time.perf_counter()
        fn(data)
        elapsed = (time.perf_counter() - start) * 1000
        ratio = f"{elapsed / prev:.1f}x" if prev else "—"
        print(f"  {n:<8}{elapsed:>10.2f}{ratio:>8}")
        prev = elapsed


measure(sum_loop, [100, 1000, 10_000, 100_000])
measure(find_duplicates_naive, [100, 1000, 2000, 4000])
measure(is_palindrome_arr, [100, 1000, 10_000, 100_000])

Non-negotiables: time at multiple sizes, compute ratios. sum_loop and is_palindrome_arr should show 10× ratio when N is 10× — that's O(n). find_duplicates_naive should show ~4× when N doubles — that's O(n²).

Recap

3 min

Big-O describes how runtime grows with input size. Common classes from fastest to slowest: O(1), O(log n), O(n), O(n log n), O(n²), O(n³), O(2ⁿ), O(n!). Constants don't matter; the curve shape does. Each step down dramatically caps the manageable input size — past O(n²), N=10,000 is already painful. The fastest way to identify an algorithm's class: double the input and see what happens to the time. Algorithm choice usually matters more than language-level optimisation.

Vocabulary Card

Big-O: An upper bound on how runtime grows with input size.
O(1): Constant time. Index access, dict lookup, set membership.
O(log n): Halve each step. Binary search.
O(n): One pass. Sum, max, linear search.
O(n log n): Good sorts. Timsort, mergesort, heapsort.
O(n²): Nested loop. Bubble sort. Don't use past N ≈ 10,000.
O(2ⁿ): Exponential. Naïve recursive fib. Don't use past N ≈ 30.

Homework

4 min

For each problem below, identify the Big-O of the obvious approach AND describe a faster algorithm.

Given a list, find the most common item.
Given two lists, find the items in both.
Given a list of N people, find pairs that share a birthday.
Given a sorted list of millions of numbers, count how many are between X and Y.

Sample · big-o analyses

1. Most common item
   Naïve: O(n²) — for each item, count its occurrences in the list.
   Better: O(n) — Counter(lst).most_common(1)[0]

2. Items in both lists
   Naïve: O(n × m) — for each item in A, scan B.
   Better: O(n + m) — set(A) & set(B). Set operations are O(n).

3. Pairs sharing a birthday
   Naïve: O(n²) — compare every pair.
   Better: O(n) — group by birthday in a dict, then check group sizes.

4. Sorted list, count between X and Y
   Naïve: O(n) — scan and count.
   Better: O(log n) — bisect_left(Y) - bisect_left(X).

Non-negotiables: identify obvious naïve, propose a Pythonic faster version. Most problems have an O(n) or O(log n) solution hiding behind the naïve O(n²).

When N doubles, the runtime… stays the same → O(1) constant goes up by 1 → O(log n) logarithmic doubles → O(n) linear slightly more than 2x → O(n log n) linearithmic quadruples → O(n²) quadratic goes up 8x → O(n³) cubic SQUARES → O(2ⁿ) exponential — bad

def has_duplicate_naive(lst): for i in range(len(lst)): for j in range(i + 1, len(lst)): # nested loop if lst[i] == lst[j]: return True return False # O(n²) — N=10000 takes seconds

Class N you can handle in 1 second on a laptop (very rough — depends on the constant) O(1) unlimited O(log n) ~1 quintillion (10¹⁸) O(n) ~100 million O(n log n) ~10 million O(n²) ~5,000-10,000 O(n³) ~500 O(2ⁿ) ~30 O(n!) ~10

# two_sum.py — same answer, three complexities import random, time def two_sum_naive(lst, target): """O(n²) — check every pair.""" for i in range(len(lst)): for j in range(i + 1, len(lst)): if lst[i] + lst[j] == target: return (i, j) return None def two_sum_sorted(lst, target): """O(n log n) — sort then two-pointer.""" indexed = sorted(enumerate(lst), key=lambda p: p[1]) lo, hi = 0, len(indexed) - 1 while lo < hi: s = indexed[lo][1] + indexed[hi][1] if s == target: i, j = sorted([indexed[lo][0], indexed[hi][0]]) return (i, j) if s < target: lo += 1 else: hi -= 1 return None def two_sum_hash(lst, target): """O(n) — hash table single pass.""" seen = {} for i, x in enumerate(lst): if target - x in seen: return (seen[target - x], i) seen[x] = i return None # Race N = 5000 random.seed(0) data = random.sample(range(N * 10), N) target = data[10] + data[200] # guaranteed to exist for name, fn in [ ("naive O(n²)", two_sum_naive), ("sorted O(n log n)", two_sum_sorted), ("hash O(n)", two_sum_hash), ]: start = time.perf_counter() fn(data, target) elapsed = (time.perf_counter() - start) * 1000 print(f" {name:<20} {elapsed:>8.2f} ms")

def f1(lst): return lst[0] # ? def f2(lst): return sum(lst) # ? def f3(lst): return sorted(lst) # ? def f4(lst): for x in lst: for y in lst: print(x, y) # ?

def f5(lst): for x in lst: if x in lst: # ← in on a list is O(n) print(x) # Big-O? def f6(lst): s = set(lst) for x in lst: if x in s: # ← in on a set is O(1) print(x) # Big-O? def f7(n): while n > 0: n //= 2 # Big-O?

def f8(n): # naive fib if n < 2: return n return f8(n - 1) + f8(n - 2) def f9(lst): while len(lst) > 1: lst = lst[::2] # take every 2nd element return lst[0] def f10(lst): while len(lst) > 1: mid = len(lst) // 2 f10(lst[:mid]) lst = lst[mid:]

def sum_loop(lst): s = 0 for x in lst: s += x return s def find_duplicates_naive(lst): found = [] for i in range(len(lst)): for j in range(i + 1, len(lst)): if lst[i] == lst[j]: found.append(lst[i]) return found def is_palindrome_arr(lst): return lst == lst[::-1]

# empirical_big_o.py — measure ratios, infer Big-O import random, time def sum_loop(lst): s = 0 for x in lst: s += x return s def find_duplicates_naive(lst): found = [] for i in range(len(lst)): for j in range(i + 1, len(lst)): if lst[i] == lst[j]: found.append(lst[i]) return found def is_palindrome_arr(lst): return lst == lst[::-1] def measure(fn, ns): prev = None print(f"\n{fn.__name__}:") print(f" {'N':<8}{'time ms':>10}{'ratio':>8}") for n in ns: data = random.sample(range(n * 10), n) start = time.perf_counter() fn(data) elapsed = (time.perf_counter() - start) * 1000 ratio = f"{elapsed / prev:.1f}x" if prev else "—" print(f" {n:<8}{elapsed:>10.2f}{ratio:>8}") prev = elapsed measure(sum_loop, [100, 1000, 10_000, 100_000]) measure(find_duplicates_naive, [100, 1000, 2000, 4000]) measure(is_palindrome_arr, [100, 1000, 10_000, 100_000])