Learning Goals
3 minBy the end of this lesson you can:
- Name the common complexity classes: O(1), O(log n), O(n), O(n log n), O(n²), O(2ⁿ).
- Look at a function and guess its Big-O.
- Predict roughly how big an input each class can handle in one second.
- Pick a faster algorithm when the slow one would blow up.
Warm-Up · The Doubling Test
5 minThe fastest way to understand a function's Big-O: double the input. See what happens to the time.
When N doubles, the runtime… stays the same → O(1) constant goes up by 1 → O(log n) logarithmic doubles → O(n) linear slightly more than 2x → O(n log n) linearithmic quadruples → O(n²) quadratic goes up 8x → O(n³) cubic SQUARES → O(2ⁿ) exponential — bad
Big-O is "how the runtime grows". The constants don't matter; the shape of the curve does. An O(n) algorithm with terrible constants will still beat O(n²) once n is big enough.
New Concept · The Common Classes
14 minO(1) · Constant
Takes the same time regardless of input size. Read a known dict key, access a list by index, push onto a stack.
def first(lst): return lst[0] # O(1) — single index def get_age(person): return person["age"] # O(1) — dict lookup
O(log n) · Logarithmic
Time grows slowly. Doubling N adds just one comparison. Binary search is the classic.
def binary_search(lst, target): # ... halve the search space each step ... # O(log n) — 20 steps for 1 million items
O(n) · Linear
One pass over the data. Linear search, sum, max, the in operator.
def total(lst): s = 0 for x in lst: # one pass — O(n) s += x return s # Or: sum(lst), max(lst), len([x for x in lst if ...])
O(n log n) · Linearithmic
Slightly more than linear. Good sorts (Timsort, mergesort, heapsort) live here.
def average_then_sort(lst): avg = sum(lst) / len(lst) # O(n) return sorted(lst) # O(n log n) — dominates
O(n²) · Quadratic
Nested loop. Bubble sort, insertion sort, naïve pair-finding.
def has_duplicate_naive(lst): for i in range(len(lst)): for j in range(i + 1, len(lst)): # nested loop if lst[i] == lst[j]: return True return False # O(n²) — N=10000 takes seconds
Better:
def has_duplicate_fast(lst): return len(set(lst)) != len(lst) # O(n)
Same problem, two complexity classes. Algorithm choice matters more than micro-optimisation.
O(2ⁿ) · Exponential
Naïve recursive Fibonacci, brute-force subset enumeration. The danger zone.
def fib(n): if n < 2: return n return fib(n - 1) + fib(n - 2) # O(2ⁿ) — fib(40) is already slow.
How big a list can each handle in 1 second?
Class N you can handle in 1 second on a laptop
(very rough — depends on the constant)
O(1) unlimited
O(log n) ~1 quintillion (10¹⁸)
O(n) ~100 million
O(n log n) ~10 million
O(n²) ~5,000-10,000
O(n³) ~500
O(2ⁿ) ~30
O(n!) ~10The pattern: each step down the table caps the manageable N at a much smaller value. If your data could have a million items, you need at most O(n log n).
Worst case vs average case vs amortised
Big-O usually refers to worst case. Some operations have a worse worst case but better amortised cost — like appending to a Python list (occasionally has to resize, but on average O(1)). For now: think worst-case unless told otherwise.
Reading a function
Quick rules:
- One loop over the input → O(n)
- Two nested loops → O(n²)
- Halving the input each step → O(log n)
- Calling a sort inside → at least O(n log n)
- Recursive doubling branch (no memo) → O(2ⁿ)
- Indexing/dict/set membership → O(1)
Worked Example · Same Problem, Three Speeds
12 minThe "find a pair that sums to k" problem, solved three ways. Save as two_sum.py:
# two_sum.py — same answer, three complexities import random, time def two_sum_naive(lst, target): """O(n²) — check every pair.""" for i in range(len(lst)): for j in range(i + 1, len(lst)): if lst[i] + lst[j] == target: return (i, j) return None def two_sum_sorted(lst, target): """O(n log n) — sort then two-pointer.""" indexed = sorted(enumerate(lst), key=lambda p: p[1]) lo, hi = 0, len(indexed) - 1 while lo < hi: s = indexed[lo][1] + indexed[hi][1] if s == target: i, j = sorted([indexed[lo][0], indexed[hi][0]]) return (i, j) if s < target: lo += 1 else: hi -= 1 return None def two_sum_hash(lst, target): """O(n) — hash table single pass.""" seen = {} for i, x in enumerate(lst): if target - x in seen: return (seen[target - x], i) seen[x] = i return None # Race N = 5000 random.seed(0) data = random.sample(range(N * 10), N) target = data[10] + data[200] # guaranteed to exist for name, fn in [ ("naive O(n²)", two_sum_naive), ("sorted O(n log n)", two_sum_sorted), ("hash O(n)", two_sum_hash), ]: start = time.perf_counter() fn(data, target) elapsed = (time.perf_counter() - start) * 1000 print(f" {name:<20} {elapsed:>8.2f} ms")
Sample output
naive O(n²) 1450.00 ms sorted O(n log n) 2.50 ms hash O(n) 1.10 ms
Read the diff
Same answer. Three orders of magnitude difference. The naïve version is ~1300× slower than the hash version. At N=50,000 the naïve takes 2+ minutes; the hash version is still under 100ms. This is why Big-O matters.
Try It Yourself · Name That Big-O
13 minFor each function below, decide its Big-O. Cover the answers and try first.
def f1(lst): return lst[0] # ? def f2(lst): return sum(lst) # ? def f3(lst): return sorted(lst) # ? def f4(lst): for x in lst: for y in lst: print(x, y) # ?
Answers
f1: O(1). f2: O(n). f3: O(n log n). f4: O(n²).
def f5(lst): for x in lst: if x in lst: # ← in on a list is O(n) print(x) # Big-O? def f6(lst): s = set(lst) for x in lst: if x in s: # ← in on a set is O(1) print(x) # Big-O? def f7(n): while n > 0: n //= 2 # Big-O?
Answers
f5: O(n²) — for each item, scan the list. f6: O(n) — set lookup is O(1). f7: O(log n) — halving until 0.
def f8(n): # naive fib if n < 2: return n return f8(n - 1) + f8(n - 2) def f9(lst): while len(lst) > 1: lst = lst[::2] # take every 2nd element return lst[0] def f10(lst): while len(lst) > 1: mid = len(lst) // 2 f10(lst[:mid]) lst = lst[mid:]
Answers
f8: O(2ⁿ). f9: O(n log n) (each step is O(n) slicing, log n steps). f10: O(n²) average — recursion + slicing.
Mini-Challenge · Empirical Big-O
8 minFor each of these functions, time them on N = 100, 1000, 10000. Compute the ratio between consecutive timings. Identify the Big-O class from the ratio pattern.
def sum_loop(lst): s = 0 for x in lst: s += x return s def find_duplicates_naive(lst): found = [] for i in range(len(lst)): for j in range(i + 1, len(lst)): if lst[i] == lst[j]: found.append(lst[i]) return found def is_palindrome_arr(lst): return lst == lst[::-1]
Show one possible solution
# empirical_big_o.py — measure ratios, infer Big-O import random, time def sum_loop(lst): s = 0 for x in lst: s += x return s def find_duplicates_naive(lst): found = [] for i in range(len(lst)): for j in range(i + 1, len(lst)): if lst[i] == lst[j]: found.append(lst[i]) return found def is_palindrome_arr(lst): return lst == lst[::-1] def measure(fn, ns): prev = None print(f"\n{fn.__name__}:") print(f" {'N':<8}{'time ms':>10}{'ratio':>8}") for n in ns: data = random.sample(range(n * 10), n) start = time.perf_counter() fn(data) elapsed = (time.perf_counter() - start) * 1000 ratio = f"{elapsed / prev:.1f}x" if prev else "—" print(f" {n:<8}{elapsed:>10.2f}{ratio:>8}") prev = elapsed measure(sum_loop, [100, 1000, 10_000, 100_000]) measure(find_duplicates_naive, [100, 1000, 2000, 4000]) measure(is_palindrome_arr, [100, 1000, 10_000, 100_000])
Non-negotiables: time at multiple sizes, compute ratios. sum_loop and is_palindrome_arr should show 10× ratio when N is 10× — that's O(n). find_duplicates_naive should show ~4× when N doubles — that's O(n²).
Recap
3 minBig-O describes how runtime grows with input size. Common classes from fastest to slowest: O(1), O(log n), O(n), O(n log n), O(n²), O(n³), O(2ⁿ), O(n!). Constants don't matter; the curve shape does. Each step down dramatically caps the manageable input size — past O(n²), N=10,000 is already painful. The fastest way to identify an algorithm's class: double the input and see what happens to the time. Algorithm choice usually matters more than language-level optimisation.
Vocabulary Card
- Big-O
- An upper bound on how runtime grows with input size.
- O(1)
- Constant time. Index access, dict lookup, set membership.
- O(log n)
- Halve each step. Binary search.
- O(n)
- One pass. Sum, max, linear search.
- O(n log n)
- Good sorts. Timsort, mergesort, heapsort.
- O(n²)
- Nested loop. Bubble sort. Don't use past N ≈ 10,000.
- O(2ⁿ)
- Exponential. Naïve recursive fib. Don't use past N ≈ 30.
Homework
4 minFor each problem below, identify the Big-O of the obvious approach AND describe a faster algorithm.
- Given a list, find the most common item.
- Given two lists, find the items in both.
- Given a list of N people, find pairs that share a birthday.
- Given a sorted list of millions of numbers, count how many are between X and Y.
Sample · big-o analyses
1. Most common item Naïve: O(n²) — for each item, count its occurrences in the list. Better: O(n) — Counter(lst).most_common(1)[0] 2. Items in both lists Naïve: O(n × m) — for each item in A, scan B. Better: O(n + m) — set(A) & set(B). Set operations are O(n). 3. Pairs sharing a birthday Naïve: O(n²) — compare every pair. Better: O(n) — group by birthday in a dict, then check group sizes. 4. Sorted list, count between X and Y Naïve: O(n) — scan and count. Better: O(log n) — bisect_left(Y) - bisect_left(X).
Non-negotiables: identify obvious naïve, propose a Pythonic faster version. Most problems have an O(n) or O(log n) solution hiding behind the naïve O(n²).