Learning Goals
3 minBy the end of this lesson you can:
- Implement binary search iteratively with
lo,hi,mid. - Implement the recursive version.
- Recognise that the list must be sorted for binary search to work.
- Use Python's built-in
bisectmodule for production work.
Warm-Up · The Number-Guessing Game
5 minRemember the higher-or-lower guessing game from PY-L1-45? When you guess a number 1-100 and Python says "higher" or "lower", the optimal strategy is to always guess the middle. That's binary search.
Range 1-100, target 73 guess 50: "higher" → range 51-100 guess 75: "lower" → range 51-74 guess 62: "higher" → range 63-74 guess 68: "higher" → range 69-74 guess 71: "higher" → range 72-74 guess 73: found! 6 guesses for 100 numbers. For 1,000,000 numbers: at most 20 guesses.
Each guess halves the remaining range. log₂(1,000,000) ≈ 20.
Sorted data lets you exploit ordering. Linear search ignores it. Binary search uses it — and gets exponentially faster.
New Concept · The Halving Algorithm
14 minThe iterative version
def binary_search(sorted_list, target): lo = 0 hi = len(sorted_list) - 1 while lo <= hi: mid = (lo + hi) // 2 if sorted_list[mid] == target: return mid # found! elif sorted_list[mid] < target: lo = mid + 1 # target is in the upper half else: hi = mid - 1 # target is in the lower half return -1 # not found # Test data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19] print(binary_search(data, 13)) # 6 print(binary_search(data, 4)) # -1
Three boundary variables, a comparison at the midpoint, three branches. Eleven lines of code that work in milliseconds on a sorted list of any reasonable size.
Why (lo + hi) // 2?
The midpoint of [lo, hi]. Integer division because list indices are ints. There's a famous overflow bug in this expression in C/Java — when lo + hi exceeds the max int. In Python ints are unbounded, so it's safe.
The recursive version
def binary_search_rec(sorted_list, target, lo=0, hi=None): if hi is None: hi = len(sorted_list) - 1 if lo > hi: return -1 # base case: empty range mid = (lo + hi) // 2 if sorted_list[mid] == target: return mid elif sorted_list[mid] < target: return binary_search_rec(sorted_list, target, mid + 1, hi) else: return binary_search_rec(sorted_list, target, lo, mid - 1)
Same logic, recursive. The shape from PY-L3-36's homework. Both work fine; iterative wins on huge inputs because Python has no tail-call optimisation.
The sortedness is non-negotiable
unsorted = [11, 3, 17, 5, 9, 19, 7, 1, 15, 13] binary_search(unsorted, 7) # returns -1 even though 7 is in the list!
Binary search assumes sorted_list[mid] < target means "target is to the right". That's only true if the list is sorted. On unsorted data the answer is nonsense.
You can sort first — but sorting is O(n log n), often more expensive than a single linear search. Binary search wins only when you'll do many searches on the same sorted data.
Python's bisect module
For production code, use bisect:
import bisect data = [1, 3, 5, 7, 9, 11, 13] # Find the insertion point for a value print(bisect.bisect_left(data, 7)) # 3 (index where 7 sits) print(bisect.bisect_left(data, 6)) # 3 (would insert at index 3) # Search — combine with comparison def bsearch(lst, target): i = bisect.bisect_left(lst, target) if i < len(lst) and lst[i] == target: return i return -1 print(bsearch(data, 7)) # 3 print(bsearch(data, 6)) # -1
bisect is the C-implemented, batteries-included version. Use it when speed matters; understand the manual version so you know what it does.
Other bisect tricks
bisect.insort(lst, value)— keep a list sorted as you insert. Useful for online sorted-set scenarios.bisect_leftvsbisect_right— where to place duplicates. Read the docs if you need the distinction.
Worked Example · Race Linear vs Binary
12 minSave as search_race.py:
# search_race.py — linear vs binary on the same data import time import bisect def linear_search(lst, target): for i, x in enumerate(lst): if x == target: return i return -1 def binary_search(lst, target): lo, hi = 0, len(lst) - 1 while lo <= hi: mid = (lo + hi) // 2 if lst[mid] == target: return mid if lst[mid] < target: lo = mid + 1 else: hi = mid - 1 return -1 def race(size, n_searches=1000): """Run n_searches on a sorted list of given size.""" data = list(range(size)) targets = [size - 1] * n_searches # worst case: last item # Linear start = time.perf_counter() for t in targets: linear_search(data, t) linear_time = time.perf_counter() - start # Binary start = time.perf_counter() for t in targets: binary_search(data, t) binary_time = time.perf_counter() - start # bisect (the C version) start = time.perf_counter() for t in targets: bisect.bisect_left(data, t) bisect_time = time.perf_counter() - start print(f" N = {size:>8} linear {linear_time:>7.3f}s " f"binary {binary_time:>6.4f}s " f"bisect {bisect_time:>6.4f}s") print(f"Searching for the LAST item, 1000 times each:") race(10_000) race(100_000) race(1_000_000)
Sample output (times will vary)
Searching for the LAST item, 1000 times each: N = 10000 linear 0.351s binary 0.0008s bisect 0.0002s N = 100000 linear 3.402s binary 0.0010s bisect 0.0002s N = 1000000 linear 34.110s binary 0.0013s bisect 0.0003s
Read the diff
Linear time grows linearly — 10× more items = 10× slower. Binary time barely changes — 10× more items = ~1 extra comparison. bisect (the C version) is ~3-5× faster than your Python binary search, for the same algorithm. Algorithm choice matters more than language tricks.
Try It Yourself
13 minTest your binary_search on these edge cases: an empty list, a list of one item (target present), a list of one item (target absent), the first item of a list, the last item.
Hint
assert binary_search([], 5) == -1 assert binary_search([5], 5) == 0 assert binary_search([5], 99) == -1 assert binary_search([1, 2, 3, 4, 5], 1) == 0 # first assert binary_search([1, 2, 3, 4, 5], 5) == 4 # last
Edge cases are where the off-by-one bugs hide. Run these every time you write a binary search.
Without using bisect, write insertion_point(sorted_list, value) that returns the index where value should be inserted to keep the list sorted.
Hint
def insertion_point(lst, value): lo, hi = 0, len(lst) # note: hi = len, not len - 1 while lo < hi: mid = (lo + hi) // 2 if lst[mid] < value: lo = mid + 1 else: hi = mid return lo print(insertion_point([1, 3, 5, 7, 9], 4)) # 2 — would go between 3 and 5 print(insertion_point([1, 3, 5, 7, 9], 9)) # 4 (or 5 — depends on left/right convention) print(insertion_point([1, 3, 5, 7, 9], 0)) # 0 print(insertion_point([1, 3, 5, 7, 9], 10)) # 5
This is what bisect.bisect_left does. Useful for maintaining a sorted list as items arrive.
Use binary search to find the integer square root of n — the largest integer whose square is ≤ n. Don't use math.sqrt or **.
Hint
def int_sqrt(n): if n < 0: return None lo, hi = 0, n while lo <= hi: mid = (lo + hi) // 2 sq = mid * mid if sq == n: return mid if sq < n: lo = mid + 1 else: hi = mid - 1 return hi # the largest mid whose square was ≤ n print(int_sqrt(0)) # 0 print(int_sqrt(15)) # 3 (since 3*3=9 ≤ 15 < 16=4*4) print(int_sqrt(16)) # 4 print(int_sqrt(10**18)) # ~1_000_000_000
Binary search isn't just for lists — it works for any "guess and check" where you can narrow the range each step. Square-root, optimisation, finding a threshold. The pattern generalises.
Mini-Challenge · Reverse-Engineer Binary Search
8 minBuild a function guess_my_number(low, high, target) that prints every guess your binary search makes. Show how many guesses it takes to find numbers at different positions.
guess_my_number(1, 100, 73) # guess 50 -> too low # guess 75 -> too high # guess 62 -> too low # guess 68 -> too low # guess 71 -> too low # guess 73 -> FOUND in 6 guesses!
Show one possible solution
# guess_my_number.py — show binary search in action def guess_my_number(low, high, target): guesses = 0 while low <= high: guesses += 1 guess = (low + high) // 2 print(f" guess {guess}", end=" ") if guess == target: print(f"-> FOUND in {guesses} guesses!") return if guess < target: print("-> too low") low = guess + 1 else: print("-> too high") high = guess - 1 print(f" not found after {guesses} guesses") print("Target 73, range 1-100:") guess_my_number(1, 100, 73) print("\nTarget 1, range 1-100:") guess_my_number(1, 100, 1) print("\nTarget 999, range 1-1000:") guess_my_number(1, 1000, 999)
Non-negotiables: print every guess + direction; report total. Confirms what theory says — never more than log₂(N) + 1 guesses.
Recap
3 minBinary search halves the search space at each step. Three pointers (lo, hi, mid); compare; move one pointer. O(log n) — twenty steps to find one in a million. The list must be sorted. Python's bisect module gives you a C-fast version for production use. The technique generalises beyond lists — anywhere you can guess-and-narrow, binary search applies.
Vocabulary Card
- binary search
- Halve-the-range search on sorted data. O(log n).
- lo / hi / mid
- The three pointers that bound and bisect the search range.
- O(log n)
- The complexity class. Doubling N adds just one comparison.
- bisect module
- Python's built-in binary-search module.
bisect_left,insort. - insertion point
- Where a new value should go to keep the list sorted.
Homework
4 minBuild sorted_set.py. A class that wraps a sorted list and offers:
add(value)— insert in the right place using bisect (or your own binary search) — O(log n) to find, O(n) to actually insert.contains(value)— binary search, returns True/False.__len__.
Test: add 10000 random ints. Time 1000 contains() calls. Compare to a plain list's in.
Sample · sorted_set.py
import bisect, random, time class SortedSet: def __init__(self): self._items = [] def add(self, value): i = bisect.bisect_left(self._items, value) # Avoid duplicates (true set behaviour) if i < len(self._items) and self._items[i] == value: return self._items.insert(i, value) def contains(self, value): i = bisect.bisect_left(self._items, value) return i < len(self._items) and self._items[i] == value def __len__(self): return len(self._items) # Build s = SortedSet() plain = [] random.seed(0) for _ in range(10_000): n = random.randint(0, 999_999) s.add(n) plain.append(n) # Time 1000 lookups targets = [random.randint(0, 999_999) for _ in range(1000)] start = time.perf_counter() for t in targets: s.contains(t) t_sorted = time.perf_counter() - start start = time.perf_counter() for t in targets: _ = t in plain t_plain = time.perf_counter() - start print(f"Sorted set lookups: {t_sorted * 1000:.2f} ms") print(f"Plain list lookups: {t_plain * 1000:.2f} ms")
Non-negotiables: bisect for add and contains. The plain-list in is O(n) per call; the sorted-set is O(log n). On a 10k list with 1000 lookups, expect a 20-50x speedup.