PY-L3-38 · Searching — Binary Search

Learning Goals

3 min

By the end of this lesson you can:

Implement binary search iteratively with lo, hi, mid.
Implement the recursive version.
Recognise that the list must be sorted for binary search to work.
Use Python's built-in bisect module for production work.

Warm-Up · The Number-Guessing Game

5 min

Remember the higher-or-lower guessing game from PY-L1-45? When you guess a number 1-100 and Python says "higher" or "lower", the optimal strategy is to always guess the middle. That's binary search.

Range 1-100, target 73
guess 50:    "higher" → range 51-100
guess 75:    "lower"  → range 51-74
guess 62:    "higher" → range 63-74
guess 68:    "higher" → range 69-74
guess 71:    "higher" → range 72-74
guess 73:    found! 6 guesses for 100 numbers.

For 1,000,000 numbers: at most 20 guesses.

Each guess halves the remaining range. log₂(1,000,000) ≈ 20.

Today's big idea

Sorted data lets you exploit ordering. Linear search ignores it. Binary search uses it — and gets exponentially faster.

New Concept · The Halving Algorithm

14 min

The iterative version

def binary_search(sorted_list, target):
    lo = 0
    hi = len(sorted_list) - 1

    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid                # found!
        elif sorted_list[mid] < target:
            lo = mid + 1              # target is in the upper half
        else:
            hi = mid - 1              # target is in the lower half

    return -1                          # not found


# Test
data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(binary_search(data, 13))    # 6
print(binary_search(data, 4))     # -1

Three boundary variables, a comparison at the midpoint, three branches. Eleven lines of code that work in milliseconds on a sorted list of any reasonable size.

Why `(lo + hi) // 2`?

The midpoint of [lo, hi]. Integer division because list indices are ints. There's a famous overflow bug in this expression in C/Java — when lo + hi exceeds the max int. In Python ints are unbounded, so it's safe.

The recursive version

def binary_search_rec(sorted_list, target, lo=0, hi=None):
    if hi is None:
        hi = len(sorted_list) - 1

    if lo > hi:
        return -1                          # base case: empty range

    mid = (lo + hi) // 2
    if sorted_list[mid] == target:
        return mid
    elif sorted_list[mid] < target:
        return binary_search_rec(sorted_list, target, mid + 1, hi)
    else:
        return binary_search_rec(sorted_list, target, lo, mid - 1)

Same logic, recursive. The shape from PY-L3-36's homework. Both work fine; iterative wins on huge inputs because Python has no tail-call optimisation.

The sortedness is non-negotiable

unsorted = [11, 3, 17, 5, 9, 19, 7, 1, 15, 13]
binary_search(unsorted, 7)
# returns -1 even though 7 is in the list!

Binary search assumes sorted_list[mid] < target means "target is to the right". That's only true if the list is sorted. On unsorted data the answer is nonsense.

You can sort first — but sorting is O(n log n), often more expensive than a single linear search. Binary search wins only when you'll do many searches on the same sorted data.

Python's bisect module

For production code, use bisect:

import bisect

data = [1, 3, 5, 7, 9, 11, 13]

# Find the insertion point for a value
print(bisect.bisect_left(data, 7))     # 3 (index where 7 sits)
print(bisect.bisect_left(data, 6))     # 3 (would insert at index 3)

# Search — combine with comparison
def bsearch(lst, target):
    i = bisect.bisect_left(lst, target)
    if i < len(lst) and lst[i] == target:
        return i
    return -1

print(bsearch(data, 7))     # 3
print(bsearch(data, 6))     # -1

bisect is the C-implemented, batteries-included version. Use it when speed matters; understand the manual version so you know what it does.

Other bisect tricks

bisect.insort(lst, value) — keep a list sorted as you insert. Useful for online sorted-set scenarios.
bisect_left vs bisect_right — where to place duplicates. Read the docs if you need the distinction.

Worked Example · Race Linear vs Binary

12 min

Save as search_race.py:

# search_race.py — linear vs binary on the same data

import time
import bisect


def linear_search(lst, target):
    for i, x in enumerate(lst):
        if x == target:
            return i
    return -1


def binary_search(lst, target):
    lo, hi = 0, len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if lst[mid] == target:
            return mid
        if lst[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def race(size, n_searches=1000):
    """Run n_searches on a sorted list of given size."""
    data = list(range(size))
    targets = [size - 1] * n_searches      # worst case: last item

    # Linear
    start = time.perf_counter()
    for t in targets:
        linear_search(data, t)
    linear_time = time.perf_counter() - start

    # Binary
    start = time.perf_counter()
    for t in targets:
        binary_search(data, t)
    binary_time = time.perf_counter() - start

    # bisect (the C version)
    start = time.perf_counter()
    for t in targets:
        bisect.bisect_left(data, t)
    bisect_time = time.perf_counter() - start

    print(f"  N = {size:>8}  linear {linear_time:>7.3f}s   "
          f"binary {binary_time:>6.4f}s   "
          f"bisect {bisect_time:>6.4f}s")


print(f"Searching for the LAST item, 1000 times each:")
race(10_000)
race(100_000)
race(1_000_000)

Sample output (times will vary)

Searching for the LAST item, 1000 times each:
  N =    10000  linear   0.351s   binary 0.0008s   bisect 0.0002s
  N =   100000  linear   3.402s   binary 0.0010s   bisect 0.0002s
  N =  1000000  linear  34.110s   binary 0.0013s   bisect 0.0003s

Read the diff

Linear time grows linearly — 10× more items = 10× slower. Binary time barely changes — 10× more items = ~1 extra comparison. bisect (the C version) is ~3-5× faster than your Python binary search, for the same algorithm. Algorithm choice matters more than language tricks.

Try It Yourself

13 min

01 🟢 Verify the boundary

Test your binary_search on these edge cases: an empty list, a list of one item (target present), a list of one item (target absent), the first item of a list, the last item.

Hint

assert binary_search([], 5) == -1
assert binary_search([5], 5) == 0
assert binary_search([5], 99) == -1
assert binary_search([1, 2, 3, 4, 5], 1) == 0     # first
assert binary_search([1, 2, 3, 4, 5], 5) == 4     # last

Edge cases are where the off-by-one bugs hide. Run these every time you write a binary search.

02 🟡 Find insertion point

Without using bisect, write insertion_point(sorted_list, value) that returns the index where value should be inserted to keep the list sorted.

Hint

def insertion_point(lst, value):
    lo, hi = 0, len(lst)              # note: hi = len, not len - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if lst[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    return lo

print(insertion_point([1, 3, 5, 7, 9], 4))    # 2 — would go between 3 and 5
print(insertion_point([1, 3, 5, 7, 9], 9))    # 4 (or 5 — depends on left/right convention)
print(insertion_point([1, 3, 5, 7, 9], 0))    # 0
print(insertion_point([1, 3, 5, 7, 9], 10))   # 5

This is what bisect.bisect_left does. Useful for maintaining a sorted list as items arrive.

03 🔴 Square root via binary search (stretch)

Use binary search to find the integer square root of n — the largest integer whose square is ≤ n. Don't use math.sqrt or **.

Hint

def int_sqrt(n):
    if n < 0: return None
    lo, hi = 0, n
    while lo <= hi:
        mid = (lo + hi) // 2
        sq = mid * mid
        if sq == n: return mid
        if sq < n:
            lo = mid + 1
        else:
            hi = mid - 1
    return hi    # the largest mid whose square was ≤ n

print(int_sqrt(0))       # 0
print(int_sqrt(15))      # 3 (since 3*3=9 ≤ 15 < 16=4*4)
print(int_sqrt(16))      # 4
print(int_sqrt(10**18))  # ~1_000_000_000

Binary search isn't just for lists — it works for any "guess and check" where you can narrow the range each step. Square-root, optimisation, finding a threshold. The pattern generalises.

Mini-Challenge · Reverse-Engineer Binary Search

8 min

Build a function guess_my_number(low, high, target) that prints every guess your binary search makes. Show how many guesses it takes to find numbers at different positions.

guess_my_number(1, 100, 73)
# guess 50  -> too low
# guess 75  -> too high
# guess 62  -> too low
# guess 68  -> too low
# guess 71  -> too low
# guess 73  -> FOUND in 6 guesses!

Show one possible solution

# guess_my_number.py — show binary search in action

def guess_my_number(low, high, target):
    guesses = 0
    while low <= high:
        guesses += 1
        guess = (low + high) // 2
        print(f"  guess {guess}", end=" ")
        if guess == target:
            print(f"-> FOUND in {guesses} guesses!")
            return
        if guess < target:
            print("-> too low")
            low = guess + 1
        else:
            print("-> too high")
            high = guess - 1
    print(f"  not found after {guesses} guesses")


print("Target 73, range 1-100:")
guess_my_number(1, 100, 73)

print("\nTarget 1, range 1-100:")
guess_my_number(1, 100, 1)

print("\nTarget 999, range 1-1000:")
guess_my_number(1, 1000, 999)

Non-negotiables: print every guess + direction; report total. Confirms what theory says — never more than log₂(N) + 1 guesses.

Recap

3 min

Binary search halves the search space at each step. Three pointers (lo, hi, mid); compare; move one pointer. O(log n) — twenty steps to find one in a million. The list must be sorted. Python's bisect module gives you a C-fast version for production use. The technique generalises beyond lists — anywhere you can guess-and-narrow, binary search applies.

Vocabulary Card

binary search: Halve-the-range search on sorted data. O(log n).
lo / hi / mid: The three pointers that bound and bisect the search range.
O(log n): The complexity class. Doubling N adds just one comparison.
bisect module: Python's built-in binary-search module. bisect_left, insort.
insertion point: Where a new value should go to keep the list sorted.

Homework

4 min

Build sorted_set.py. A class that wraps a sorted list and offers:

add(value) — insert in the right place using bisect (or your own binary search) — O(log n) to find, O(n) to actually insert.
contains(value) — binary search, returns True/False.
__len__.

Test: add 10000 random ints. Time 1000 contains() calls. Compare to a plain list's in.

Sample · sorted_set.py

import bisect, random, time

class SortedSet:
    def __init__(self):
        self._items = []

    def add(self, value):
        i = bisect.bisect_left(self._items, value)
        # Avoid duplicates (true set behaviour)
        if i < len(self._items) and self._items[i] == value:
            return
        self._items.insert(i, value)

    def contains(self, value):
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value

    def __len__(self):
        return len(self._items)


# Build
s = SortedSet()
plain = []
random.seed(0)
for _ in range(10_000):
    n = random.randint(0, 999_999)
    s.add(n)
    plain.append(n)

# Time 1000 lookups
targets = [random.randint(0, 999_999) for _ in range(1000)]

start = time.perf_counter()
for t in targets:
    s.contains(t)
t_sorted = time.perf_counter() - start

start = time.perf_counter()
for t in targets:
    _ = t in plain
t_plain = time.perf_counter() - start

print(f"Sorted set lookups: {t_sorted * 1000:.2f} ms")
print(f"Plain list lookups: {t_plain * 1000:.2f} ms")

Non-negotiables: bisect for add and contains. The plain-list in is O(n) per call; the sorted-set is O(log n). On a 10k list with 1000 lookups, expect a 20-50x speedup.

Range 1-100, target 73 guess 50: "higher" → range 51-100 guess 75: "lower" → range 51-74 guess 62: "higher" → range 63-74 guess 68: "higher" → range 69-74 guess 71: "higher" → range 72-74 guess 73: found! 6 guesses for 100 numbers. For 1,000,000 numbers: at most 20 guesses.

def binary_search(sorted_list, target): lo = 0 hi = len(sorted_list) - 1 while lo <= hi: mid = (lo + hi) // 2 if sorted_list[mid] == target: return mid # found! elif sorted_list[mid] < target: lo = mid + 1 # target is in the upper half else: hi = mid - 1 # target is in the lower half return -1 # not found # Test data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19] print(binary_search(data, 13)) # 6 print(binary_search(data, 4)) # -1

def binary_search_rec(sorted_list, target, lo=0, hi=None): if hi is None: hi = len(sorted_list) - 1 if lo > hi: return -1 # base case: empty range mid = (lo + hi) // 2 if sorted_list[mid] == target: return mid elif sorted_list[mid] < target: return binary_search_rec(sorted_list, target, mid + 1, hi) else: return binary_search_rec(sorted_list, target, lo, mid - 1)

import bisect data = [1, 3, 5, 7, 9, 11, 13] # Find the insertion point for a value print(bisect.bisect_left(data, 7)) # 3 (index where 7 sits) print(bisect.bisect_left(data, 6)) # 3 (would insert at index 3) # Search — combine with comparison def bsearch(lst, target): i = bisect.bisect_left(lst, target) if i < len(lst) and lst[i] == target: return i return -1 print(bsearch(data, 7)) # 3 print(bsearch(data, 6)) # -1

# search_race.py — linear vs binary on the same data import time import bisect def linear_search(lst, target): for i, x in enumerate(lst): if x == target: return i return -1 def binary_search(lst, target): lo, hi = 0, len(lst) - 1 while lo <= hi: mid = (lo + hi) // 2 if lst[mid] == target: return mid if lst[mid] < target: lo = mid + 1 else: hi = mid - 1 return -1 def race(size, n_searches=1000): """Run n_searches on a sorted list of given size.""" data = list(range(size)) targets = [size - 1] * n_searches # worst case: last item # Linear start = time.perf_counter() for t in targets: linear_search(data, t) linear_time = time.perf_counter() - start # Binary start = time.perf_counter() for t in targets: binary_search(data, t) binary_time = time.perf_counter() - start # bisect (the C version) start = time.perf_counter() for t in targets: bisect.bisect_left(data, t) bisect_time = time.perf_counter() - start print(f" N = {size:>8} linear {linear_time:>7.3f}s " f"binary {binary_time:>6.4f}s " f"bisect {bisect_time:>6.4f}s") print(f"Searching for the LAST item, 1000 times each:") race(10_000) race(100_000) race(1_000_000)

Searching for the LAST item, 1000 times each: N = 10000 linear 0.351s binary 0.0008s bisect 0.0002s N = 100000 linear 3.402s binary 0.0010s bisect 0.0002s N = 1000000 linear 34.110s binary 0.0013s bisect 0.0003s

assert binary_search([], 5) == -1 assert binary_search([5], 5) == 0 assert binary_search([5], 99) == -1 assert binary_search([1, 2, 3, 4, 5], 1) == 0 # first assert binary_search([1, 2, 3, 4, 5], 5) == 4 # last

def insertion_point(lst, value): lo, hi = 0, len(lst) # note: hi = len, not len - 1 while lo < hi: mid = (lo + hi) // 2 if lst[mid] < value: lo = mid + 1 else: hi = mid return lo print(insertion_point([1, 3, 5, 7, 9], 4)) # 2 — would go between 3 and 5 print(insertion_point([1, 3, 5, 7, 9], 9)) # 4 (or 5 — depends on left/right convention) print(insertion_point([1, 3, 5, 7, 9], 0)) # 0 print(insertion_point([1, 3, 5, 7, 9], 10)) # 5

def int_sqrt(n): if n < 0: return None lo, hi = 0, n while lo <= hi: mid = (lo + hi) // 2 sq = mid * mid if sq == n: return mid if sq < n: lo = mid + 1 else: hi = mid - 1 return hi # the largest mid whose square was ≤ n print(int_sqrt(0)) # 0 print(int_sqrt(15)) # 3 (since 3*3=9 ≤ 15 < 16=4*4) print(int_sqrt(16)) # 4 print(int_sqrt(10**18)) # ~1_000_000_000

# guess_my_number.py — show binary search in action def guess_my_number(low, high, target): guesses = 0 while low <= high: guesses += 1 guess = (low + high) // 2 print(f" guess {guess}", end=" ") if guess == target: print(f"-> FOUND in {guesses} guesses!") return if guess < target: print("-> too low") low = guess + 1 else: print("-> too high") high = guess - 1 print(f" not found after {guesses} guesses") print("Target 73, range 1-100:") guess_my_number(1, 100, 73) print("\nTarget 1, range 1-100:") guess_my_number(1, 100, 1) print("\nTarget 999, range 1-1000:") guess_my_number(1, 1000, 999)

Learning Goals

Warm-Up · The Number-Guessing Game

New Concept · The Halving Algorithm

The iterative version

Why (lo + hi) // 2?

The recursive version

The sortedness is non-negotiable

Python's bisect module

Other bisect tricks

Worked Example · Race Linear vs Binary

Read the diff

Try It Yourself

Mini-Challenge · Reverse-Engineer Binary Search

Recap

Vocabulary Card

Homework

Sample · sorted_set.py

Learning Goals

Warm-Up · The Number-Guessing Game

New Concept · The Halving Algorithm

The iterative version

Why (lo + hi) // 2?

The recursive version

The sortedness is non-negotiable

Python's bisect module

Other bisect tricks

Worked Example · Race Linear vs Binary

Read the diff

Try It Yourself

Mini-Challenge · Reverse-Engineer Binary Search

Recap

Vocabulary Card

Homework

Sample · sorted_set.py

Why `(lo + hi) // 2`?

Why `(lo + hi) // 2`?