PY-L3-37 · Searching — Linear Search

Learning Goals

3 min

By the end of this lesson you can:

Implement linear search three ways: find-first, find-all, count.
Return both the index and the value when needed.
Reason about best, worst and average case — for a list of N items.
Identify when linear search is the right choice and when to reach for binary search instead.

Warm-Up · You've Done This Already

5 min

Every in operator and every .index() call you've written is a linear search:

names = ["Aisyah", "Wei Jie", "Priya", "Iman"]

"Priya" in names           # Python walks the list until it finds Priya
names.index("Priya")       # Same walk; returns the index (2)

Today we write the algorithm by hand. Same logic, more control — return early on hit, handle the not-found case explicitly, count matches.

Today's big idea

Linear search is unbeatable when the data isn't sorted, doesn't fit any index, or is searched only once. It's O(n) — slow on giant lists, but simple and always works.

New Concept · Three Flavours

14 min

1 · Find-first — return the index, or -1

def find_first(lst, target):
    for i, item in enumerate(lst):
        if item == target:
            return i
    return -1


print(find_first(["a", "b", "c", "d"], "c"))    # 2
print(find_first(["a", "b", "c", "d"], "z"))    # -1

The canonical shape. enumerate gives index + value. return i exits the moment we hit. return -1 after the loop signals "not found".

2 · Find-all — collect every match

def find_all(lst, target):
    return [i for i, item in enumerate(lst) if item == target]


print(find_all([10, 20, 30, 20, 40, 20], 20))    # [1, 3, 5]
print(find_all([1, 2, 3], 99))                    # []

List comprehension over enumerate. No early exit — must scan everything to find all matches.

3 · Count occurrences

def count_of(lst, target):
    n = 0
    for item in lst:
        if item == target:
            n += 1
    return n

# Or more Pythonically:
def count_of_v2(lst, target):
    return sum(1 for item in lst if item == target)

Same loop shape. We're still doing O(n) work — full scan unavoidable.

The cases — best, worst, average

Best case      Target is the first item                 → 1 comparison
Worst case     Target is the last (or not present)      → N comparisons
Average case   Random position in a random list         → N/2 comparisons

Conclusion: linear search is O(n).
For N = 1,000,000, that's up to 1 million comparisons.

What if the items are objects?

Same shape — just compare a specific attribute or use a predicate.

heroes = [Hero("Aisyah", 100), Hero("Wei Jie", 80), Hero("Priya", 95)]

# By attribute
def find_by_name(heroes, name):
    for h in heroes:
        if h.name == name:
            return h
    return None

# By any predicate
def find_first_where(seq, predicate):
    for item in seq:
        if predicate(item):
            return item
    return None

# Find the first hero with HP < 90
weak = find_first_where(heroes, lambda h: h.hp < 90)
print(weak.name if weak else "none found")

The predicate version is the most general — works for anything you can express as a boolean condition.

Early-exit short-circuit · the built-in way

Python's next() with a generator expression gives you find-first in one line:

weak = next((h for h in heroes if h.hp < 90), None)
# Returns first match, or None if no match

Same logic as the loop version, far more compact. Use this for one-off finds in real code.

Worked Example · Verifying the Bounds

12 min

Let's measure linear search empirically — confirm it's O(n). Save as linear_bench.py:

# linear_bench.py — count comparisons; demonstrate O(n) behaviour

import random
import time

class CountingList:
    """Wraps a list, counts comparisons against its items."""
    def __init__(self, items):
        self.items = items
        self.comparisons = 0
    def __iter__(self):
        return iter(self.items)
    def __getitem__(self, i):
        self.comparisons += 1
        return self.items[i]
    def __len__(self):
        return len(self.items)


def linear_search(seq, target):
    """Find target. Returns index or -1."""
    for i in range(len(seq)):
        if seq[i] == target:
            return i
    return -1


# Test for sizes from 10k to 1m
for size in [10_000, 100_000, 1_000_000]:
    data = CountingList(list(range(size)))
    target = -1                         # not in list — worst case

    data.comparisons = 0
    start = time.perf_counter()
    linear_search(data, target)
    elapsed = time.perf_counter() - start

    print(f"  N = {size:>8}  comparisons = {data.comparisons:>8}  "
          f"time = {elapsed * 1000:.2f} ms")

Sample output

  N =    10000  comparisons =    10000  time =   1.32 ms
  N =   100000  comparisons =   100000  time =  12.15 ms
  N =  1000000  comparisons =  1000000  time = 121.05 ms

Read the diff

Comparisons grow exactly linearly with N. Time grows roughly the same — 10× more items = 10× more time. That's O(n). Tomorrow's binary search will do the same job in ~20 comparisons for a million items — exponentially fewer.

Try It Yourself

13 min

01 🟢 Find by attribute

Write find_by_score(students, target_score) that returns the first student matching, or None. Each student is a dict with name and score.

Hint

def find_by_score(students, target_score):
    for s in students:
        if s["score"] == target_score:
            return s
    return None

students = [{"name": "A", "score": 50}, {"name": "B", "score": 95}]
print(find_by_score(students, 95))     # {'name': 'B', 'score': 95}
print(find_by_score(students, 33))     # None

02 🟡 Find every match

Write indices_of(lst, target) that returns a list of every index where target appears.

Hint

def indices_of(lst, target):
    return [i for i, x in enumerate(lst) if x == target]

print(indices_of([1, 2, 1, 3, 1, 4], 1))     # [0, 2, 4]
print(indices_of("banana", "a"))             # [1, 3, 5]

Works on strings too — strings are iterable, enumerate gives (index, char).

03 🔴 First match where predicate (stretch)

Write first_where(seq, predicate) that returns the first item for which predicate(item) is True. Implement both with a loop and with next() + generator expression.

Hint

# Loop version
def first_where(seq, predicate):
    for item in seq:
        if predicate(item):
            return item
    return None

# next() version
def first_where_v2(seq, predicate):
    return next((item for item in seq if predicate(item)), None)

# Both work the same way:
print(first_where([3, -1, 4, -2, 5], lambda x: x < 0))    # -1
print(first_where_v2([3, -1, 4, -2, 5], lambda x: x < 0)) # -1

The next(gen, default) trick handles the not-found case via the default value.

Mini-Challenge · The Multi-Field Search

8 min

Given a list of dicts with name, email, phone — build search(records, query) that returns every record matching the query as a substring of any field.

records = [
    {"name": "Aisyah", "email": "aisyah@example.com", "phone": "012-3456789"},
    {"name": "Wei Jie", "email": "wj@school.edu.my", "phone": "017-9921122"},
    {"name": "Priya",   "email": "priya@school.edu.my", "phone": "019-1112233"},
]

search(records, "school")            # 2 matches — Wei Jie and Priya
search(records, "017")               # 1 match — Wei Jie
search(records, "aisyah")            # 1 match — Aisyah (case-insensitive)

Show one possible solution

def search(records, query):
    q = query.lower()
    out = []
    for r in records:
        if any(q in str(v).lower() for v in r.values()):
            out.append(r)
    return out


for r in search(records, "school"):
    print(f"  {r['name']:<10} {r['email']}")
print()
for r in search(records, "017"):
    print(f"  {r['name']:<10} {r['phone']}")

Non-negotiables: case-insensitive (lower-case both sides), substring match (in), any across all field values. The shape is exactly what every "search box" in every app does — Pythonically.

Recap

3 min

Linear search walks a sequence until it finds the target or runs out. Three flavours: find-first (early exit, return index), find-all (full scan, return list), count (full scan, return number). O(n) in the worst case — N comparisons for N items. in, .index(), and next() + generator are Python's built-in versions. Linear search is the right tool when the data isn't sorted, when you'll only search once, or when the list is small.

Vocabulary Card

linear search: Walk the sequence; check each item; stop on hit or end. O(n).
find-first vs find-all: Find-first can exit early; find-all must scan the whole list.
predicate: A function returning True/False. Generalises search beyond equality.
next(gen, default): Pythonic find-first idiom — generator expression + default for not-found.

Homework

4 min

Build search_kit.py with four functions:

find_min(seq) — return the smallest item using a single pass.
find_min_index(seq) — return the index of the smallest.
any_negative(seq) — True if any item is negative (early exit when you see one).
all_positive(seq) — True only if every item is positive (early exit on the first non-positive).

All four are O(n) in worst case — the best case for #3 is O(1) (find a negative immediately) and same for #4 (find a non-positive immediately).

Sample · search_kit.py

def find_min(seq):
    if not seq:
        return None
    smallest = seq[0]
    for x in seq[1:]:
        if x < smallest:
            smallest = x
    return smallest

def find_min_index(seq):
    if not seq:
        return -1
    idx = 0
    for i in range(1, len(seq)):
        if seq[i] < seq[idx]:
            idx = i
    return idx

def any_negative(seq):
    for x in seq:
        if x < 0:
            return True            # early exit!
    return False

def all_positive(seq):
    for x in seq:
        if x <= 0:
            return False           # early exit on first non-positive
    return True


print(find_min([3, 1, 4, 1, 5, 9, 2, 6]))        # 1
print(find_min_index([3, 1, 4, 1, 5, 9, 2, 6]))  # 1 (first occurrence)
print(any_negative([1, 2, -3, 4]))                # True
print(all_positive([1, 2, 3]))                    # True
print(all_positive([1, 2, 0]))                    # False

Non-negotiables: any_negative and all_positive use early exit. The Pythonic versions would be any(x < 0 for x in seq) and all(x > 0 for x in seq) — Python's built-ins are already short-circuiting.

def find_first(lst, target): for i, item in enumerate(lst): if item == target: return i return -1 print(find_first(["a", "b", "c", "d"], "c")) # 2 print(find_first(["a", "b", "c", "d"], "z")) # -1

def find_all(lst, target): return [i for i, item in enumerate(lst) if item == target] print(find_all([10, 20, 30, 20, 40, 20], 20)) # [1, 3, 5] print(find_all([1, 2, 3], 99)) # []

def count_of(lst, target): n = 0 for item in lst: if item == target: n += 1 return n # Or more Pythonically: def count_of_v2(lst, target): return sum(1 for item in lst if item == target)

Best case Target is the first item → 1 comparison Worst case Target is the last (or not present) → N comparisons Average case Random position in a random list → N/2 comparisons Conclusion: linear search is O(n). For N = 1,000,000, that's up to 1 million comparisons.

heroes = [Hero("Aisyah", 100), Hero("Wei Jie", 80), Hero("Priya", 95)] # By attribute def find_by_name(heroes, name): for h in heroes: if h.name == name: return h return None # By any predicate def find_first_where(seq, predicate): for item in seq: if predicate(item): return item return None # Find the first hero with HP < 90 weak = find_first_where(heroes, lambda h: h.hp < 90) print(weak.name if weak else "none found")

# linear_bench.py — count comparisons; demonstrate O(n) behaviour import random import time class CountingList: """Wraps a list, counts comparisons against its items.""" def __init__(self, items): self.items = items self.comparisons = 0 def __iter__(self): return iter(self.items) def __getitem__(self, i): self.comparisons += 1 return self.items[i] def __len__(self): return len(self.items) def linear_search(seq, target): """Find target. Returns index or -1.""" for i in range(len(seq)): if seq[i] == target: return i return -1 # Test for sizes from 10k to 1m for size in [10_000, 100_000, 1_000_000]: data = CountingList(list(range(size))) target = -1 # not in list — worst case data.comparisons = 0 start = time.perf_counter() linear_search(data, target) elapsed = time.perf_counter() - start print(f" N = {size:>8} comparisons = {data.comparisons:>8} " f"time = {elapsed * 1000:.2f} ms")

def find_by_score(students, target_score): for s in students: if s["score"] == target_score: return s return None students = [{"name": "A", "score": 50}, {"name": "B", "score": 95}] print(find_by_score(students, 95)) # {'name': 'B', 'score': 95} print(find_by_score(students, 33)) # None

def indices_of(lst, target): return [i for i, x in enumerate(lst) if x == target] print(indices_of([1, 2, 1, 3, 1, 4], 1)) # [0, 2, 4] print(indices_of("banana", "a")) # [1, 3, 5]

# Loop version def first_where(seq, predicate): for item in seq: if predicate(item): return item return None # next() version def first_where_v2(seq, predicate): return next((item for item in seq if predicate(item)), None) # Both work the same way: print(first_where([3, -1, 4, -2, 5], lambda x: x < 0)) # -1 print(first_where_v2([3, -1, 4, -2, 5], lambda x: x < 0)) # -1

records = [ {"name": "Aisyah", "email": "aisyah@example.com", "phone": "012-3456789"}, {"name": "Wei Jie", "email": "wj@school.edu.my", "phone": "017-9921122"}, {"name": "Priya", "email": "priya@school.edu.my", "phone": "019-1112233"}, ] search(records, "school") # 2 matches — Wei Jie and Priya search(records, "017") # 1 match — Wei Jie search(records, "aisyah") # 1 match — Aisyah (case-insensitive)

def search(records, query): q = query.lower() out = [] for r in records: if any(q in str(v).lower() for v in r.values()): out.append(r) return out for r in search(records, "school"): print(f" {r['name']:<10} {r['email']}") print() for r in search(records, "017"): print(f" {r['name']:<10} {r['phone']}")