PY-L6-01 · Why We Test — Real Bug Stories

Learning Goals

3 min

By the end of this lesson you can:

Explain why working code isn't the same as correct code.
Name the costs a missing test can cause: money, safety, trust, time.
Describe the "regression" problem — fixing one thing breaks another.
Adopt the engineer's mindset: a feature isn't done until it's tested.

Warm-Up · It Worked On My Machine

5 min

You've shipped dozens of programs across five levels. Each one "worked" — when you ran it, with the inputs you tried. But did you try:

An empty input? A negative number? A huge number?
A name with an emoji? A file that doesn't exist?
What happens after you change one line six months later?

Today's big idea

"It runs" means it didn't crash on the one path you tried. "It's tested" means you have proof it behaves correctly across many cases — proof a computer re-checks every time you change the code.

New Concept · What Tests Buy You

14 min

Famous bugs that tests would have caught

Mariner 1 (1962)   a single missing character in code → rocket destroyed
Ariane 5 (1996)    a number too big for its type → $370M rocket exploded
Knight Capital (2012) old code re-activated → lost $440M in 45 minutes
Therac-25 (1980s)  a race-condition bug → radiation overdoses, deaths

None were "hard" bugs. They were ordinary mistakes that a test would have caught for free — before they reached production.

The four things tests give you

Correctness — proof the code does what you intended, across many inputs.
Regression safety — change code fearlessly; tests scream if you broke something.
Documentation — a test shows exactly how code is meant to be used.
Design pressure — code that's hard to test is usually badly designed; testing pushes you toward cleaner code.

The regression problem

def discount(price, member):
    if member:
        return price * 0.9       # 10% off for members
    return price

# weeks later, you add a "sale" — and accidentally break members:
def discount(price, member):
    return price * 0.8           # oops: everyone gets 20%, members lose their deal

Without a test, this silent regression ships. With a test ("a member paying RM100 should pay RM90"), the change fails instantly and you fix it before anyone notices.

The cost curve

bug caught while coding      ~minutes to fix
bug caught in code review    ~hours
bug caught in production      ~days + angry users + lost money
bug that harms someone        priceless

Tests move bug-catching to the cheap end of that curve.

Worked Example · A Bug Hiding in Plain Sight

12 min

Here's a function that looks fine and even "works" on the obvious case:

def average(numbers):
    return sum(numbers) / len(numbers)

print(average([10, 20, 30]))     # → 20.0  ✅ looks great!

Now think like a tester. What inputs break it?

print(average([]))               # → ZeroDivisionError! 💥
print(average([5]))              # → 5.0   ✅
print(average([1, 2]))           # → 1.5   ✅

The empty-list case crashes. A quick, framework-free test makes the bug undeniable:

def average(numbers):
    if not numbers:
        return 0                 # the fix: decide what empty means
    return sum(numbers) / len(numbers)

# our first "test": a statement that must be true
assert average([10, 20, 30]) == 20.0
assert average([5]) == 5.0
assert average([]) == 0           # this used to crash — now it passes
print("all checks passed ✅")

all checks passed ✅

Read the diff

The test did three jobs at once: it found the empty-list bug, it documented the decision ("empty average is 0"), and it now guards against anyone re-introducing the crash. Three lines of assert — and you've started testing. (We'll meet assert properly in Lesson 5 and real frameworks from Lesson 7.)

Try It Yourself

13 min

01 🟢 Break your own code

Take any function you wrote in Levels 1-5. List 5 inputs you never tried (empty, zero, negative, huge, wrong type). Run them. Did any crash or misbehave?

Example

A "reverse a word" function: try "" (empty), "a" (single char), "racecar" (palindrome), an emoji, a number instead of a string. Edge cases reveal hidden assumptions.

02 🟡 Guard with asserts

Pick a function and write 4 assert checks for it — including at least one edge case. Make them all pass.

Hint

def is_even(n):
    return n % 2 == 0

assert is_even(4) is True
assert is_even(7) is False
assert is_even(0) is True
assert is_even(-2) is True
print("ok")

03 🔴 The regression trap

Write a function + asserts. Then deliberately "improve" the function in a way that breaks one case. Confirm an assert catches it — that's regression safety in action.

Mini-Challenge · The Bug Report

8 min

Research one famous software bug not listed above (e.g., the Y2K bug, the Pentium FDIV bug, a recent app outage). Write a short "bug report": what broke, the root cause, the cost, and the one-line test that would have caught it.

Show an example write-up

Bug:    Gangnam Style broke YouTube's view counter (2014)
Cause:  view count stored in a 32-bit int, max ~2.1 billion;
        the video passed it and the number wrapped/overflowed.
Cost:   counter displayed nonsense; emergency fix to 64-bit.
Test:   assert can_store_views(3_000_000_000) is True
        — a test asserting the counter handles >2.1B would have
          flagged the too-small type before launch.

Non-negotiables: name the bug, the root cause, the impact, and a concrete assertion that would have caught it.

Recap

3 min

"It runs" ≠ "it's correct". Tests give you correctness, regression safety, living documentation, and better design — and they move bug-catching to the cheap end of the cost curve. The most expensive software disasters were ordinary bugs a single test would have caught. From now on: a feature isn't done until it's tested.

Vocabulary Card

test: Code that automatically checks other code behaves as intended.
regression: A bug introduced by a change that broke something that used to work.
edge case: An unusual input (empty, zero, negative, huge) that often exposes bugs.
assert: A statement that must be true; raises an error if it isn't. Your simplest test.

Homework

4 min

Pick one program you built in an earlier level. Write a short document listing: 3 inputs you never tested, the result of trying each, any bug you found, and the assert you'd add to guard it. This is the start of a testing mindset.

Sample · auditing the L1 number guesser

Program: number guesser
Untested inputs:
  1. user types "abc" instead of a number  → crashed (ValueError)
  2. user guesses out of range (999)        → accepted silently (bug)
  3. empty input (just Enter)               → crashed
Guards to add:
  assert parse_guess("abc") is None         # reject non-numbers
  assert in_range(999, 1, 100) is False     # reject out-of-range

Non-negotiables: real untested inputs, the actual result, and a concrete assert per bug found.

Mariner 1 (1962) a single missing character in code → rocket destroyed Ariane 5 (1996) a number too big for its type → $370M rocket exploded Knight Capital (2012) old code re-activated → lost $440M in 45 minutes Therac-25 (1980s) a race-condition bug → radiation overdoses, deaths

def discount(price, member): if member: return price * 0.9 # 10% off for members return price # weeks later, you add a "sale" — and accidentally break members: def discount(price, member): return price * 0.8 # oops: everyone gets 20%, members lose their deal

def average(numbers): if not numbers: return 0 # the fix: decide what empty means return sum(numbers) / len(numbers) # our first "test": a statement that must be true assert average([10, 20, 30]) == 20.0 assert average([5]) == 5.0 assert average([]) == 0 # this used to crash — now it passes print("all checks passed ✅")

Bug: Gangnam Style broke YouTube's view counter (2014) Cause: view count stored in a 32-bit int, max ~2.1 billion; the video passed it and the number wrapped/overflowed. Cost: counter displayed nonsense; emergency fix to 64-bit. Test: assert can_store_views(3_000_000_000) is True — a test asserting the counter handles >2.1B would have flagged the too-small type before launch.