PY-L5-01 · Welcome to AI — What Can Computers Learn?

Welcome to AI — What Can Computers Learn?

Welcome to Level 5. Everything you've built so far followed rules you wrote. Machine learning flips that: you show the computer examples, and it writes the rules itself. Today we unpack that one big idea — no maths, no jargon, just intuition.

⏱ 1 hour · 🧠 Concept lesson · 📚 After PY-L4-48 · 💻 No install yet

Learning Goals

3 min

By the end of this lesson you can:

Explain the difference between rule-based programming and machine learning.
Name the three big families of ML: supervised, unsupervised, reinforcement.
Spot which everyday apps use ML and what they learned from.
Decide when a problem is — and isn't — a good fit for ML.

Warm-Up · Two Ways to Catch Spam

5 min

Imagine you must flag spam emails. The way you already know:

def is_spam(email):
    text = email.lower()  # lowercase once so matching ignores capital letters
    if "free money" in text:
        return True
    if "click here to win" in text:
        return True
    # ...100 more rules you have to write and maintain forever
    return False

The machine-learning way: collect 10,000 emails a human already labelled "spam" or "not spam", hand them to an algorithm, and it figures out the patterns itself — including ones you'd never have thought of.
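
To make that concrete, here is a minimal sketch of "learning" spam words from labelled emails. The six emails are made up for illustration, and the helper name is_spam_learned is hypothetical. Real spam filters use far more data and smarter statistics, so treat this as a toy, not a working filter.

```python
# Toy labelled data (made up for illustration): (text, label)
emails = [
    ("free money waiting for you", "spam"),
    ("click here to win a prize", "spam"),
    ("win free tickets now", "spam"),
    ("lunch at noon tomorrow", "not spam"),
    ("here are the meeting notes", "not spam"),
    ("your homework is due tomorrow", "not spam"),
]

# Count in how many spam / non-spam emails each word appears
spam_counts, ham_counts = {}, {}
for text, label in emails:
    counts = spam_counts if label == "spam" else ham_counts
    for word in set(text.lower().split()):
        counts[word] = counts.get(word, 0) + 1

# "Learned rule": words seen in spam but never in normal mail
spam_words = sorted(w for w in spam_counts if w not in ham_counts)
print(spam_words)

def is_spam_learned(email):
    # Flag the email if it contains any of the learned spam words
    return any(w in spam_words for w in email.lower().split())

print(is_spam_learned("win money today"))
print(is_spam_learned("notes from lunch"))
```

Notice that nobody typed the word list: it came out of the examples. With only six emails it also picks up innocent words like "for", which is one reason real systems need lots of data.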

Today's big idea

Traditional programming: rules + data → answers. Machine learning: data + answers → rules. You give it examples and the correct labels; it learns the rule that connects them.

New Concept · Three Kinds of Learning

14 min

1. Supervised learning — learning from labelled examples

You have inputs AND the correct answers. The model learns to map one to the other. This is 90% of what you'll do in Level 5.

emails  → spam / not spam        (classification)
photos  → cat / dog / bird       (classification)
house   → predicted price (RM)   (regression)

Two flavours: classification picks a category; regression predicts a number.
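
Classification gets a hand-made example later in this lesson, so here is a small sketch of the regression flavour. The house sizes and prices are invented, and the "model" is assumed to be just one number (a price per square metre) found by trying candidates. Real regression uses better search, but the idea is the same.

```python
# Made-up data: (size in square metres, price in thousand RM)
houses = [(50, 250), (80, 400), (100, 500), (120, 600)]

# Try candidate rates, keep the one whose predictions miss by the least
best_rate, best_error = None, float("inf")
for rate in range(1, 11):  # thousand RM per square metre
    error = sum(abs(size * rate - price) for size, price in houses)
    if error < best_error:
        best_rate, best_error = rate, error

print(f"learned rate: {best_rate} thousand RM per square metre")
print(f"predicted price for 90 square metres: {90 * best_rate} thousand RM")
```

The output is a number (a price), not a category, which is what makes this regression.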

2. Unsupervised learning — finding structure without labels

You have inputs but NO answers. The model finds groups or patterns on its own.

1000 customers → 4 natural groups   (clustering)
shopping data  → "people who buy X also buy Y"
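
Clustering can also be sketched by hand. The example below assumes made-up monthly spending figures and exactly two groups. It starts with two guessed group centres and nudges them towards the customers nearest to each, a simplified version of the idea behind k-means.

```python
# Made-up monthly spending (RM) for 8 customers; note there are no labels
spending = [20, 25, 30, 35, 200, 210, 220, 230]

# Start with two guessed group centres, then refine them a few times
low, high = min(spending), max(spending)
for _ in range(5):
    # Each customer joins the group whose centre is closer
    group_low = [s for s in spending if abs(s - low) <= abs(s - high)]
    group_high = [s for s in spending if abs(s - low) > abs(s - high)]
    # Each centre moves to the average of its group
    low = sum(group_low) / len(group_low)
    high = sum(group_high) / len(group_high)

print("small spenders:", group_low, "centre:", low)
print("big spenders:  ", group_high, "centre:", high)
```

Nobody told the program which customers were "small" or "big" spenders. It found the two groups from the numbers alone.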

3. Reinforcement learning — learning by trial and reward

An agent takes actions, gets rewards or penalties, and learns a strategy. Think game-playing AIs and robots. We won't code this in L5, but you should know it exists.

game move → win/lose signal → better strategy next time
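
You won't need to write reinforcement learning in this level, but a tiny sketch shows the trial-and-reward loop. It assumes a hypothetical game with two buttons whose win chances are hidden from the agent. The agent mostly presses the button that has paid best so far and occasionally tries the other one.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical game: two buttons with hidden win chances the agent cannot see
win_chance = {"A": 0.2, "B": 0.8}

value = {"A": 0.0, "B": 0.0}  # the agent's current estimate for each button
pulls = {"A": 0, "B": 0}      # how many times each button was pressed

for step in range(1000):
    if random.random() < 0.1:
        # explore: sometimes press a button at random
        action = "A" if random.random() < 0.5 else "B"
    else:
        # exploit: press the button that looks best so far
        action = max(value, key=value.get)
    reward = 1 if random.random() < win_chance[action] else 0
    pulls[action] += 1
    # update the running average reward for that button
    value[action] += (reward - value[action]) / pulls[action]

best = max(value, key=value.get)
print("estimates:", value)
print("learned favourite:", best)
```

The agent is never told which button is better. It works that out from the rewards it collects.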

The honest truth about "learning"

A model doesn't understand anything. It finds statistical patterns: "emails with these word-frequencies tend to be spam". That's powerful and useful — and also why ML can be confidently wrong, biased, or fooled. We'll return to this in the ethics lesson.

When NOT to use ML

The rule is simple and known ("is this number even?") — just write the rule.
You have almost no data — ML needs examples.
A mistake is catastrophic and you can't explain the model's reasoning.

Worked Example · Classify the Problem

12 min

For each task, decide: rule-based or ML? If ML, which kind?

Task                                   Answer
─────────────────────────────────────  ─────────────────────────
Convert °C to °F                        rule-based (formula known)
Decide if a photo shows a cat           ML · supervised classification
Predict tomorrow's temperature          ML · supervised regression
Group 5000 songs into "moods"           ML · unsupervised clustering
Check if a password is ≥ 8 chars        rule-based
Recommend the next video to watch       ML (mix of supervised + RL)
Find the largest number in a list       rule-based

The test: can you write the rule yourself in a few lines? If yes, do that. If the rule is fuzzy, subjective, or hidden in lots of examples — that's ML territory.
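
To see that test in action, here are three of the rule-based tasks from the table written out. Each fits in a line or two, which is the sign that ML would be overkill. The function names are just illustrative choices.

```python
def celsius_to_fahrenheit(c):
    # Exact, known formula: no learning needed
    return c * 9 / 5 + 32

def password_long_enough(password):
    # The rule is stated in the task itself
    return len(password) >= 8

def largest(numbers):
    # Python's built-in max already implements this rule
    return max(numbers)

print(celsius_to_fahrenheit(100))      # 212.0
print(password_long_enough("abc123"))  # False
print(largest([3, 41, 7]))             # 41
```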

A tiny "learned rule" by hand

Before any library, let's learn a rule manually. Given heights and a label (kid / adult), find the threshold that best separates them:

data = [
    (120, "kid"), (135, "kid"), (110, "kid"),
    (175, "adult"), (168, "adult"), (180, "adult"),
]

# Try every threshold, keep the one with fewest mistakes
best_t, best_errors = None, 999
for t in range(100, 200, 5):
    errors = 0
    for height, label in data:
        guess = "adult" if height >= t else "kid"
        if guess != label:
            errors += 1
    if errors < best_errors:
        best_t, best_errors = t, errors

print(f"learned threshold: {best_t} cm  ({best_errors} errors)")

learned threshold: 140 cm  (0 errors)

That loop just did machine learning: it searched for the rule (a threshold) that best fits the labelled data. Real ML does the same thing — just with cleverer search and millions of parameters.
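
A rule that is perfect on its training data can still be wrong on people it has never seen. The sketch below repeats the same threshold search in a compact form and then tries the rule on four invented people, including a tall kid and a short adult. The new data and the helper count_errors are assumptions for illustration.

```python
train = [
    (120, "kid"), (135, "kid"), (110, "kid"),
    (175, "adult"), (168, "adult"), (180, "adult"),
]

def count_errors(t, data):
    # How many people does the rule "adult if height >= t" get wrong?
    return sum(("adult" if h >= t else "kid") != label for h, label in data)

# Same search as above: the first threshold with the fewest training mistakes
threshold = min(range(100, 200, 5), key=lambda t: count_errors(t, train))
print("learned threshold:", threshold)

# Hypothetical unseen people, including a tall kid and a short adult
new_people = [(155, "kid"), (138, "adult"), (178, "adult"), (115, "kid")]
for height, truth in new_people:
    guess = "adult" if height >= threshold else "kid"
    mark = "ok" if guess == truth else "WRONG"
    print(f"{height} cm -> {guess:<5} (truth: {truth}) {mark}")

print("mistakes on new data:", count_errors(threshold, new_people))
```

Zero mistakes on the training data, yet two of the four new people are misjudged. That is why real projects always test a model on data it did not learn from.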

Try It Yourself

13 min

01 🟢 Sort the tasks

Write down 5 apps you use. For each, guess: is there ML inside? What did it learn from?

Example answer

YouTube recommendations (learned from what billions of people watched next); phone face-unlock (learned from face photos); autocorrect (learned from typed text); Spotify Discover (learned from listening history); maps ETA (learned from historical traffic).

02 🟡 Learn a threshold

Adapt the worked example: given (temperature, "cold"/"hot") pairs, learn the threshold.

Hint

data = [(18,"cold"),(22,"cold"),(20,"cold"),(30,"hot"),(33,"hot"),(28,"hot")]
best_t, best_err = None, 999
for t in range(10, 40):
    err = sum(("hot" if temp >= t else "cold") != label for temp, label in data)
    if err < best_err:
        best_t, best_err = t, err
print(best_t, best_err)

03 🔴 Where rules beat ML

Name three problems where a hand-written rule is clearly better than ML, and explain why in one sentence each.

Example answer

Tax calculation (the law is an exact formula); checking a valid chess move (rules are fully known); validating an email format (a regex is exact and explainable). ML would add cost, unpredictability, and bugs for zero benefit.

Mini-Challenge · Two-Feature Classifier

8 min

Extend the height example to TWO features, body length and weight, with a label of "cat" vs "dog". Find a simple rule combining both (e.g., a weighted-sum threshold) that separates the data. No libraries; just loops.

One possible solution

# pets.py — hand-tuned two-feature rule
data = [
    # (length_cm, weight_kg, label)
    (45, 4, "cat"), (50, 5, "cat"), (40, 3.5, "cat"),
    (70, 25, "dog"), (60, 18, "dog"), (80, 30, "dog"),
]

# A rule: score = weight*2 + length*0.1 ; dog if score above a threshold
def score(length, weight):
    return weight * 2 + length * 0.1

# Find the best split point on that score
best_t, best_err = None, 999
for t in range(0, 80):
    err = sum(("dog" if score(l, w) >= t else "cat") != lbl
              for l, w, lbl in data)
    if err < best_err:
        best_t, best_err = t, err

print(f"threshold score = {best_t}, errors = {best_err}")
print("test (55cm, 6kg):",
      "dog" if score(55, 6) >= best_t else "cat")

Non-negotiables: combine two features into one score, search for the best threshold, test on a new unseen pet. A real classifier follows the same steps, except that it learns the score function from the data instead of having it hand-picked.
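
As a rough sketch of what "learning the score function" could look like, the loop below also searches over the two weights instead of fixing them at 2 and 0.1. The candidate weight lists and the helper name errors are arbitrary choices for illustration. Real classifiers search far more cleverly than trying every combination.

```python
data = [
    # (length_cm, weight_kg, label), the same toy pets as in the solution above
    (45, 4, "cat"), (50, 5, "cat"), (40, 3.5, "cat"),
    (70, 25, "dog"), (60, 18, "dog"), (80, 30, "dog"),
]

def errors(a, b, t):
    # Rule being tested: dog if a*length + b*weight >= t
    return sum(("dog" if a * l + b * w >= t else "cat") != lbl
               for l, w, lbl in data)

best, best_err = None, len(data) + 1
for a in [0, 0.1, 0.5, 1]:        # candidate weights for length
    for b in [0, 1, 2, 3]:        # candidate weights for body weight
        for t in range(0, 200):   # candidate thresholds
            err = errors(a, b, t)
            if err < best_err:
                best, best_err = (a, b, t), err

print("learned (a, b, t):", best, "errors:", best_err)
```

On this tiny data set the search settles on a rule that ignores length completely, because body weight alone already separates the six pets. With more varied data, both features would likely matter.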

Recap

3 min

Traditional programming turns rules + data into answers. Machine learning turns data + answers into rules. Three families: supervised (labelled examples → classification or regression), unsupervised (find structure, no labels), reinforcement (learn from rewards). ML finds statistical patterns, not understanding — powerful, but fallible. Use ML only when the rule is too fuzzy to write by hand and you have enough data.

Vocabulary Card

machine learning: Programs that improve at a task by learning patterns from data, rather than being explicitly programmed with rules.
supervised learning: Learning from labelled examples (input + correct answer).
classification / regression: Supervised tasks predicting a category / a number.
unsupervised learning: Finding structure (groups, patterns) in data with no labels.

Homework

4 min

Write ml_or_not.py. Hard-code a list of 10 tasks (mix of rule-based and ML). For each, print your verdict: rule-based, ML-classification, ML-regression, or ML-clustering. Add a one-line reason for each.

Sample · ml_or_not.py

# ml_or_not.py — classify each task
tasks = [
    ("Convert km to miles",            "rule-based",       "exact formula"),
    ("Detect a face in a photo",       "ML-classification","fuzzy visual pattern"),
    ("Predict house price",            "ML-regression",    "continuous output"),
    ("Group news into topics",         "ML-clustering",    "no labels given"),
    ("Check if a year is a leap year", "rule-based",       "simple known rule"),
    ("Recognise spoken words",         "ML-classification","huge variety of audio"),
    ("Recommend a movie",              "ML-clustering",    "find similar users/items"),
    ("Estimate delivery time",         "ML-regression",    "number from many factors"),
    ("Validate a phone number",        "rule-based",       "regex / format check"),
    ("Sort emails into spam/ham",      "ML-classification","learned from labelled mail"),
]

for task, verdict, reason in tasks:
    print(f"  {task:<34} {verdict:<18} ({reason})")

Non-negotiables: at least 3 rule-based and a mix of all three ML types, one reason each.
