Learning Goals
5 min
By the end of this lesson you can:
- Explain why AI needs lots of examples to learn well.
- Explain how more data, and more varied data, makes a better AI.
- Name where the data for an AI might come from.
Warm-Up · One Blurry Look
8 min
Last lesson we turned pictures and words into numbers. That's data.
Imagine you had only ever seen one rambutan, once, in a blurry photo. Could you spot a rambutan at the market confidently?
Reveal the thinking
Probably not. You'd do far better after seeing many rambutans: big, small, red, yellow, in baskets and on trees. An AI is similar: it needs many varied examples before it can recognise something reliably.
New Concept · Feed the Model Well
18 min
Data is the food
Think of an AI like a growing body. Data is its food.
- Only junk food → an unhealthy body.
- Only one food → a weak, fussy eater.
- Lots of varied, good food → a strong, healthy body.
An AI fed lots of good, varied examples learns to handle the real, messy world.
Two rules of good data
- Enough of it — many examples, not a handful.
- Variety and balance — all the kinds it will meet, in fair amounts.
Where data comes from
Photos people take, songs they play, words they type, sensors, cameras, clicks — all of these can become an AI's examples.
Data must be collected fairly and with permission. Never use photos of other people without their consent.
Worked Example · 5 Photos vs 500
18 min
Priya wants an AI that tells cats from dogs in photos.
Try 1 — only 5 photos each
With so few examples, the AI barely learns the pattern. A fluffy dog easily fools it.
Try 2 — 500 photos each
Now it has seen cats and dogs of many colours, sizes and angles. It guesses far better.
The hidden trap — unbalanced data
Suppose all 500 cat photos are black cats. The AI may secretly learn "black = cat". Show it a white cat and it fails — and a black dog confuses it too.
More data helps — but it must be varied and balanced. The AI learns whatever pattern the data shows, even a wrong one.
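The trap can be sketched in a few lines of Python. This is only a toy, not how a real image AI works: it assumes each photo is described by a single colour word, and the names `training`, `learn_colour_rule` and `predict` are made up for this illustration. The "learner" simply remembers which label it saw most often with each colour.

```python
from collections import Counter

# Hypothetical toy training set: (colour, label) pairs.
# Every cat photo is black, which mirrors the trap described above.
training = [("black", "cat")] * 5 + [("brown", "dog")] * 3 + [("white", "dog")] * 2


def learn_colour_rule(examples):
    """For each colour, remember the label seen most often with it."""
    by_colour = {}
    for colour, label in examples:
        by_colour.setdefault(colour, Counter())[label] += 1
    return {colour: counts.most_common(1)[0][0] for colour, counts in by_colour.items()}


def predict(rule, colour, default="unknown"):
    """Look up the learned label for a colour; unseen colours get the default."""
    return rule.get(colour, default)


rule = learn_colour_rule(training)
print(predict(rule, "black"))  # the answer a black dog would get
print(predict(rule, "white"))  # the answer a white cat would get
```

With this one-sided data the toy learner answers "cat" for anything black and "dog" for anything white, so a black dog and a white cat are both labelled wrongly. Adding cats of other colours to `training` is what breaks the false "black = cat" pattern.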
Try It Yourself
20 min
Use your worksheet.
You want to teach an AI "ripe banana" vs "unripe banana". List what photos you would collect for each, and how many.
Hint
Include different lighting, angles, and shades of green and yellow — many of each.
A "dog detector" was trained only on photos of brown dogs in parks. Name two kinds of photo it might get wrong, and why.
Hint
Think about colour and place — a white dog indoors, for example.
Mini-Challenge · Plan a Fair Food Dataset
12 min
Plan a balanced dataset for an AI that tells apart nasi lemak, roti canai and satay.
Decide how many photos per dish, and list three kinds of variety you would include.
Your plan works if each dish has roughly the same number of photos, with real-world variety (lighting, plate, angle).
Show one good plan
About 100 photos per dish (balanced). Variety: different stalls and plates, bright and dim lighting, close-up and far, full and half-eaten portions.
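A plan like this can be checked with a short Python sketch. The function name `is_balanced` and the 20% tolerance are assumptions chosen for this illustration, not a fixed rule. The sketch only compares photo counts, so variety still has to be checked by looking at the photos themselves.

```python
def is_balanced(counts, tolerance=0.2):
    """True if the smallest class has at least (1 - tolerance) of the largest class's count.

    The default tolerance of 0.2 is an assumption for this sketch.
    """
    smallest = min(counts.values())
    largest = max(counts.values())
    return largest > 0 and smallest >= (1 - tolerance) * largest


# The plan above: about 100 photos per dish.
plan = {"nasi lemak": 100, "roti canai": 100, "satay": 100}

# A hypothetical lopsided plan for comparison.
lopsided = {"nasi lemak": 300, "roti canai": 40, "satay": 20}

print(is_balanced(plan))      # True
print(is_balanced(lopsided))  # False
```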
Recap
5 min
Data is the food of AI. It needs plenty of examples, with real variety and balance. Poor or one-sided data teaches the AI the wrong pattern, so we collect data carefully and fairly.
Vocabulary Card
- data: The examples an AI learns from, such as photos, sounds, words and numbers.
- dataset: A whole collection of examples gathered to train an AI.
- balanced data: A dataset with roughly equal amounts of every kind of example the AI will meet, and real variety within each kind.
Homework · Collect Ten Examples
≤ 20 min
Pick something you'd love an AI to recognise (a fruit, a coin, a hand sign). Describe ten varied examples you would collect to train it. You do not need to take real photos; just describe each one.
Use your own objects or drawings. Don't collect photos of other people without asking them first.
Sample · Teaching "ripe vs unripe mango"
Ten examples with variety: a fully green mango in sunlight; a green mango indoors; a half-yellow mango; a mostly-yellow mango; a ripe mango close-up; a ripe mango far away; two mangoes on a plate; one in a hand; one on a tree; one slightly bruised but ripe.
Yours will be different — ten varied, balanced examples is the goal.