AI · Level 1 · AI Explorer · Lesson 6

L1 · 06

Data Is the Food of AI

An AI is only as good as the examples it learns from. Give it rich, varied data and it grows strong. Feed it junk and it grows weak.

⏱ 1.5 hours
🤖 Concept lesson · no coding
📚 After AI-L1-05
💬 Discussion + worksheet
01

Learning Goals

5 min

By the end of this lesson you can:

  • Explain why AI needs lots of examples to learn well.
  • Explain how more data, and more varied data, makes a better AI.
  • Name where the data for an AI might come from.
02

Warm-Up · One Blurry Look

8 min

Last lesson we turned pictures and words into numbers — that's data.

Imagine you had only ever seen one rambutan, once, in a blurry photo. Could you confidently spot a rambutan at the market?

Reveal the thinking

Probably not. You'd do far better after seeing many rambutans: big, small, red, yellow, in baskets and on trees. An AI is the same: it needs many varied examples before it can recognise something reliably.

03

New Concept · Feed the Model Well

18 min

Data is the food

Think of an AI like a growing body. Data is its food.

  • Only junk food → an unhealthy body.
  • Only one food → a weak, fussy eater.
  • Lots of varied, good food → a strong, healthy body.

An AI fed lots of good, varied examples learns to handle the real, messy world.

Two rules of good data

  • Enough of it — many examples, not a handful.
  • Variety and balance — all the kinds it will meet, in fair amounts.

Where data comes from

Photos people take, songs they play, words they type, sensors, cameras, clicks — all of these can become an AI's examples.

Watch out

Data must be collected fairly and with permission. Never use photos of other people without their consent.

04

Worked Example · 5 Photos vs 500

18 min

Priya wants an AI that tells cats from dogs in photos.

Try 1 — only 5 photos each

With so few examples, the AI barely learns the pattern. A fluffy dog easily fools it.

Try 2 — 500 photos each

Now it has seen cats and dogs of many colours, sizes and angles. It guesses far better.

The hidden trap — unbalanced data

Suppose all 500 cat photos are black cats. The AI may quietly learn the shortcut "black = cat". Show it a white cat and it may not see a cat at all, and a black dog may get called a cat.

The takeaway

More data helps — but it must be varied and balanced. The AI learns whatever pattern the data shows, even a wrong one.
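For curious learners or teachers, the hidden trap can be sketched in a few lines of Python. This is a toy illustration, not how a real image classifier works: it assumes each photo has been reduced to a single colour word, and the `train` and `predict` helpers and all the counts are made up for this example.

```python
from collections import Counter, defaultdict


def train(examples):
    """Toy learner: for each colour, remember the label seen most often with it."""
    counts = defaultdict(Counter)
    for colour, label in examples:
        counts[colour][label] += 1
    return {colour: c.most_common(1)[0][0] for colour, c in counts.items()}


def predict(model, colour):
    """Guess a label from colour alone; colours never seen give 'not sure'."""
    return model.get(colour, "not sure")


# Made-up, one-sided data: every cat photo shows a black cat.
one_sided = (
    [("black", "cat")] * 500
    + [("brown", "dog")] * 250
    + [("white", "dog")] * 250
)
model = train(one_sided)

print(predict(model, "black"))   # a black dog would be called "cat"
print(predict(model, "white"))   # a white cat would be called "dog"
print(predict(model, "ginger"))  # never seen, so "not sure"
```

The toy learner did nothing wrong with the data it was given. The data itself showed "black = cat", so that is the pattern it picked up.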

05

Try It Yourself

20 min

Use your worksheet.

01 🟢 Plan the data

You want to teach an AI "ripe banana" vs "unripe banana". List what photos you would collect for each, and how many.

Hint

Include different lighting, angles, and shades of green and yellow — many of each.

02 🟡 Spot the gap

A "dog detector" was trained only on photos of brown dogs in parks. Name two kinds of photo it might get wrong, and why.

Hint

Think about colour and place — a white dog indoors, for example.

06

Mini-Challenge · Plan a Fair Food Dataset

12 min

Plan a balanced dataset for an AI that tells apart nasi lemak, roti canai and satay.

Decide how many photos per dish, and list three kinds of variety you would include.

Your plan works if each dish has roughly the same number of photos and the photos show real-world variety (lighting, plate, angle).

Show one good plan

About 100 photos per dish (balanced). Variety: different stalls and plates, bright and dim lighting, close-up and far, full and half-eaten portions.
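A plan like this can also be checked with a short Python sketch. The `is_balanced` helper and its 20% tolerance are assumptions made for illustration, since the lesson only asks for "roughly the same number" per dish. The lopsided counts are invented to show a failing case.

```python
def is_balanced(counts, tolerance=0.2):
    """Return True if the smallest class has at least (1 - tolerance)
    times as many photos as the largest class."""
    smallest = min(counts.values())
    largest = max(counts.values())
    return smallest >= (1 - tolerance) * largest


# The plan from this challenge: about 100 photos per dish.
plan = {"nasi lemak": 100, "roti canai": 100, "satay": 100}

# A made-up lopsided plan for comparison.
lopsided = {"nasi lemak": 300, "roti canai": 40, "satay": 20}

print(is_balanced(plan))      # True
print(is_balanced(lopsided))  # False
```

A check like this only counts photos. It cannot tell whether the photos within each dish are varied, so the list of lighting, plates and angles still has to be planned by hand.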

07

Recap

5 min

Data is the food of AI. It needs plenty of examples, with real variety and balance. Poor or one-sided data teaches the AI the wrong pattern — so we collect data carefully and fairly.

Vocabulary Card

data
The examples an AI learns from — photos, sounds, words, numbers.
dataset
A whole collection of examples gathered to train an AI.
balanced data
A dataset with similar amounts of every kind of example the AI will meet, and real variety within each kind.
08

Homework · Collect Ten Examples

≤ 20 min

Pick something you'd love an AI to recognise (a fruit, a coin, a hand sign). Describe ten varied examples you would collect to train it. You do not need to take real photos — just describe each one.

Stay safe

Use your own objects or drawings. Don't collect photos of other people without asking them first.