AI · Level 1 · AI Explorer · Lesson 14

Rewards & Practice — How Game AIs Get Better

Some AIs aren't taught with answers at all. They learn like you learn a game — by trying, scoring points, and practising thousands of times.

⏱ 1.5 hours · 🤖 Concept lesson (no coding) · 📚 After AI-L1-13 · 💬 Discussion + worksheet
01

Learning Goals

5 min

By the end of this lesson you can:

  • Explain reinforcement learning as learning by trying and rewards.
  • Explain the difference between a reward and a penalty.
  • Describe how a game AI improves over many tries.
02

Warm-Up · How Did You Get Good?

8 min

Last lesson we met the three ways to learn. Today we zoom in on learning by reward.

Think of a game you're good at. Did someone hand you the answers, or did you get better by playing and watching your score?

Reveal the thinking

Almost always by playing. You tried things, the score told you what worked, and you improved. That is the idea behind reinforcement learning.

03

New Concept · Try, Score, Improve

18 min

What is reinforcement learning?

An AI (we call it the agent) takes an action, then gets a reward for a good result or a penalty for a bad one. It repeats this thousands of times, slowly learning which actions earn the most reward.

It's like training a puppy with treats, or chasing a high score — good moves get a "yes", bad moves get a "no".

[Diagram: Agent (AI) → action → Reward or penalty → learn & try again → Agent (AI)]
The agent acts, gets a reward or penalty, and tries again — better each time.
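
For teachers or curious learners who want to see the loop in code, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two-move game, the reward values, and the names `practise`, `value` and `count` are not part of the lesson. The agent tries a move, receives a reward or penalty, and nudges its guess about how good that move is.

```python
import random

def practise(tries, seed=0):
    """Let a tiny agent learn which of two moves earns more reward."""
    rng = random.Random(seed)              # fixed seed so every run is the same
    rewards = {"left": -1, "right": +1}    # made-up game: right is good, left is bad
    value = {"left": 0.0, "right": 0.0}    # the agent's current guess for each move
    count = {"left": 0, "right": 0}        # how often each move has been tried

    for _ in range(tries):
        # Mostly pick the move that looks best, but sometimes try one at random.
        if rng.random() < 0.1:
            action = rng.choice(["left", "right"])
        else:
            action = max(value, key=value.get)
        reward = rewards[action]           # the reward or penalty for this try
        count[action] += 1
        # Move the guess toward the reward just received (a running average).
        value[action] += (reward - value[action]) / count[action]
    return value

learned = practise(1000)
best_move = max(learned, key=learned.get)
print(best_move)  # right
```

The occasional random move matters: without it, the agent could settle on the first move that looked acceptable and never find out that another one is better.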

Why practice matters

One try teaches little. But after thousands, or even millions, of tries, the AI discovers clever strategies, sometimes ones no human taught it.

Why it matters

This is how many game bots, self-driving cars in practice simulations, and walking robots learn. They get good by doing, not by being told every answer.

04

Worked Example · An AI Learns a Maze

18 min

We want an AI to find its way out of a maze. We set the rewards:

  • +10 for reaching the exit.
  • −1 for bumping into a wall.
  • −1 for each step (so it learns to be quick).
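
The reward list above can be turned into a small scoring function. This is only a sketch: the name `score_run` and the two example runs are invented, and it assumes wall bumps are counted separately from steps.

```python
EXIT_REWARD = 10    # +10 for reaching the exit
WALL_PENALTY = -1   # -1 for bumping into a wall
STEP_PENALTY = -1   # -1 for each step

def score_run(steps, wall_bumps, reached_exit):
    """Add up the rewards and penalties for one try at the maze."""
    total = steps * STEP_PENALTY + wall_bumps * WALL_PENALTY
    if reached_exit:
        total += EXIT_REWARD
    return total

# Two invented tries: one wandering and lost, one quick and clean.
wandering = score_run(steps=30, wall_bumps=8, reached_exit=False)
direct = score_run(steps=6, wall_bumps=0, reached_exit=True)
print(wandering)  # -38
print(direct)     # 4
```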

What happens over time

  • Early tries: it wanders, hits walls, scores badly.
  • Middle: it starts avoiding walls — those penalties taught it.
  • Later: it finds the shortest path and scores high every time.
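
The lesson does not name a specific learning method. As one possible illustration, the sketch below uses tabular Q-learning, a common reinforcement learning technique, on a made-up 3×3 room with walls only around the edge. The room layout, the settings `alpha`, `gamma` and `epsilon`, and all function names are assumptions. A move into a wall is scored as −2 (one step plus one bump), and the move that reaches the exit as +9 (one step plus the exit reward).

```python
import random

# A tiny 3x3 room: start in the top-left corner, exit in the bottom-right.
SIZE, START, EXIT = 3, (0, 0), (2, 2)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(pos, move):
    """Apply one move and return (new_pos, reward, finished)."""
    dr, dc = MOVES[move]
    r, c = pos[0] + dr, pos[1] + dc
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        return pos, -2, False            # hit a wall: stay put
    if (r, c) == EXIT:
        return (r, c), 9, True           # reached the exit
    return (r, c), -1, False             # ordinary step

def train(episodes=2000, seed=0, alpha=0.5, gamma=0.9, epsilon=0.1):
    rng = random.Random(seed)
    # q[(position, move)] is the agent's guess of how good that move is there.
    q = {((r, c), m): 0.0 for r in range(SIZE) for c in range(SIZE) for m in MOVES}
    scores = []
    for _ in range(episodes):
        pos, total = START, 0
        for _step in range(100):         # cap the length of one try
            if rng.random() < epsilon:   # sometimes explore at random
                move = rng.choice(list(MOVES))
            else:                        # otherwise take the best-looking move
                move = max(MOVES, key=lambda m: q[(pos, m)])
            new_pos, reward, done = step(pos, move)
            best_next = 0.0 if done else max(q[(new_pos, m)] for m in MOVES)
            # Q-learning update: shift the guess toward reward + future value.
            q[(pos, move)] += alpha * (reward + gamma * best_next - q[(pos, move)])
            pos, total = new_pos, total + reward
            if done:
                break
        scores.append(total)
    return q, scores

def greedy_steps(q, limit=20):
    """Count steps to the exit when always taking the best-looking move."""
    pos, steps = START, 0
    while pos != EXIT and steps < limit:
        pos, _, _ = step(pos, max(MOVES, key=lambda m: q[(pos, m)]))
        steps += 1
    return steps

q, scores = train()
steps_needed = greedy_steps(q)
print("first try score:", scores[0])
print("average of last 100 tries:", sum(scores[-100:]) / 100)
print("steps on learned route:", steps_needed)
```

Comparing the first try with the average of the last hundred shows the early, middle and later pattern described above. The shortest route in this room is four steps.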
Careful — rewards shape behaviour

If we forgot the "−1 per step" penalty, the AI would still find the exit, but it would have no reason to hurry and might wander slowly. The rewards you choose decide what it learns, so choose them wisely.
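
A quick calculation makes the point concrete. The two routes below (6 steps and 20 steps, no wall bumps) are invented for illustration, and `score_route` is a hypothetical helper.

```python
EXIT_REWARD = 10

def score_route(steps, step_penalty):
    """Score for a route that reaches the exit without hitting walls."""
    return EXIT_REWARD + steps * step_penalty

# Hypothetical routes: a direct 6-step path and a wandering 20-step path.
with_penalty = {"short": score_route(6, -1), "long": score_route(20, -1)}
without_penalty = {"short": score_route(6, 0), "long": score_route(20, 0)}

print(with_penalty)     # {'short': 4, 'long': -10}
print(without_penalty)  # {'short': 10, 'long': 10}
```

With the step penalty, the short route clearly scores higher. Without it, both routes score the same, so nothing tells the AI that the short one is better.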

05

Try It Yourself

20 min

Use your worksheet.

01 🟢 Set the rewards

Design rewards and penalties for an AI learning to play "keep the ball up". What earns points? What loses them?

Hint

+1 for each successful tap, big penalty for letting the ball drop.

02 🟡 Why penalties?

Explain in two sentences why an AI needs penalties, not just rewards, to learn well.

Hint

Without a "no", how would it learn which actions to stop doing?

06

Mini-Challenge · Reward a Fruit-Catcher

12 min

Design a reward system to teach an AI to play "catch the fruit" — a basket moves left and right to catch falling fruit.

List your rewards and penalties, and predict how the AI's play would change from try 1 to try 1,000.

It works if your rewards clearly push the AI toward catching fruit and away from missing.

Show one good design

Rewards: +5 catch a fruit, −3 miss a fruit, −1 for jerky moves. Early: random sliding, lots of misses. Later: the basket waits under each fruit and catches most of them.
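
To compare early and later play under this design, the rewards can be added up for one round. The event lists below are made up to match the description; a real AI would produce its own.

```python
# The reward design from the example above.
REWARDS = {"catch": 5, "miss": -3, "jerky_move": -1}

def score_round(events):
    """Add up the reward or penalty for every event in one round."""
    return sum(REWARDS[event] for event in events)

# Invented rounds: an early try with lots of misses, and a later, calmer try.
early = ["miss", "jerky_move", "miss", "jerky_move", "catch", "miss"]
later = ["catch", "catch", "catch", "miss", "catch", "catch"]

print(score_round(early))  # -6
print(score_round(later))  # 22
```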

07

Recap

5 min

Reinforcement learning is learning by trying. An agent acts, gets a reward or penalty, and improves over many tries. The rewards we choose shape exactly what the AI learns.

Vocabulary Card

reinforcement learning
Learning by trying actions and getting rewards or penalties.
agent
The AI that takes actions and learns — like the player in a game.
reward / penalty
Points the AI gains for good moves or loses for bad ones.
08

Homework · My Practice Reward

≤ 20 min

Pick a skill you learned by practising — cycling, a game, a sport, a musical instrument. Describe the "rewards" and "penalties" that helped you improve (what felt like a win, what felt like a fail).