Learning Goals
5 min
By the end of this lesson you can:
- Explain reinforcement learning as learning by trying actions and earning rewards.
- Explain the difference between a reward and a penalty.
- Describe how a game AI improves over many tries.
Warm-Up · How Did You Get Good?
8 min
Last lesson we met the three ways to learn. Today we zoom in on learning by reward.
Think of a game you're good at. Did someone hand you the answers, or did you get better by playing and watching your score?
Reveal the thinking
Almost always by playing. You tried things, the score told you what worked, and you improved. That is exactly reinforcement learning.
New Concept · Try, Score, Improve
18 min
What is reinforcement learning?
An AI (we call it the agent) takes an action, then gets a reward for a good result or a penalty for a bad one. It repeats this thousands of times, slowly learning which actions earn the most reward.
It's like training a puppy with treats, or chasing a high score — good moves get a "yes", bad moves get a "no".
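The try, score, improve loop can be sketched in a few lines of Python. This is only an illustration, not a real game bot: the two actions ("jump" and "duck"), the rule that jumping is the good move, and the 10% chance of trying something at random are all made up for the example. The agent simply keeps a running score for each action and mostly repeats whichever action has scored best so far.

```python
import random

def reward_for(action):
    # Made-up game rule: jumping clears the obstacle, ducking does not.
    return 1 if action == "jump" else -1

def train(tries=1000, explore_rate=0.1, seed=0):
    rng = random.Random(seed)
    actions = ["jump", "duck"]
    totals = {a: 0 for a in actions}          # running score for each action
    for _ in range(tries):
        if rng.random() < explore_rate:
            action = rng.choice(actions)      # sometimes try something at random
        else:
            # otherwise repeat the action with the best score so far
            action = max(actions, key=lambda a: totals[a])
        totals[action] += reward_for(action)  # a reward adds, a penalty subtracts
    return totals

totals = train()
best = max(totals, key=totals.get)
print(totals, "best action:", best)
```

After many tries the score for "jump" is far ahead, so the agent almost always jumps. Nobody told it the answer; the rewards and penalties did.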
Why practice matters
One try teaches little. But after millions of tries, the AI discovers clever strategies — sometimes ones no human taught it.
This is how game bots, self-driving cars practising in simulation, and walking robots learn. They get good by doing, not by being told every answer.
Worked Example · An AI Learns a Maze
18 min
We want an AI to find its way out of a maze. We set the rewards:
- +10 for reaching the exit.
- −1 for bumping into a wall.
- −1 for each step (so it learns to be quick).
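To see how these three rules add up, here is a small scoring sketch. The function name `score_try` and the example step and bump counts are invented for illustration, and it assumes steps and wall bumps are counted separately.

```python
EXIT_REWARD = 10    # +10 for reaching the exit
WALL_PENALTY = -1   # -1 for bumping into a wall
STEP_PENALTY = -1   # -1 for each step

def score_try(steps, wall_bumps, reached_exit):
    """Add up the score for one try at the maze."""
    total = steps * STEP_PENALTY + wall_bumps * WALL_PENALTY
    if reached_exit:
        total += EXIT_REWARD
    return total

# A clumsy try: 30 steps, 8 wall bumps, never finds the exit.
print(score_try(steps=30, wall_bumps=8, reached_exit=False))  # -38

# A clean try: 6 steps, no bumps, reaches the exit.
print(score_try(steps=6, wall_bumps=0, reached_exit=True))    # 4
```

The clean try scores much higher, which is exactly the signal the AI uses to tell good tries from bad ones.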
What happens over time
- Early tries: it wanders, hits walls, scores badly.
- Middle: it starts avoiding walls — those penalties taught it.
- Later: it finds the shortest path and scores high every time.
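For the curious, the three stages above can be watched in a tiny program. The lesson does not say which learning method the maze AI uses; this sketch swaps in one common method called Q-learning, and it shrinks the maze to a straight corridor of five squares (start at square 0, exit at square 4, a wall behind the start). The settings `alpha`, `gamma`, and `explore_rate` are assumed values chosen for the example.

```python
import random

EXIT = 4                      # corridor squares 0..4, exit at square 4
ACTIONS = ["left", "right"]

def step(position, action):
    """Return (new_position, reward, done) using the lesson's reward values."""
    target = position + (1 if action == "right" else -1)
    reward = -1                               # -1 for each step
    if target < 0:                            # bumped the wall behind the start
        return position, reward - 1, False    # extra -1 for the bump
    if target == EXIT:
        return target, reward + 10, True      # +10 for reaching the exit
    return target, reward, False

def train(episodes=500, alpha=0.5, gamma=0.9, explore_rate=0.2, seed=0):
    rng = random.Random(seed)
    # q[(square, action)] is the agent's current guess of how good that move is
    q = {(s, a): 0.0 for s in range(EXIT) for a in ACTIONS}
    steps_per_try = []
    for _ in range(episodes):
        position, steps, done = 0, 0, False
        while not done and steps < 100:       # cap so a try cannot run forever
            if rng.random() < explore_rate:
                action = rng.choice(ACTIONS)  # explore: try a random move
            else:
                action = max(ACTIONS, key=lambda a: q[(position, a)])
            new_position, reward, done = step(position, action)
            future = 0.0 if done else max(q[(new_position, a)] for a in ACTIONS)
            # Q-learning update: nudge the guess toward reward + discounted future
            q[(position, action)] += alpha * (reward + gamma * future - q[(position, action)])
            position, steps = new_position, steps + 1
        steps_per_try.append(steps)
    return q, steps_per_try

q, steps_per_try = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(EXIT)]
print("first try:", steps_per_try[0], "steps; last try:", steps_per_try[-1], "steps")
print("learned moves:", policy)
```

After training, the learned move on every square is "right", which is the shortest path to the exit. The agent still makes an occasional random move while training, so a single late try can take a step or two more than the minimum.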
If we forgot the "−1 per step" penalty, the AI might still find the exit but take a long, wandering route, because nothing would tell it that extra steps cost anything. The rewards you choose decide what it learns, so choose them wisely.
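A quick calculation shows why the step penalty matters. The two routes below (6 steps and 40 steps) are invented numbers, and the sketch assumes neither route bumps a wall.

```python
def score_route(steps, step_penalty, exit_reward=10):
    """Score for a route that reaches the exit without bumping any walls."""
    return exit_reward + steps * step_penalty

short_route, long_route = 6, 40   # made-up step counts

# With the -1 per step penalty, the short route clearly wins.
print(score_route(short_route, -1), score_route(long_route, -1))  # 4 -30

# Without it, both routes score the same, so the AI has no reason to hurry.
print(score_route(short_route, 0), score_route(long_route, 0))    # 10 10
```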
Try It Yourself
20 min
Use your worksheet.
Design rewards and penalties for an AI learning to play "keep the ball up". What earns points? What loses them?
Hint
+1 for each successful tap, big penalty for letting the ball drop.
Explain in two sentences why an AI needs penalties, not just rewards, to learn well.
Hint
Without a "no", how would it learn which actions to stop doing?
Mini-Challenge · Reward a Fruit-Catcher
12 min
Design a reward system to teach an AI to play "catch the fruit", a game in which a basket moves left and right to catch falling fruit.
List your rewards and penalties, and predict how the AI's play would change from try 1 to try 1,000.
It works if your rewards clearly push the AI toward catching fruit and away from missing.
Show one good design
Rewards and penalties: +5 for catching a fruit, −3 for missing a fruit, −1 for each jerky move. Early: random sliding, lots of misses. Later: the basket waits under each fruit and catches most of them.
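This design can be written down as a small score table. The event names and the two example rounds are invented for illustration; only the point values come from the design above.

```python
# Points from the sample design: +5 catch, -3 miss, -1 jerky move.
REWARDS = {"catch": 5, "miss": -3, "jerky_move": -1}

def score_round(events):
    """Add up the points for everything that happened in one round."""
    return sum(REWARDS[event] for event in events)

# Made-up rounds: an early try with lots of misses, a later try with mostly catches.
early_try = ["miss", "jerky_move", "miss", "jerky_move", "catch", "miss"]
later_try = ["catch", "catch", "catch", "miss", "catch", "catch"]

print(score_round(early_try))  # -6
print(score_round(later_try))  # 22
```

The gap between the two scores is what pushes the AI toward catching fruit and away from missing.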
Recap
5 min
Reinforcement learning is learning by trying. An agent acts, gets a reward or penalty, and improves over many tries. The rewards we choose shape exactly what the AI learns.
Vocabulary Card
- reinforcement learning: Learning by trying actions and getting rewards or penalties.
- agent: The AI that takes actions and learns, like the player in a game.
- reward / penalty: Points the AI gains for good moves or loses for bad ones.
Homework · My Practice Reward
≤ 20 min
Pick a skill you learned by practising, such as cycling, a game, a sport, or a musical instrument. Describe the "rewards" and "penalties" that helped you improve (what felt like a win, what felt like a fail).
Sample · Learning to Cycle
Rewards: staying upright, moving forward smoothly, my dad cheering.
Penalties: wobbling, falling, scraped knee.
After many tries, my brain learned the actions that earned the "rewards" — just like a reinforcement-learning agent.
Yours will be different — any real skill with its wins and fails is perfect.