Spec & Goals 3 min
AQA Spec 3.3.8 · Data compression
By the end of this lesson you can:
- Explain why files are compressed and the difference between lossless and lossy compression.
- Use run-length encoding (RLE) to compress a short run of data.
- Describe how Huffman coding reduces a file's size.
Warm-Up 5 min
Last lesson you calculated sound file sizes — a few seconds of audio is millions of bits. Big files are slow to send and fill storage. Compression fixes that.
Quick starter
A friend in Johor sends you the text AAAAAAAA (eight A's). How could you write the same message in far fewer characters?
Reveal the idea
Write 8A — "the letter A, eight times". Storing a value plus a count is the heart of run-length encoding.
Key Concept — making files smaller 14 min
Compression reduces the number of bits a file uses. Smaller files need less storage and transfer or download faster.
There are two kinds.
| Lossless | Lossy | |
|---|---|---|
| What happens | No data is lost | Some data is permanently removed |
| Original recovered? | Yes — perfectly reconstructed | No — cannot be fully recovered |
| File size | Smaller | Much smaller |
| Examples | RLE, Huffman coding | JPEG images, MP3 audio |
Run-length encoding (RLE)
RLE stores a run of identical values as the value plus a count, instead of repeating it.
Huffman coding
Huffman coding gives the most frequent characters the shortest binary codes and rare characters longer codes. Common letters then cost fewer bits, so the whole file shrinks.
A gets the 1-bit code 0; rarer B and C get 2-bit codes 10 and 11.Worked Example — RLE and Huffman savings 12 min
Part A — RLE. Compress the run AAAAABBBBCCCCCC using run-length encoding.
| Run | Value | Count | Encoded |
|---|---|---|---|
| AAAAA | A | 5 | 5A |
| BBBB | B | 4 | 4B |
| CCCCCC | C | 6 | 6C |
So AAAAABBBBCCCCCC (15 characters) becomes 5A 4B 6C. RLE only helps because the data has long runs of the same value.
Part B — Huffman. Take the message AAAB using the tree above (A = 0, B = 10).
| Method | Working | Bits used |
|---|---|---|
| Fixed 2-bit codes | 4 characters × 2 bits | 8 bits |
| Huffman codes | 0 + 0 + 0 + 10 (three A's + one B) | 5 bits |
| Saving | 8 − 5 | 3 bits saved |
The frequent A uses just 1 bit each, so the message shrinks from 8 bits to 5 bits.
Try It Yourself 12 min
Goal: State one reason a company might compress files before sending them to customers.
Hint: think about storage space or download/transfer time.
Goal: Use run-length encoding to compress WWWWWWBBWWWW.
Hint: write each run as a count followed by the value.
Goal: Explain why RLE is a poor choice for the text COMPUTER, then suggest a situation where RLE works well.
📝 Exam Practice 10 min
Answer the way the examiner expects — the command word and the marks tell you how much to write.
State the difference between lossy and lossless compression.
Mark scheme
- Lossy permanently removes / loses some data (so the original cannot be fully recovered) (1).
- Lossless loses no data / the original can be perfectly reconstructed (1).
Use run-length encoding to compress AAAABBBAAAA.
Mark scheme
- Runs identified in order: 4 A's, 3 B's, 4 A's (1).
4A 3B 4A(1).
Explain how Huffman coding reduces the size of a file.
Mark scheme
- The most frequent / common characters are given the shortest (binary) codes (1).
- So fewer bits are needed overall / less frequent characters get longer codes (1).
Recap & Key Terms 3 min
Compression makes files smaller. Lossless (RLE, Huffman) loses no data and is perfectly reversible; lossy (JPEG, MP3) discards data for a much smaller file. RLE stores runs as a value and a count; Huffman gives frequent characters the shortest codes.
- Compression
- Reducing the size of a file so it uses less storage and transfers more quickly.
- Lossless
- Compression where no data is lost; the original can be perfectly reconstructed.
- Lossy
- Compression where some data is permanently removed to make the file much smaller.
- Run-length encoding
- A lossless method that stores each run of repeated values as the value and a count.
- Huffman coding
- A lossless method that gives frequent characters short codes and rare characters longer codes.
Homework 1 min
Task (≤ 15 min): Use run-length encoding to compress GGGGGRRRRRRGG, then state whether RLE was worthwhile here and why.
Model answer
Runs: 5 G's, 6 R's, 2 G's → 5G 6R 2G. It was worthwhile because the data has long runs of the same value, so it takes fewer characters than the original 13.
Award marks for: correct encoded output (1), correct judgement with a reason about long runs (1).