Data Compression — AQA GCSE CS (8525)

Spec & Goals 3 min

AQA Spec 3.3.8 · Data compression

By the end of this lesson you can:

Explain why files are compressed and the difference between lossless and lossy compression.
Use run-length encoding (RLE) to compress a short run of data.
Describe how Huffman coding reduces a file's size.

Warm-Up 5 min

Last lesson you calculated sound file sizes — a few seconds of audio is millions of bits. Big files are slow to send and fill storage. Compression fixes that.

Quick starter

A friend in Johor sends you the text AAAAAAAA (eight A's). How could you write the same message in far fewer characters?

Reveal the idea

Write 8A — "the letter A, eight times". Storing a value plus a count is the heart of run-length encoding.

Key Concept — making files smaller 14 min

Compression reduces the number of bits a file uses. Smaller files need less storage and transfer or download faster.

There are two kinds.

	Lossless	Lossy
What happens	No data is lost	Some data is permanently removed
Original recovered?	Yes — perfectly reconstructed	No — cannot be fully recovered
File size	Smaller	Much smaller
Examples	RLE, Huffman coding	JPEG images, MP3 audio

Run-length encoding (RLE)

RLE stores a run of identical values as the value plus a count, instead of repeating it.

Common pitfall: RLE only saves space when there are long runs. Data with no repeats (like ABCDE) can get bigger.

Huffman coding

Huffman coding gives the most frequent characters the shortest binary codes and rare characters longer codes. Common letters then cost fewer bits, so the whole file shrinks.

The frequent letter A gets the 1-bit code 0; rarer B and C get 2-bit codes 10 and 11.

Worked Example — RLE and Huffman savings 12 min

Part A — RLE. Compress the run AAAAABBBBCCCCCC using run-length encoding.

Run	Value	Count	Encoded
AAAAA	A	5	5A
BBBB	B	4	4B
CCCCCC	C	6	6C

So AAAAABBBBCCCCCC (15 characters) becomes 5A 4B 6C. RLE only helps because the data has long runs of the same value.

Part B — Huffman. Take the message AAAB using the tree above (A = 0, B = 10).

Method	Working	Bits used
Fixed 2-bit codes	4 characters × 2 bits	8 bits
Huffman codes	0 + 0 + 0 + 10 (three A's + one B)	5 bits
Saving	8 − 5	3 bits saved

The frequent A uses just 1 bit each, so the message shrinks from 8 bits to 5 bits.

Try It Yourself 12 min

🟢 Easy

Goal: State one reason a company might compress files before sending them to customers.

Hint: think about storage space or download/transfer time.

🟡 Medium

Goal: Use run-length encoding to compress WWWWWWBBWWWW.

Hint: write each run as a count followed by the value.

🔴 Stretch

Goal: Explain why RLE is a poor choice for the text COMPUTER, then suggest a situation where RLE works well.

📝 Exam Practice 10 min

Answer the way the examiner expects — the command word and the marks tell you how much to write.

State[2 marks]

State the difference between lossy and lossless compression.

Mark scheme

Lossy permanently removes / loses some data (so the original cannot be fully recovered) (1).
Lossless loses no data / the original can be perfectly reconstructed (1).

Calculate[2 marks]

Use run-length encoding to compress AAAABBBAAAA.

Mark scheme

Runs identified in order: 4 A's, 3 B's, 4 A's (1).
4A 3B 4A (1).

Explain[2 marks]

Explain how Huffman coding reduces the size of a file.

Mark scheme

The most frequent / common characters are given the shortest (binary) codes (1).
So fewer bits are needed overall / less frequent characters get longer codes (1).

Recap & Key Terms 3 min

Compression makes files smaller. Lossless (RLE, Huffman) loses no data and is perfectly reversible; lossy (JPEG, MP3) discards data for a much smaller file. RLE stores runs as a value and a count; Huffman gives frequent characters the shortest codes.

Compression: Reducing the size of a file so it uses less storage and transfers more quickly.
Lossless: Compression where no data is lost; the original can be perfectly reconstructed.
Lossy: Compression where some data is permanently removed to make the file much smaller.
Run-length encoding: A lossless method that stores each run of repeated values as the value and a count.
Huffman coding: A lossless method that gives frequent characters short codes and rare characters longer codes.

Homework 1 min

Task (≤ 15 min): Use run-length encoding to compress GGGGGRRRRRRGG, then state whether RLE was worthwhile here and why.

Model answer

Runs: 5 G's, 6 R's, 2 G's → 5G 6R 2G. It was worthwhile because the data has long runs of the same value, so it takes fewer characters than the original 13.

Award marks for: correct encoded output (1), correct judgement with a reason about long runs (1).

Spec & Goals 3 min

AQA Spec 3.3.8 · Data compression

By the end of this lesson you can:

Explain why files are compressed and the difference between lossless and lossy compression.
Use run-length encoding (RLE) to compress a short run of data.
Describe how Huffman coding reduces a file's size.

Warm-Up 5 min

Last lesson you calculated sound file sizes — a few seconds of audio is millions of bits. Big files are slow to send and fill storage. Compression fixes that.

Quick starter

A friend in Johor sends you the text AAAAAAAA (eight A's). How could you write the same message in far fewer characters?

Reveal the idea

Write 8A — "the letter A, eight times". Storing a value plus a count is the heart of run-length encoding.

Key Concept — making files smaller 14 min

Compression reduces the number of bits a file uses. Smaller files need less storage and transfer or download faster.

There are two kinds.

	Lossless	Lossy
What happens	No data is lost	Some data is permanently removed
Original recovered?	Yes — perfectly reconstructed	No — cannot be fully recovered
File size	Smaller	Much smaller
Examples	RLE, Huffman coding	JPEG images, MP3 audio

Run-length encoding (RLE)

RLE stores a run of identical values as the value plus a count, instead of repeating it.

Common pitfall: RLE only saves space when there are long runs. Data with no repeats (like ABCDE) can get bigger.

Huffman coding

Huffman coding gives the most frequent characters the shortest binary codes and rare characters longer codes. Common letters then cost fewer bits, so the whole file shrinks.

The frequent letter A gets the 1-bit code 0; rarer B and C get 2-bit codes 10 and 11.

Worked Example — RLE and Huffman savings 12 min

Part A — RLE. Compress the run AAAAABBBBCCCCCC using run-length encoding.

Run	Value	Count	Encoded
AAAAA	A	5	5A
BBBB	B	4	4B
CCCCCC	C	6	6C

So AAAAABBBBCCCCCC (15 characters) becomes 5A 4B 6C. RLE only helps because the data has long runs of the same value.

Part B — Huffman. Take the message AAAB using the tree above (A = 0, B = 10).

Method	Working	Bits used
Fixed 2-bit codes	4 characters × 2 bits	8 bits
Huffman codes	0 + 0 + 0 + 10 (three A's + one B)	5 bits
Saving	8 − 5	3 bits saved

The frequent A uses just 1 bit each, so the message shrinks from 8 bits to 5 bits.

Try It Yourself 12 min

🟢 Easy

Goal: State one reason a company might compress files before sending them to customers.

Hint: think about storage space or download/transfer time.

🟡 Medium

Goal: Use run-length encoding to compress WWWWWWBBWWWW.

Hint: write each run as a count followed by the value.

🔴 Stretch

Goal: Explain why RLE is a poor choice for the text COMPUTER, then suggest a situation where RLE works well.

📝 Exam Practice 10 min

Answer the way the examiner expects — the command word and the marks tell you how much to write.

State[2 marks]

State the difference between lossy and lossless compression.

Mark scheme

Lossy permanently removes / loses some data (so the original cannot be fully recovered) (1).
Lossless loses no data / the original can be perfectly reconstructed (1).

Calculate[2 marks]

Use run-length encoding to compress AAAABBBAAAA.

Mark scheme

Runs identified in order: 4 A's, 3 B's, 4 A's (1).
4A 3B 4A (1).

Explain[2 marks]

Explain how Huffman coding reduces the size of a file.

Mark scheme

The most frequent / common characters are given the shortest (binary) codes (1).
So fewer bits are needed overall / less frequent characters get longer codes (1).

Recap & Key Terms 3 min

Compression: Reducing the size of a file so it uses less storage and transfers more quickly.
Lossless: Compression where no data is lost; the original can be perfectly reconstructed.
Lossy: Compression where some data is permanently removed to make the file much smaller.
Run-length encoding: A lossless method that stores each run of repeated values as the value and a count.
Huffman coding: A lossless method that gives frequent characters short codes and rare characters longer codes.

Homework 1 min

Task (≤ 15 min): Use run-length encoding to compress GGGGGRRRRRRGG, then state whether RLE was worthwhile here and why.

Model answer

Runs: 5 G's, 6 R's, 2 G's → 5G 6R 2G. It was worthwhile because the data has long runs of the same value, so it takes fewer characters than the original 13.

Award marks for: correct encoded output (1), correct judgement with a reason about long runs (1).