Syllabus & Goals 3 min
Cambridge 1.3 · Data storage and file compression Paper 1 · Computer Systems
By the end of this lesson you can:
- Explain why files are compressed.
- Describe the difference between lossy and lossless compression.
- Carry out run-length encoding (RLE) on text and images.
Recap / Warm-Up 5 min
Last lesson you saw that a single minute of CD audio is hundreds of MiB. Sending that is slow — so we shrink files first.
Quick starter
How could you write the run WWWWWWWW (eight W's) more briefly, without losing any information?
Reveal the answer
Write 8W — "eight W's". That single idea is run-length encoding, and nothing is lost.
Key Concept 14 min
1 · Why compress?
Compression reduces a file's size. We do it to:
- save storage space on a drive or in the cloud;
- reduce the time to stream, download or upload a file;
- use less bandwidth — so transfer is faster;
- lower costs (cloud storage and data are charged by size).
2 · Lossy vs lossless
Lossy gives the smallest files, so it suits music, photos and video where a little lost detail is acceptable. Lossless is essential where every bit matters — text, spreadsheets and program files.
3 · Run-length encoding (RLE)
RLE is a lossless method that replaces a run of identical items with two values: how many, then which item. It only helps when there are long runs.
Worked Example 12 min
(a) RLE on text: aaaaabbbbccddddd
Count each run, then give its character (here shown with the ASCII code, as an exam often asks):
| Run | Count | Character (ASCII) |
|---|---|---|
| aaaaa | 05 | a (97) |
| bbbb | 04 | b (98) |
| cc | 02 | c (99) |
| ddddd | 05 | d (100) |
Coded: 05 97 04 98 02 99 05 100. The original 16 bytes become 8 bytes — half the size, with nothing lost.
(b) RLE on a black-and-white image row
For an image, white = 1 and black = 0. The row 1 1 1 1 1 1 0 0 (six white, two black) becomes:
6 1 2 0 — "six 1s, two 0s". Four values instead of eight.
RLE as an algorithm
Cambridge pseudocode
// Run-length encode a string: output each run as (count, character)
DECLARE Text : STRING
DECLARE Index, RunLength : INTEGER
DECLARE Current : CHAR
Text ← "aaaaabbbbccddddd"
Index ← 1
WHILE Index <= LENGTH(Text)
Current ← MID(Text, Index, 1)
RunLength ← 0
WHILE Index <= LENGTH(Text) AND MID(Text, Index, 1) = Current
RunLength ← RunLength + 1
Index ← Index + 1
ENDWHILE
OUTPUT RunLength, Current
ENDWHILEThe same algorithm in Python (IDLE)
# Run-length encode a string text = "aaaaabbbbccddddd" result = [] i = 0 while i < len(text): current = text[i] count = 0 while i < len(text) and text[i] == current: count = count + 1 i = i + 1 result.append((count, current)) print(result)
Output
[(5, 'a'), (4, 'b'), (2, 'c'), (5, 'd')]
Try It Yourself 12 min
Goal: Use RLE to compress the run WWWWWWBBB (use W and B, not ASCII codes).
Hint: count each run, then write its letter.
Goal: A file must keep every bit (it is a bank spreadsheet). State whether lossy or lossless compression should be used, and why.
Goal: Explain why RLE works very well on a plain two-colour logo but poorly on a detailed photograph.
📝 Exam Practice 10 min
State one reason why a file might be compressed before being emailed.
Mark scheme
- Any one of: smaller file / faster to send / uses less bandwidth / uses less storage (1).
Compare lossy and lossless compression.
Mark scheme
- Lossy permanently removes data so the original cannot be restored (1).
- Lossless keeps all data so the original can be fully restored (1).
An image row is 0 0 0 0 0 1 1 1 (black = 0, white = 1). Show how RLE would compress it.
Mark scheme
- Five 0s then three 1s identified (1).
- Coded as
5 0 3 1(1).
Recap & Key Terms 3 min
We compress files to save space, speed up transfer and cut cost. Lossy (MP3, JPEG) removes data permanently for the smallest files; lossless (RLE) keeps every bit. RLE replaces each run with a count and a value. That completes Unit 1 — Data representation.
- Compression
- Reducing a file's size by removing repeated or redundant data.
- Lossy compression
- Some data is permanently removed; the original cannot be fully rebuilt (MP3, MP4, JPEG).
- Lossless compression
- No data is lost; the original can be fully rebuilt (e.g. RLE).
- Run-length encoding (RLE)
- A lossless method that stores each run as a count followed by the data value.
Homework 1 min
Task (≤ 15 min): (a) Compress aaaabbbbbbcc with RLE using ASCII codes (a = 97, b = 98, c = 99). (b) State whether RLE is lossy or lossless.
Model answer
- (a)
04 97 06 98 02 99— four a's, six b's, two c's. - (b) Lossless — the original string can be rebuilt exactly.