AQA GCSE CS · Paper 2 · Unit 3 · Lesson 7

Paper 2 · Unit 3 · CS-L3-07

Character Encoding

60 minutes · AQA 8525 · Paper 2 — Fundamentals of data representation

Spec & Goals 3 min

AQA Spec 3.3.5 · Character encoding

By the end of this lesson you can:

  1. Explain how a character set maps characters to binary character codes.
  2. State that ASCII uses 7 bits and describe why Unicode is needed.
  3. Use the fact that character codes are sequential to do letter arithmetic.

Warm-Up 5 min

Last lesson you shifted bits left and right to multiply and divide by powers of two.

Quick starter

A computer can only store binary. So how does it store the letter A in a message such as 'Ali'?

Reveal the idea

Each character is given a number, and that number is stored in binary. The letter A has the agreed number 65, stored as 1000001. This lesson is about those agreed numbers.
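As a minimal sketch of this idea, assuming Python 3 (the lesson itself does not name a programming language), the built-in `ord` gives a character's agreed number and `format(..., "07b")` writes that number as 7 binary digits. The variable names are illustrative only.

```python
# Assumption: Python 3. Variable names are illustrative, not from the lesson.
message = "Ali"

for character in message:
    code = ord(character)        # the agreed number for this character
    bits = format(code, "07b")   # the same number written as 7 binary digits
    print(character, code, bits)

# Prints:
# A 65 1000001
# l 108 1101100
# i 105 1101001
```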

Key Concept — character sets and codes 14 min

A computer stores text by giving every character an agreed number. The number is then stored in binary.

ASCII

ASCII uses 7 bits per character. With 7 bits there are 2⁷ = 128 different character codes (0 to 127).

These 128 codes cover the upper-case and lower-case English letters, the digits, punctuation and some control codes.

Extended ASCII uses 8 bits, giving 2⁸ = 256 characters, which leaves room for a few extra symbols.
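The two totals above come from the same rule: each extra bit doubles the number of patterns. A short Python sketch of that rule follows; the function name `number_of_codes` is made up for this example.

```python
def number_of_codes(bits):
    """Return how many different binary patterns the given number of bits can make."""
    return 2 ** bits


ascii_codes = number_of_codes(7)           # standard ASCII
extended_ascii_codes = number_of_codes(8)  # extended ASCII

print(ascii_codes)           # 128
print(extended_ascii_codes)  # 256
```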

Unicode

Unicode uses more bits per character than ASCII, so it has far more codes available. This lets it represent characters from all the world's languages, plus symbols and emoji.
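One way to see why 7 bits are not enough is to check whether every character code in a piece of text is below 128. The sketch below assumes Python 3, where `ord` returns the Unicode code of a character; the helper name `fits_in_ascii` is hypothetical.

```python
def fits_in_ascii(text):
    """Return True if every character code in text is below 128 (fits in 7 bits)."""
    return all(ord(character) < 128 for character in text)


emoji = "\U0001F600"  # a grinning-face emoji, written as an escape code

print(fits_in_ascii("Hello"))          # True
print(fits_in_ascii("Hello" + emoji))  # False

# How many bits are needed to write each code in binary?
print(ord("A").bit_length())    # 7
print(ord(emoji).bit_length())  # 17
```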

Codes are ordered (sequential)

The codes follow a clear order. The letters run one after another, and so do the digits.

Character   ASCII code (denary)
'0'         48
'9'         57
'A'         65
'B'         66
'Z'         90
'a'         97
'b'         98
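The codes in the table can be checked with Python's built-in `ord`, and the sequential order shows up as a difference of exactly 1 between neighbours. This is a sketch assuming Python 3; the name `table` is illustrative.

```python
# Build the same table as above: character -> denary code.
table = {character: ord(character) for character in "09ABZab"}
print(table)
# {'0': 48, '9': 57, 'A': 65, 'B': 66, 'Z': 90, 'a': 97, 'b': 98}

# Sequential codes: the next letter is always 1 higher.
print(ord("B") - ord("A"))  # 1

# The same holds for digits, so subtracting the code for '0' gives the digit's value.
print(ord("7") - ord("0"))  # 7
```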

Worked Example — letter arithmetic 12 min

Problem: given 'A' = 65, work out the code for 'E', then explain the gap between 'A' and 'a'.

Step 1 — count the gap. E is the 5th letter and A is the 1st, so E is 4 letters after A.

Letter   A    B    C    D    E
Code     65   66   67   68   69

Step 2 — add the gap. 65 + 4 = 69. So 'E' = 69. ✔

Step 3 — the gap to lower case. 'a' = 97 and 'A' = 65.

Character     Code
'A' (upper)   65
'a' (lower)   97
Difference    97 − 65 = 32

Every lower-case letter is exactly 32 more than its upper-case partner. So 'e' = 69 + 32 = 101.
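The three steps of the worked example can be sketched in Python as follows. The constant and variable names (`CODE_A`, `CASE_GAP`, `gap`) are illustrative; `chr` is the built-in that turns a code back into its character.

```python
CODE_A = 65     # given in the problem: 'A' = 65
CASE_GAP = 32   # Step 3: 'a' (97) minus 'A' (65)

gap = 4                     # Step 1: E is 4 letters after A
code_E = CODE_A + gap       # Step 2: 65 + 4
code_e = code_E + CASE_GAP  # lower case is 32 higher

print(code_E, chr(code_E))  # 69 E
print(code_e, chr(code_e))  # 101 e
print(ord("a") - ord("A"))  # 32
```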

Try It Yourself 12 min

🟢 Easy

Goal: Given 'A' = 65, state the character code for 'C'.

Hint: count how many letters C is after A, then add that to 65.

🟡 Medium

Goal: Given 'a' = 97, work out the code for the lower-case letter 'd'.

Hint: d is 3 letters after a.

🔴 Stretch

Goal: Explain why the message 'Selamat' needs Unicode if it also contains an emoji, but plain English letters alone fit in ASCII.

📝 Exam Practice 10 min

Answer the way the examiner expects — the command word and the marks tell you how much to write.

State [1 mark]

State how many bits standard ASCII uses to represent each character.

Mark scheme
  • 7 (bits) (1).

Calculate [1 mark]

Given that 'A' = 65, calculate the character code for 'D'.

Mark scheme
  • 68 (1): D is 3 letters after A, so 65 + 3 = 68.

Give [1 mark]

Give one advantage of Unicode over ASCII.

Mark scheme
  • It can represent far more characters / it can represent characters from all (the world's) languages / it can represent emoji (1). Accept any one.

Explain [2 marks]

Character codes are described as sequential. Explain what this means and give one benefit.

Mark scheme
  • The codes follow a fixed order, e.g. 'A'=65, 'B'=66, 'C'=67 (1).
  • So you can do letter arithmetic / sort or compare letters / convert case easily (1).
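
To illustrate the comparing and sorting benefit, here is a small Python sketch. It relies on the fact that Python compares single characters by their character codes; the list name `letters` is illustrative.

```python
letters = ["C", "A", "B"]

# 'A' (65) is less than 'B' (66), so the comparison is True.
print("A" < "B")        # True

# Sorting puts the letters in code order, which is also alphabetical order.
print(sorted(letters))  # ['A', 'B', 'C']

# Every upper-case code (65 to 90) comes before every lower-case code (97 to 122).
print("Z" < "a")        # True
```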

Recap & Key Terms 3 min

A character set maps each character to a binary character code. ASCII uses 7 bits (128 characters); Unicode uses more bits so it covers every language and emoji. Codes are sequential, so letter arithmetic works.

Character set
The agreed list that maps each character to a binary code.
Character code
The binary number used to represent one character.
ASCII
A 7-bit character set holding 128 characters (extended ASCII uses 8 bits for 256).
Unicode
A character set using more bits than ASCII, able to represent all languages and emoji.

Homework 1 min

Task (≤ 15 min): Given 'A' = 65 and that lower case is 32 higher, work out the codes for 'H' and 'h'.

Model answer

H is 7 letters after A: 65 + 7 = 72 (1). Lower case is 32 higher: 72 + 32 = 104 for 'h' (1).
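
The model answer can be checked with a short Python sketch that uses only the two given facts ('A' = 65 and lower case is 32 higher). The function name `code_from_A` is hypothetical, and `string.ascii_uppercase` is the standard-library string "ABC...Z" used here to count the gap from A.

```python
import string


def code_from_A(letter):
    """Work out a letter's code from 'A' = 65 and the 32 gap to lower case."""
    gap = string.ascii_uppercase.index(letter.upper())  # letters after A
    code = 65 + gap
    if letter.islower():
        code += 32  # lower case is 32 higher
    return code


print(code_from_A("H"))  # 72
print(code_from_A("h"))  # 104
```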