Spec & Goals 3 min
AQA Spec 3.3.5 · Character encoding
By the end of this lesson you can:
- Explain how a character set maps characters to binary character codes.
- State that ASCII uses 7 bits and describe why Unicode is needed.
- Use the fact that character codes are sequential to do letter arithmetic.
Warm-Up 5 min
Last lesson you shifted bits left and right to multiply and divide by powers of two.
Quick starter
A computer can only store binary. So how does it store the letter A in a message such as 'Ali'?
Reveal the idea
Each character is given a number, and that number is stored in binary. The letter A has the agreed number 65, stored as 1000001. This lesson is about those agreed numbers.
Key Concept — character sets and codes 14 min
A computer stores text by giving every character an agreed number. The number is then stored in binary.
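The idea of "character, then number, then binary" can be tried out directly. The following is a minimal Python sketch; Python is chosen only for illustration. Its built-in `ord()` returns a character's Unicode code, which matches the ASCII code for the first 128 characters. The helper name `char_to_binary` is hypothetical.

```python
def char_to_binary(ch: str) -> str:
    """Return the 7-bit binary pattern for a single ASCII character."""
    code = ord(ch)                # the agreed number, e.g. 'A' -> 65
    return format(code, "07b")    # that number written as 7 binary digits

# Show each character of 'Ali', its code and its binary pattern.
for ch in "Ali":
    print(ch, ord(ch), char_to_binary(ch))
```

This prints `A 65 1000001`, `l 108 1101100` and `i 105 1101001`, one per line.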
ASCII
ASCII uses 7 bits per character. With 7 bits there are 2⁷ = 128 different characters.
These 128 codes cover the upper-case and lower-case English letters, the digits, punctuation and some control codes.
Extended ASCII uses 8 bits, giving 2⁸ = 256 characters, which leaves room for a few extra symbols.
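Each extra bit doubles the number of different codes, so n bits give 2ⁿ codes. Here is a short Python sketch of that rule. The function name `max_characters` is hypothetical.

```python
def max_characters(bits: int) -> int:
    """How many different codes fit in the given number of bits."""
    return 2 ** bits

print(max_characters(7))   # standard ASCII: 128
print(max_characters(8))   # extended ASCII: 256
```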
Unicode
Unicode uses more bits per character than ASCII. This lets it represent characters from all the world's languages, plus symbols and emoji.
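One way to see why Unicode is needed is to check whether every code in a piece of text is below 128, the limit for 7-bit ASCII. The following is a minimal Python sketch. It assumes Python's `ord()`, which returns Unicode codes. The name `fits_in_ascii` is hypothetical.

```python
def fits_in_ascii(text: str) -> bool:
    """True if every character has a code below 128 (7-bit ASCII)."""
    return all(ord(ch) < 128 for ch in text)

print(fits_in_ascii("Ali"))      # True: every code is below 128
print(fits_in_ascii("Ali 😀"))    # False: the emoji's code is 128512
```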
Codes are ordered (sequential)
The codes follow a clear order. The letters run one after another, and so do the digits.
| Character | ASCII code (denary) |
|---|---|
| '0' | 48 |
| '9' | 57 |
| 'A' | 65 |
| 'B' | 66 |
| 'Z' | 90 |
| 'a' | 97 |
| 'b' | 98 |
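Because the codes are sequential, you can step through a range of codes to list letters in order. You can also subtract the code for '0' to turn a digit character into its value. This is a short Python sketch of both ideas. The names `letters_between` and `digit_value` are hypothetical.

```python
def letters_between(first: str, last: str) -> str:
    """List every character from `first` to `last` by stepping through the codes."""
    return "".join(chr(code) for code in range(ord(first), ord(last) + 1))

def digit_value(ch: str) -> int:
    """Turn a digit character such as '7' into the number 7."""
    return ord(ch) - ord("0")    # '0' is 48, so '7' (55) - 48 = 7

print(letters_between("A", "E"))   # ABCDE
print(digit_value("9"))            # 57 - 48 = 9
```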
Worked Example — letter arithmetic 12 min
Problem: given 'A' = 65, work out the code for 'E', then explain the gap between 'A' and 'a'.
Step 1 — count the gap. E is the 5th letter and A is the 1st, so E is 4 letters after A.
| Letter | A | B | C | D | E |
|---|---|---|---|---|---|
| Code | 65 | 66 | 67 | 68 | 69 |
Step 2 — add the gap. 65 + 4 = 69. So 'E' = 69. ✔
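Steps 1 and 2 amount to "add the gap to the starting code, then turn the result back into a character". The following is a minimal Python sketch using the built-ins `ord()` and `chr()`. The helper name `letter_after` is hypothetical.

```python
def letter_after(start: str, gap: int) -> str:
    """Return the character whose code is `gap` more than the code of `start`."""
    return chr(ord(start) + gap)

print(ord("E") - ord("A"))        # the gap: 4
print(letter_after("A", 4))       # E
print(ord(letter_after("A", 4)))  # 65 + 4 = 69
```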
Step 3 — the gap to lower case. 'a' = 97 and 'A' = 65.
| Character | Code |
|---|---|
| 'A' (upper) | 65 |
| 'a' (lower) | 97 |
| Difference | 97 − 65 = 32 |
Every lower-case letter is exactly 32 more than its upper-case partner. So 'e' = 69 + 32 = 101.
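The "add 32" rule can be written as a tiny case converter. The following is a minimal Python sketch that only handles the letters A to Z. The test `"A" <= ch <= "Z"` works because the codes are sequential. The names `CASE_GAP` and `to_lower` are hypothetical.

```python
CASE_GAP = ord("a") - ord("A")   # 97 - 65 = 32

def to_lower(ch: str) -> str:
    """Convert an upper-case letter A-Z to lower case by adding 32 to its code."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + CASE_GAP)
    return ch                    # leave anything else unchanged

print(to_lower("E"))        # e
print(ord(to_lower("E")))   # 69 + 32 = 101
```

Python already provides `str.lower()`. This version exists only to make the +32 arithmetic visible.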
Try It Yourself 12 min
Goal: Given 'A' = 65, state the character code for 'C'.
Hint: count how many letters C is after A, then add that to 65.
Goal: Given 'a' = 97, work out the code for the lower-case letter 'd'.
Hint: d is 3 letters after a.
Goal: Explain why the message 'Selamat' fits in ASCII on its own, but needs Unicode if it also contains an emoji.
📝 Exam Practice 10 min
Answer the way the examiner expects — the command word and the marks tell you how much to write.
State how many bits standard ASCII uses to represent each character.
Mark scheme
- 7 (bits) (1).
Given that 'A' = 65, state the character code for 'D'.
Mark scheme
- 68 (1): D is 3 letters after A, so 65 + 3 = 68.
Give one advantage of Unicode over ASCII.
Mark scheme
- It can represent far more characters / characters from all (the world's) languages / emoji (1). Accept any one.
Character codes are described as sequential. Explain what this means and give one benefit.
Mark scheme
- The codes follow a fixed order, e.g. 'A' = 65, 'B' = 66, 'C' = 67 (1).
- So you can do letter arithmetic / sort or compare letters / convert case easily (1).
Recap & Key Terms 3 min
A character set maps each character to a binary character code. ASCII uses 7 bits (128 characters); Unicode uses more bits so it covers every language and emoji. Codes are sequential, so letter arithmetic works.
- Character set
- The agreed list that maps each character to a binary code.
- Character code
- The binary number used to represent one character.
- ASCII
- A 7-bit character set holding 128 characters (extended ASCII uses 8 bits for 256).
- Unicode
- A character set using more bits than ASCII, able to represent all languages and emoji.
Homework 1 min
Task (≤ 15 min): Given 'A' = 65 and that lower case is 32 higher, work out the codes for 'H' and 'h'.
Model answer
H is 7 letters after A: 65 + 7 = 72 (1). Lower case is 32 higher: 72 + 32 = 104 for 'h' (1).