Learning Goals
3 min
By the end of this lesson you can:
- Write the four shapes: [f(x) for x in seq], with filter, with conditional expression, with nested loop.
- Convert a 4-line for-loop to a one-line comprehension when appropriate.
- Choose when NOT to use a comprehension (multi-line logic, side effects).
- Recognise generator expressions (...), the lazy cousin.
Warm-Up · Recap from PY-L2-02
5 min
# Loop
doubles = []
for n in [1, 2, 3, 4, 5]:
    doubles.append(n * 2)

# Comprehension — same result, one line
doubles = [n * 2 for n in [1, 2, 3, 4, 5]]
print(doubles)  # [2, 4, 6, 8, 10]
Same output. The comprehension reads "the doubles for n in this list". Today we add filters and nesting.
A comprehension packs three parts into one expression: "what to build", "where from", and, optionally, "under what condition". Once you have mastered comprehensions, your Python starts to look professional.
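To make the three parts concrete, the sketch below builds the same list twice. The first version is a comprehension and the second is the for-loop it roughly stands for. The words list and the length-greater-than-2 condition are made up for illustration.

```python
words = ["hi", "python", "is", "great"]

# Comprehension: what to build (len(w)), where from (words), under what condition (len(w) > 2)
lengths = [len(w) for w in words if len(w) > 2]

# The equivalent for-loop, spelling out the same three parts
lengths_loop = []
for w in words:                      # where from
    if len(w) > 2:                   # under what condition
        lengths_loop.append(len(w))  # what to build

print(lengths)       # [6, 5]
print(lengths_loop)  # [6, 5]
```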
New Concept · The Four Shapes
14 min
Shape 1 — Basic transform
squares = [n * n for n in range(1, 6)]        # [1, 4, 9, 16, 25]
upper = [w.upper() for w in ["hi", "there"]]  # ['HI', 'THERE']
Shape 2 — Filter with if at the end
evens = [n for n in range(1, 11) if n % 2 == 0]  # [2, 4, 6, 8, 10]
long_words = [w for w in ["hi", "python", "is", "great"] if len(w) > 3]  # ['python', 'great']
The if at the end is a filter. It keeps only the items for which the condition is True.
Shape 3 — Conditional expression in the value
# "pass" or "fail" for each score
scores = [82, 47, 95, 60, 38]
labels = ["pass" if s >= 50 else "fail" for s in scores]
# ['pass', 'fail', 'pass', 'pass', 'fail']
This if/else goes at the front (in the value expression). It's a different role from the filter — it transforms each item rather than dropping any.
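Comparing list lengths is a quick way to see the two roles. A filter can make the result shorter, while a conditional expression always returns one value per input. This minimal sketch reuses the scores and the pass mark of 50 from above. The compile() call simply confirms that if/else placed after the for is rejected by the parser.

```python
scores = [82, 47, 95, 60, 38]

# Filter clause at the end: drops failing scores, so the result can be shorter
passed = [s for s in scores if s >= 50]

# Conditional expression in the value: one label per score, nothing dropped
labels = ["pass" if s >= 50 else "fail" for s in scores]

print(passed, len(passed))  # [82, 95, 60] 3
print(labels, len(labels))  # ['pass', 'fail', 'pass', 'pass', 'fail'] 5

# if/else in the filter position is not valid syntax
bad_form_rejected = False
try:
    compile("[s for s in scores if s >= 50 else 0]", "<demo>", "eval")
except SyntaxError:
    bad_form_rejected = True
    print("if/else after the for is a SyntaxError")
```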
Filter + transform combined
# Square only the even numbers
result = [n * n for n in range(1, 11) if n % 2 == 0]
# [4, 16, 36, 64, 100]
One transform on the left. One filter on the right. Both in one comprehension.
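Both kinds of if can appear in the same comprehension. In this sketch the filter on the right keeps only the even numbers, and the conditional expression on the left labels each survivor. The "big"/"small" labels and the threshold of 5 are arbitrary choices for illustration.

```python
# Filter (right) keeps even n; conditional expression (left) labels each kept n
sizes = ["big" if n > 5 else "small" for n in range(1, 11) if n % 2 == 0]
print(sizes)  # ['small', 'small', 'big', 'big', 'big']
```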
Shape 4 — Nested loops
# All (x, y) pairs where x in 1..3 and y in 1..3
pairs = [(x, y) for x in range(1, 4) for y in range(1, 4)]
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
Read left to right: outer loop first, inner second. Same order as a regular nested for-loop.
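You can check the "same order as a nested for-loop" claim directly. This sketch builds the pairs both ways and compares the two lists.

```python
# Nested comprehension: outer for first, inner for second
pairs = [(x, y) for x in range(1, 4) for y in range(1, 4)]

# The same pairs from a regular nested for-loop, in the same order
pairs_loop = []
for x in range(1, 4):        # outer loop
    for y in range(1, 4):    # inner loop
        pairs_loop.append((x, y))

print(pairs == pairs_loop)  # True
print(len(pairs))           # 9
```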
# Multiplication table — but only show non-trivial entries
times = [f"{x}*{y}={x*y}" for x in range(2, 4) for y in range(2, 4) if x != y]
# ['2*3=6', '3*2=6']
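You can mix filters with nested loops. Each if clause belongs to the for clause just before it. A filter at the very end therefore sits inside the last for, and it is checked once for every (x, y) combination.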
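Because an if belongs to the for just before it, you can also place it between the two for-clauses. It then filters the outer loop before the inner loop runs. This is a small sketch; the odd-x condition is only an example. Moving the filter earlier only works when the condition uses the outer variable alone.

```python
# Filter at the end: checked once per (x, y) pair (9 checks)
late = [(x, y) for x in range(1, 4) for y in range(1, 4) if x % 2 == 1]

# Filter after the outer for: even x is skipped before the inner loop starts (3 checks)
early = [(x, y) for x in range(1, 4) if x % 2 == 1 for y in range(1, 4)]

print(late == early)  # True: same result, since the condition only uses x
print(early)          # [(1, 1), (1, 2), (1, 3), (3, 1), (3, 2), (3, 3)]
```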
When NOT to use a comprehension
AVOID a comprehension when:
- the logic doesn't fit on one line
- you need print() or another side effect
- the filter has multiple complex conditions
- a colleague would have to slow down to read it
In those cases, just use a for-loop.
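The side-effect case is easy to demonstrate. This sketch, with a made-up names list, uses a comprehension only to call print(). It does print, but it also quietly builds a throwaway list of None values, because print() returns None.

```python
names = ["aisyah", "wei jie", "priya"]

# Anti-pattern: a comprehension used only for its side effect
result = [print(n) for n in names]
print(result)  # [None, None, None]

# Clearer: a plain for-loop says "do this for each item"
for n in names:
    print(n)
```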
Generator expressions · the lazy cousin
A generator expression uses round brackets instead of square ones. It doesn't build a list. Instead, it produces items one at a time as they are iterated over. This is useful when feeding values into sum, max and similar functions, because no intermediate list is allocated.
# Compare
list_total = sum([n * n for n in range(1, 1_000_000)])  # builds a ~1M-item list, then sums it
gen_total = sum(n * n for n in range(1, 1_000_000))     # streams the squares one at a time
# Same answer. The generator never holds all the squares in memory at once.
For sum/max/min/any/all over big sequences, prefer the generator expression. For something you need to keep around (like the squares list), build the list.
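Laziness has a second consequence that the lists above don't show: a generator can only be consumed once. This sketch uses the squares of 1 to 5 to make the point. A second pass over a used-up generator yields nothing, while a list can be summed again and again.

```python
# A generator expression computes values only when they are requested
squares_gen = (n * n for n in range(1, 6))
print(next(squares_gen))  # 1: only the first square has been computed so far

squares_gen = (n * n for n in range(1, 6))
first_pass = sum(squares_gen)   # 55
second_pass = sum(squares_gen)  # 0: the generator is already exhausted
print(first_pass, second_pass)  # 55 0

squares_list = [n * n for n in range(1, 6)]
print(sum(squares_list), sum(squares_list))  # 55 55
```

So if you need to iterate over the values more than once, build the list.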
Worked Example · The Email Cleaner
12 min
A real text-cleaning pipeline. Save it as email_clean.py:
# email_clean.py — multiple comprehensions in a pipeline
import re

raw_lines = """
aisyah@example.com
WEI.JIE@school.edu.my
not_an_email
priya123@gmail.com
@bad.address
iman.ahmad@school.edu.my
PRIYA123@gmail.com
""".strip().splitlines()

EMAIL_PAT = re.compile(r"^[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$")

# Pipeline as comprehensions
stripped = [line.strip() for line in raw_lines]
lowered = [line.lower() for line in stripped if line]
valid = [line for line in lowered if EMAIL_PAT.search(line)]
unique = list(dict.fromkeys(valid))  # de-dupe preserving order

print("Cleaned emails:")
for e in unique:
    print(f"  {e}")
print(f"\n{len(raw_lines)} raw → {len(unique)} unique valid.")

# Domains-only — second comprehension over the same list
domains = [e.split("@")[1] for e in unique]
print(f"Domains: {sorted(set(domains))}")
Output
Cleaned emails:
  aisyah@example.com
  wei.jie@school.edu.my
  priya123@gmail.com
  iman.ahmad@school.edu.my

7 raw → 4 unique valid.
Domains: ['example.com', 'gmail.com', 'school.edu.my']
Read the diff
Four comprehensions in one pipeline, each doing one job: strip whitespace, lower-case (and drop empties), filter by regex, and project to domains. dict.fromkeys(valid) is a Pythonic way to de-duplicate while preserving order. It relies on dicts keeping insertion order, which is guaranteed from Python 3.7. Each line is short and focused, and it reads as one operation.
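The sketch below isolates the dict.fromkeys step so you can see why it is used instead of set(). The sample addresses are made up for illustration. Both approaches remove duplicates, but only the dict version keeps the first-seen order.

```python
valid = ["aisyah@example.com", "priya123@gmail.com",
         "aisyah@example.com", "iman.ahmad@school.edu.my"]

# dict keys are unique and keep insertion order, so the first occurrence wins
unique = list(dict.fromkeys(valid))
print(unique)
# ['aisyah@example.com', 'priya123@gmail.com', 'iman.ahmad@school.edu.my']

# set() also removes duplicates, but its iteration order is not guaranteed
as_set = set(valid)
print(len(as_set))  # 3
```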
Try It Yourself
13 min
From 1..20, build a list of the squares of just the even numbers.
Hint
squares = [n * n for n in range(1, 21) if n % 2 == 0]
print(squares)
Given [82, 47, 95, 60, 38, 50], produce ["pass", "fail", "pass", "pass", "fail", "pass"].
Hint
scores = [82, 47, 95, 60, 38, 50]
labels = ["pass" if s >= 50 else "fail" for s in scores]
print(labels)
Inline if/else in the value position (not the filter at the end). Transforms each item; drops nothing.
Find all Pythagorean triples (a, b, c) where a + b + c ≤ 30, using a triple-nested comprehension. The condition is a² + b² == c².
Hint
triples = [(a, b, c)
           for c in range(1, 30)
           for b in range(1, c)
           for a in range(1, b)
           if a * a + b * b == c * c and a + b + c <= 30]
print(triples)
# → [(3, 4, 5), (6, 8, 10), (5, 12, 13)]
The ranges enforce the ordering a < b < c, which avoids duplicates such as (4, 3, 5). The filter sits at the end, so it is applied after all three loops have picked values.
Mini-Challenge · The Grid
8 min
Build a 5×5 grid of (row, col) tuples using a comprehension. Then flatten a 2-D list of lists using a nested comprehension. Then build a list of just the diagonal cells.
Show one possible solution
# 1 — build a 5x5 grid
grid = [(r, c) for r in range(5) for c in range(5)]
print(len(grid))  # 25

# 2 — flatten a 2-D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# 3 — diagonal cells only
diag = [matrix[i][i] for i in range(len(matrix))]
print(diag)  # [1, 5, 9]
Non-negotiables: the 5×5 grid uses two for-clauses; flatten uses outer/inner naming clearly; the diagonal uses the index trick. The flatten shape is something you'll write again and again — burn it in.
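A common slip with the flatten shape is writing the for-clauses in the wrong order. This sketch shows the correct version next to the swapped one. In the swapped version the first iterable refers to row before any loop has defined it, so Python raises a NameError. The try/except exists only to make the failure visible.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Correct order: outer loop (row in matrix) first, inner loop (x in row) second
flat = [x for row in matrix for x in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Swapped order: the first for-clause needs `row` before it exists
swapped_failed = False
try:
    broken = [x for x in row for row in matrix]
except NameError as err:
    swapped_failed = True
    print("NameError:", err)
```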
Recap
3 min
Four comprehension shapes — transform, filter, conditional-expression-in-value, nested. The if at the end filters; the if/else in the value transforms. Nested for-clauses give Cartesian products and flattening. Generator expressions (round brackets) are the lazy cousin — same syntax, no intermediate list. Don't force a comprehension when a regular loop would be clearer.
Vocabulary Card
- filter clause: the if cond at the end. Keeps only matching items.
- conditional expression: x if cond else y in the value position. Transforms each item; doesn't drop any.
- nested for-clauses: two or more for clauses. Read outer-first.
- generator expression: the same syntax with (...). Yields one item at a time. Memory-efficient for big sequences.
Homework
4 min
Build cleanup.py. Given:
raw = [" Aisyah ", "WEI JIE", "", " ", "priya", "IMAN", "", "aizat "]
Build, all using comprehensions:
- cleaned: strip and lower-case each, drop empties.
- capitalised: title-case each name.
- long_names: only those of length ≥ 5.
- upper_or_pad: UPPER for short names (< 5), normal for the rest.
Sample · cleanup.py
raw = [" Aisyah ", "WEI JIE", "", " ", "priya", "IMAN", "", "aizat "]

cleaned = [name.strip().lower() for name in raw if name.strip()]
capitalised = [n.title() for n in cleaned]
long_names = [n for n in capitalised if len(n) >= 5]
upper_or_pad = [n.upper() if len(n) < 5 else n for n in capitalised]

print("Cleaned    :", cleaned)
print("Capitalised:", capitalised)
print("Long >= 5  :", long_names)
print("Mixed      :", upper_or_pad)
Non-negotiables: each transformation is one comprehension, the filter at the end drops empties, and the conditional expression in #4 transforms without dropping.