PY-L3-21 · List Comprehensions

Learning Goals

3 min

By the end of this lesson you can:

Write the four shapes: basic transform [f(x) for x in seq], filter, conditional expression, and nested loop.
Convert a 4-line for-loop to a one-line comprehension when appropriate.
Choose when NOT to use a comprehension (multi-line logic, side effects).
Recognise generator expressions (...) — the lazy cousin.

Warm-Up · Recap from PY-L2-02

5 min

# Loop
doubles = []
for n in [1, 2, 3, 4, 5]:
    doubles.append(n * 2)

# Comprehension — same result, one line
doubles = [n * 2 for n in [1, 2, 3, 4, 5]]
print(doubles)         # [2, 4, 6, 8, 10]

Same result. The comprehension reads as "n * 2, for each n in this list". Today we add filters and nesting.

Today's big idea

A comprehension is "what to build", "where from", and optionally "under what condition". All in one expression. Mastering them is the moment your Python starts looking professional.

New Concept · The Four Shapes

14 min

Shape 1 — Basic transform

squares = [n * n for n in range(1, 6)]
# [1, 4, 9, 16, 25]

upper = [w.upper() for w in ["hi", "there"]]
# ['HI', 'THERE']

Shape 2 — Filter with if at the end

evens = [n for n in range(1, 11) if n % 2 == 0]
# [2, 4, 6, 8, 10]

long_words = [w for w in ["hi", "python", "is", "great"] if len(w) > 3]
# ['python', 'great']

The if at the end is a filter: it keeps only the items for which the condition is true.
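For reference, here is the loop that the filter shape stands for, plus the common shortcut of filtering on truthiness (an empty string counts as false). The word list is made up for illustration.

```python
words = ["hi", "", "python", "", "great"]

# Filter shape: keep only words longer than 3 characters
long_words = [w for w in words if len(w) > 3]

# The same filter as a loop: the comprehension's if becomes an if inside the loop
long_words_loop = []
for w in words:
    if len(w) > 3:
        long_words_loop.append(w)

# Truthiness filter: "" is falsy, so "if w" drops the empty strings
non_empty = [w for w in words if w]

print(long_words)                     # ['python', 'great']
print(long_words == long_words_loop)  # True
print(non_empty)                      # ['hi', 'python', 'great']
```

The `if line` in the email cleaner later in this lesson uses exactly this truthiness trick.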

Shape 3 — Conditional expression in the value

# "pass" or "fail" for each score
scores = [82, 47, 95, 60, 38]
labels = ["pass" if s >= 50 else "fail" for s in scores]
# ['pass', 'fail', 'pass', 'pass', 'fail']

This if/else goes at the front (in the value expression). It's a different role from the filter — it transforms each item rather than dropping any.
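The two kinds of if can appear in the same comprehension, each in its own position. A small sketch, using -1 as a made-up marker for a missing score: the filter at the end drops it, and the if/else at the front labels what is left. Swapping the positions does not work, and `compile()` is used here only to show that Python rejects both swapped forms as syntax errors.

```python
scores = [82, -1, 47, 95]   # -1 stands in for a missing score (made-up data)

# if/else at the front labels each kept item; the if at the end drops the -1 first
labels = ["pass" if s >= 50 else "fail" for s in scores if s >= 0]
print(labels)   # ['pass', 'fail', 'pass']

# Swapped roles are syntax errors: the front needs an else,
# and the filter at the end cannot take one.
rejected = []
for bad in ['["pass" if s >= 50 for s in scores]',
            '[s for s in scores if s >= 50 else 0]']:
    try:
        compile(bad, "<demo>", "eval")
    except SyntaxError:
        rejected.append(bad)

print(len(rejected))   # 2
```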

Filter + transform combined

# Square only the even numbers
result = [n * n for n in range(1, 11) if n % 2 == 0]
# [4, 16, 36, 64, 100]

One transform on the left. One filter on the right. Both in one comprehension.

Shape 4 — Nested loops

# All (x, y) pairs where x in 1..3 and y in 1..3
pairs = [(x, y) for x in range(1, 4) for y in range(1, 4)]
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]

Read left to right: outer loop first, inner second. Same order as a regular nested for-loop.
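To check the "same order" claim, here is the pairs comprehension next to the nested loop it is equivalent to. The for-clauses appear in the same order in both.

```python
# The comprehension from above
pairs = [(x, y) for x in range(1, 4) for y in range(1, 4)]

# The same thing as a regular nested loop
pairs_loop = []
for x in range(1, 4):          # outer loop = first for-clause
    for y in range(1, 4):      # inner loop = second for-clause
        pairs_loop.append((x, y))

print(pairs == pairs_loop)     # True
print(pairs[:3])               # [(1, 1), (1, 2), (1, 3)]
```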

# Multiplication table, skipping the squares (entries where x == y)
times = [f"{x}*{y}={x*y}" for x in range(2, 4) for y in range(2, 4) if x != y]
# ['2*3=6', '3*2=6']

You can mix filters with nested loops. A filter at the end runs inside the innermost loop, after every for-clause has picked its value, so it can test any of the loop variables (here, x != y).
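A filter does not have to go at the very end. Placed right after a for-clause, it runs at that loop level, so it can skip an outer value before the inner loop starts. The ranges here are arbitrary, chosen to keep the output short.

```python
# Filter right after the outer for: skip x == 2 before looping over y
early = [(x, y) for x in range(1, 4) if x != 2 for y in range(1, 3)]

# Filter at the end: runs inside the innermost loop, so it can use both x and y
late = [(x, y) for x in range(1, 4) for y in range(1, 3) if x != 2]

print(early)          # [(1, 1), (1, 2), (3, 1), (3, 2)]
print(early == late)  # True: same result, but early never loops over y when x == 2
```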

When NOT to use a comprehension

AVOID a comprehension when:

- the logic doesn't fit one line
- you need print() or another side effect
- the filter has multiple complex conditions
- a colleague would have to slow down to read it

In those cases, just use a for-loop.
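To make the side-effect point concrete: a comprehension always builds a list, so using one just to call print() leaves behind a throwaway list of None values, because print() returns None. A plain loop says what you mean. The names are made up.

```python
names = ["Aisyah", "Wei Jie", "Priya"]

# Anti-pattern: the comprehension runs print() for its side effect...
result = [print(n) for n in names]
# ...and still builds a list of print()'s return values
print(result)   # [None, None, None]

# Clearer: a for-loop for side effects
for n in names:
    print(n)
```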

Generator expressions · the lazy cousin

Round brackets instead of square. Doesn't build a list — yields items one by one when iterated. Useful when feeding into sum, max, etc., because no intermediate list is allocated.

# Compare
list_total = sum([n * n for n in range(1, 1_000_000)])    # builds a million-item list, then sums
gen_total  = sum(n * n for n in range(1, 1_000_000))      # streams the squares, never stores them

# Same answer. Generator is way more memory-efficient.

For sum/max/min/any/all over big sequences, prefer the generator expression. For something you need to keep around (like the squares list), build the list.
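Two behaviours worth knowing before you reach for a generator expression: it can be consumed only once, and functions like any() stop pulling items as soon as they have an answer. The `is_big` helper and `checked` list are hypothetical, added only to record which values any() actually looked at.

```python
squares = (n * n for n in range(5))
first = list(squares)
second = list(squares)
print(first)    # [0, 1, 4, 9, 16]
print(second)   # []  (already used up; build a list if you need it twice)

# any() stops at the first True, so the generator is only partly consumed
checked = []

def is_big(n):
    checked.append(n)          # record which values were actually tested
    return n > 3

found = any(is_big(n) for n in range(1_000_000))
print(found)     # True
print(checked)   # [0, 1, 2, 3, 4]
```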

Worked Example · The Email Cleaner

12 min

Real text-cleaning pipeline. Save as email_clean.py:

# email_clean.py: multiple comprehensions in a pipeline
import re

raw_lines = """
  aisyah@example.com
WEI.JIE@school.edu.my
not_an_email
priya123@gmail.com
@bad.address
iman.ahmad@school.edu.my
   PRIYA123@gmail.com
""".strip().splitlines()

# Basic shape check: name@domain.tld (not a full email validator)
EMAIL_PAT = re.compile(r"^[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$")

# Pipeline as comprehensions
stripped = [line.strip()              for line in raw_lines]
lowered  = [line.lower()              for line in stripped if line]
valid    = [line                      for line in lowered  if EMAIL_PAT.search(line)]
unique   = list(dict.fromkeys(valid))   # de-dupe preserving order

print("Cleaned emails:")
for e in unique:
    print(f"  {e}")

print(f"\n{len(raw_lines)} raw → {len(unique)} unique valid.")

# Domains-only — second comprehension over the same list
domains = [e.split("@")[1] for e in unique]
print(f"Domains: {sorted(set(domains))}")

Output

Cleaned emails:
  aisyah@example.com
  wei.jie@school.edu.my
  priya123@gmail.com
  iman.ahmad@school.edu.my

7 raw → 4 unique valid.
Domains: ['example.com', 'gmail.com', 'school.edu.my']

Read the diff

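Four comprehensions in one pipeline, each doing one job: strip whitespace, lower-case (and drop empties), filter by regex, project to domains. Lower-casing before de-duplicating is what makes PRIYA123@gmail.com and priya123@gmail.com count as the same address. dict.fromkeys(valid) removes duplicates while keeping first-seen order, because dicts preserve insertion order (guaranteed since Python 3.7). Each line is short, focused, and reads as one operation.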
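A small sketch of why the order of steps matters and why dict.fromkeys() is used instead of set(). The three addresses are taken from the sample data above.

```python
emails = ["priya123@gmail.com", "aisyah@example.com", "PRIYA123@gmail.com"]

# Without lower-casing first, the two Priya addresses are different strings
as_is = list(dict.fromkeys(emails))
print(len(as_is))   # 3

# Lower-case, then de-dupe: dict keys keep first-seen order
lowered = [e.lower() for e in emails]
unique = list(dict.fromkeys(lowered))
print(unique)       # ['priya123@gmail.com', 'aisyah@example.com']

# set() also de-dupes but promises no order, hence sorted() for display
print(sorted(set(lowered)))   # ['aisyah@example.com', 'priya123@gmail.com']
```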

Try It Yourself

13 min

01 🟢 Square the evens

From 1..20, build a list of the squares of just the even numbers.

Hint

squares = [n * n for n in range(1, 21) if n % 2 == 0]
print(squares)

02 🟡 Pass/Fail labels

Given [82, 47, 95, 60, 38, 50], produce ["pass", "fail", "pass", "pass", "fail", "pass"].

Hint

scores = [82, 47, 95, 60, 38, 50]
labels = ["pass" if s >= 50 else "fail" for s in scores]
print(labels)

Inline if/else in the value position (not the filter at the end). Transforms each item; drops nothing.

03 🔴 Pythagorean triples (stretch)

Find all Pythagorean triples (a, b, c) where a + b + c ≤ 30, using a triple-nested comprehension. The condition is a * a + b * b == c * c, i.e. a² + b² = c².

Hint

triples = [(a, b, c)
           for c in range(1, 30)
           for b in range(1, c)
           for a in range(1, b)
           if a * a + b * b == c * c and a + b + c <= 30]
print(triples)
# → [(3, 4, 5), (6, 8, 10), (5, 12, 13)]

The bounded ordering (a < b < c) avoids duplicates. The filter is at the bottom — applies after all three loops have picked values.
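A generator expression fits this search too. If you only want the first triple, next() pulls items until one passes the filter and then stops, so the remaining candidates are never generated. Dropping the perimeter condition is a simplification for this illustration.

```python
# Round brackets: a generator with the same three for-clauses as the exercise
first_triple = next(
    (a, b, c)
    for c in range(1, 30)
    for b in range(1, c)
    for a in range(1, b)
    if a * a + b * b == c * c
)
print(first_triple)   # (3, 4, 5)
```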

Mini-Challenge · The Grid

8 min

Build a 5×5 grid of (row, col) tuples using a comprehension. Then flatten a 2-D list of lists using a nested comprehension. Then build a list of just that 2-D list's diagonal cells (where the row index equals the column index).

One possible solution

# 1 — build a 5x5 grid
grid = [(r, c) for r in range(5) for c in range(5)]
print(len(grid))                # 25

# 2 — flatten a 2-D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
print(flat)                      # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# 3 — diagonal cells only
diag = [matrix[i][i] for i in range(len(matrix))]
print(diag)                      # [1, 5, 9]

Non-negotiables: the 5×5 grid uses two for-clauses; flatten uses outer/inner naming clearly; the diagonal uses the index trick. The flatten shape is something you'll write again and again — burn it in.
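Flattening uses two for-clauses inside one comprehension. That is easy to confuse with a comprehension nested inside another comprehension's value, which keeps the rows separate instead. A 3×3 size keeps the output short here; the transpose is an extra illustration, not part of the challenge.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Two for-clauses in ONE comprehension: a flat list
flat = [x for row in matrix for x in row]
print(flat)           # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# A comprehension INSIDE the value of another: a list of lists, one per row
grid_rows = [[(r, c) for c in range(3)] for r in range(3)]
print(grid_rows[1])   # [(1, 0), (1, 1), (1, 2)]

# Same idea, transposing the matrix: column i becomes row i
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)     # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```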

Recap

3 min

Four comprehension shapes — transform, filter, conditional-expression-in-value, nested. The if at the end filters; the if/else in the value transforms. Nested for-clauses give Cartesian products and flattening. Generator expressions (round brackets) are the lazy cousin — same syntax, no intermediate list. Don't force a comprehension when a regular loop would be clearer.

Vocabulary Card

filter clause: The if cond at the end. Keep only matching items.
conditional expression: x if cond else y in the value position. Transforms each item; doesn't drop any.
nested for-clauses: Two or more for clauses. Outer-first.
generator expression: Same syntax with (...). Yields one item at a time. Memory-efficient for big sequences.

Homework

4 min

Build cleanup.py. Given:

raw = ["  Aisyah ", "WEI JIE", "", "  ", "priya", "IMAN", "", "aizat "]

Build, all using comprehensions:

cleaned — strip and lower-case each, drop empties.
capitalised — title-case each name.
long_names — only those of length ≥ 5.
upper_or_pad — UPPER for short names (<5), normal for the rest.

Sample · cleanup.py

raw = ["  Aisyah ", "WEI JIE", "", "  ", "priya", "IMAN", "", "aizat "]

cleaned      = [name.strip().lower() for name in raw if name.strip()]
capitalised  = [n.title() for n in cleaned]
long_names   = [n for n in capitalised if len(n) >= 5]
upper_or_pad = [n.upper() if len(n) < 5 else n for n in capitalised]

print("Cleaned    :", cleaned)
print("Capitalised:", capitalised)
print("Long >= 5  :", long_names)
print("Mixed      :", upper_or_pad)

Non-negotiables: each transformation is one comprehension, the filter at the end drops empties, and the conditional expression in #4 transforms without dropping.
