PY-L2-12 · Nested Worlds — Dicts of Lists

Yesterday: many records, one shape. Today: one mapping, many groups. A dict-of-lists is what you reach for when each item belongs to a category — drinks by stall, songs by genre, students by year group. Group, append, list, count.

⏱ 1 hour · 🗂 Concept lesson · 📚 After PY-L2-11 · 💻 VS Code or online-python.com

Learning Goals

3 min

By the end of this lesson you can:

Build a dict-of-lists by hand: {"fruit": ["apple", "banana"], "drink": ["milo", "kopi"]}.
Append a new item to one group with d[key].append(item).
Group an unsorted list of pairs into a dict-of-lists with a tiny loop and the if-key-in pattern.
Use .items() from PY-L2-06 to loop both the group name and its list at once.

Warm-Up

5 min

Pak Cik Razif's menu has three stalls. Each stall has its own short list of items.

stalls = {
    "drinks":  ["milo", "kopi", "teh tarik"],
    "rice":    ["nasi lemak", "nasi goreng"],
    "noodles": ["char kway teow", "mee goreng"],
}

print(stalls["drinks"])
print(stalls["noodles"][1])
print(len(stalls))

Show the answer

['milo', 'kopi', 'teh tarik']
mee goreng
3

Two indexings, this time the other way round from yesterday. First a key to pick the group, then a [i] to pick the item inside that group.
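
The double indexing can be unpacked into two separate steps. A minimal sketch that reuses the stalls dict from the warm-up; the names noodle_list and second_noodle are made up for illustration:

```python
stalls = {
    "drinks":  ["milo", "kopi", "teh tarik"],
    "rice":    ["nasi lemak", "nasi goreng"],
    "noodles": ["char kway teow", "mee goreng"],
}

# Step 1: the key picks the group (a whole list)
noodle_list = stalls["noodles"]

# Step 2: the index picks one item inside that list
second_noodle = noodle_list[1]

print(second_noodle)
# → mee goreng

# Doing both steps on one line gives the same answer
print(stalls["noodles"][1] == second_noodle)
# → True
```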

Today's big idea

A dict-of-lists models "buckets". The key names the bucket; the value holds everything in it. Adding an item means append-ing to the right list. Listing a bucket is just d[key].

New Concept · Buckets That Grow

14 min

Compare the two shapes

Yesterday — list of dicts          Today — dict of lists

[                                  {
  {"item": "milo",  "stall": "D"},   "D": ["milo", "kopi", "teh"],
  {"item": "kopi",  "stall": "D"},   "R": ["nasi lemak", "nasi g."],
  {"item": "nasi",  "stall": "R"},   "N": ["char kway teow", "mee g."],
  ...                              }
]

Each row stands alone.             Each key holds many items.
Good when you need flexible        Good when you need every item
filters per row.                   of a category at once.

The same menu items, in two different shapes. The right shape depends on the question you ask the data.
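
To see how the question decides the shape, here is one question asked of both shapes. This is only a sketch: the menu is shortened to four items, and the names rows, buckets, from_rows and from_buckets are invented for the example.

```python
# Yesterday's shape: a list of dicts (shortened to four rows)
rows = [
    {"item": "milo",       "stall": "D"},
    {"item": "kopi",       "stall": "D"},
    {"item": "nasi lemak", "stall": "R"},
    {"item": "mee goreng", "stall": "N"},
]

# Today's shape: the same four items as a dict of lists
buckets = {
    "D": ["milo", "kopi"],
    "R": ["nasi lemak"],
    "N": ["mee goreng"],
}

# Question: which items does stall D sell?

# List of dicts: scan every row and keep the matches
from_rows = []
for row in rows:
    if row["stall"] == "D":
        from_rows.append(row["item"])

# Dict of lists: one lookup by key
from_buckets = buckets["D"]

print(from_rows)
# → ['milo', 'kopi']
print(from_buckets)
# → ['milo', 'kopi']
```

Both shapes give the same answer. The dict of lists just gets there with a single lookup, because the items were already sorted into buckets.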

Adding to a group

Append to the right list — same .append() from Level 1.

stalls["drinks"].append("horlicks")
print(stalls["drinks"])
# → ['milo', 'kopi', 'teh tarik', 'horlicks']

What if the group doesn't exist yet? Trying stalls["dessert"].append("cendol") would crash with a KeyError: there is no "dessert" key, so stalls["dessert"] fails before .append can run.

The standard fix is the if-key-in pattern:

item  = "cendol"
group = "dessert"

if group not in stalls:
    stalls[group] = []      # create the empty list first
stalls[group].append(item)
print(stalls[group])
# → ['cendol']

Two steps: make the list if it isn't there, then append. We'll meet a one-line version of this trick (defaultdict) in Level 3.
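
For the curious, here is a small preview of what defaultdict looks like. Treat it as a sketch rather than something you need today: it uses collections.defaultdict from Python's standard library, and "ais kacang" is an extra item added just for this example.

```python
from collections import defaultdict

# A defaultdict(list) makes an empty list the first time a key is used
stalls = defaultdict(list)

stalls["dessert"].append("cendol")      # no KeyError, no if-check needed
stalls["dessert"].append("ais kacang")
stalls["drinks"].append("milo")

# dict(...) turns it back into a plain dict for printing
print(dict(stalls))
# → {'dessert': ['cendol', 'ais kacang'], 'drinks': ['milo']}
```

Until Level 3, stick with the if-key-in pattern. It shows exactly what is happening at each step.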

Looping a dict-of-lists

Loop with .items() from PY-L2-06 — you get the group name and its list together.

for stall, items in stalls.items():
    print(stall, "(", len(items), "items )")
    for item in items:
        print("  -", item)

Two loops — the outer one over groups, the inner over items in each group. Indentation tells the eye that the inner is "inside" the outer.

Quick stats per group

The len() of each group's list is that group's item count. The len() of the dict itself is the number of groups.

print("Stalls:", len(stalls))
for stall, items in stalls.items():
    print(stall, ":", len(items), "items")

Grouping a flat list

Often you start with a flat list of (item, category) pairs and you want to bucket them yourself. This is the "group by" pattern — the bread and butter of every spreadsheet pivot table.

raw = [
    ("milo",       "drinks"),
    ("kopi",       "drinks"),
    ("nasi lemak", "rice"),
    ("teh tarik",  "drinks"),
    ("char kway teow", "noodles"),
    ("nasi goreng",    "rice"),
]

grouped = {}
for item, category in raw:
    if category not in grouped:
        grouped[category] = []
    grouped[category].append(item)

print(grouped)

{'drinks': ['milo', 'kopi', 'teh tarik'], 'rice': ['nasi lemak', 'nasi goreng'], 'noodles': ['char kway teow']}

Three lines do the "group by": the if-key-in check, the empty-list creation, and the append. The printed result is exactly the dict-of-lists you'd have built by hand.

This pattern shows up everywhere

Pivot tables. Database GROUP BY. Spotify's "songs by artist". Instagram's "posts by date". Once your fingers know the three-line group-by, you can re-shape any flat list into buckets.
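
Because the recipe never changes, it can be wrapped in a small reusable function. A sketch only: group_pairs is a made-up name, and it assumes the input is a list of (item, category) pairs like raw above.

```python
def group_pairs(pairs):
    """Turn a list of (item, category) pairs into a dict of lists."""
    grouped = {}
    for item, category in pairs:
        if category not in grouped:
            grouped[category] = []      # first time we see this category
        grouped[category].append(item)
    return grouped


# Example data, invented for this sketch
songs = [("Dynamite", "k-pop"), ("Bad Guy", "pop"), ("Cupid", "k-pop")]

print(group_pairs(songs))
# → {'k-pop': ['Dynamite', 'Cupid'], 'pop': ['Bad Guy']}
```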

Worked Example · The Class Library

12 min

The story

You've been given a flat list of seven library books, each as a (title, genre) tuple. The librarian wants to group them by genre, count each genre, and answer two follow-up questions.

Save as library_groups.py:

Code

# library_groups.py — group by genre, then report

raw = [
    ("Harry Potter",     "fantasy"),
    ("Percy Jackson",    "fantasy"),
    ("Wings of Fire",    "fantasy"),
    ("Diary of a Wimpy Kid", "humour"),
    ("Tom Gates",        "humour"),
    ("Charlotte's Web",  "classic"),
    ("The Hobbit",       "fantasy"),
]

# Step 1 — group by genre
by_genre = {}
for title, genre in raw:
    if genre not in by_genre:
        by_genre[genre] = []
    by_genre[genre].append(title)

# Step 2 — print every group
for genre, titles in by_genre.items():
    print(genre, "(", len(titles), "titles ):")
    for t in titles:
        print(" -", t)
    print()

# Step 3 — biggest genre
biggest_genre = ""
biggest_count = 0
for genre, titles in by_genre.items():
    if len(titles) > biggest_count:
        biggest_count = len(titles)
        biggest_genre = genre
print("Biggest genre:", biggest_genre, "(", biggest_count, "titles )")

# Step 4 — add a new book
title, genre = "Matilda", "classic"
if genre not in by_genre:
    by_genre[genre] = []
by_genre[genre].append(title)
print("After Matilda:", by_genre)

Output

fantasy ( 4 titles ):
 - Harry Potter
 - Percy Jackson
 - Wings of Fire
 - The Hobbit

humour ( 2 titles ):
 - Diary of a Wimpy Kid
 - Tom Gates

classic ( 1 titles ):
 - Charlotte's Web

Biggest genre: fantasy ( 4 titles )
After Matilda: {'fantasy': ['Harry Potter', 'Percy Jackson', 'Wings of Fire', 'The Hobbit'], 'humour': ['Diary of a Wimpy Kid', 'Tom Gates'], 'classic': ["Charlotte's Web", 'Matilda']}

Read the diff

Step 1 is the group-by recipe — three lines. Step 2 is the nested loop. Step 3 is yesterday's "biggest" scan, but applied to len(list) not a single number. Step 4 is the add-to-a-group recipe — the same three lines as step 1, but for one new item.

Why the right shape matters

If you'd kept the flat list of seven, "biggest genre" would have required two passes — one to count, one to compare. The dict-of-lists answers it in one scan because each genre's length is already there.
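
For comparison, this is roughly what the two-pass version on the flat list could look like. A sketch under the same data as library_groups.py; the counts dict is a helper introduced here for illustration.

```python
raw = [
    ("Harry Potter",         "fantasy"),
    ("Percy Jackson",        "fantasy"),
    ("Wings of Fire",        "fantasy"),
    ("Diary of a Wimpy Kid", "humour"),
    ("Tom Gates",            "humour"),
    ("Charlotte's Web",      "classic"),
    ("The Hobbit",           "fantasy"),
]

# Pass 1: count titles per genre straight from the flat list
counts = {}
for title, genre in raw:
    if genre not in counts:
        counts[genre] = 0
    counts[genre] += 1

# Pass 2: compare the counts to find the biggest
biggest_genre = ""
biggest_count = 0
for genre, n in counts.items():
    if n > biggest_count:
        biggest_count = n
        biggest_genre = genre

print(counts)
# → {'fantasy': 4, 'humour': 2, 'classic': 1}
print(biggest_genre, biggest_count)
# → fantasy 4
```

Notice that pass 1 is the group-by recipe in disguise: it keeps a number per genre instead of a list. Once by_genre exists, that counting work is already done.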

Try It Yourself

13 min

01 🟢 Songs by genre

Start with this dict-of-lists. Print one section per genre, each with a header line and the songs underneath.

songs = {
    "pop":   ["Bad Guy", "Levitating"],
    "k-pop": ["Dynamite", "Spring Day", "Cupid"],
    "rock":  ["Bohemian Rhapsody"],
}

Hint

for genre, titles in songs.items():
    print(genre.upper())
    for t in titles:
        print(" *", t)
    print()

02 🟡 Add a song

Add "Numb" to "rock" and "Butter" to "k-pop". Print the new songs dict and how many songs there are in total.

Hint

songs["rock"].append("Numb")
songs["k-pop"].append("Butter")
print(songs)

total = 0
for titles in songs.values():
    total += len(titles)
print("Total songs:", total)

Notice the loop for titles in songs.values(): we only need the lists, so we use .values() directly. len(titles) counts the songs in that one genre; we sum across genres.
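
If you have already met the built-in sum(), the same total can be written more compactly. This is an optional sketch, not part of the exercise; total_with_sum is a name chosen for the example.

```python
songs = {
    "pop":   ["Bad Guy", "Levitating"],
    "k-pop": ["Dynamite", "Spring Day", "Cupid"],
    "rock":  ["Bohemian Rhapsody"],
}
songs["rock"].append("Numb")
songs["k-pop"].append("Butter")

# Loop version, as in the hint
total = 0
for titles in songs.values():
    total += len(titles)

# Same total with sum(): add up one len() per list
total_with_sum = sum(len(titles) for titles in songs.values())

print(total, total_with_sum)
# → 8 8
```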

03 🔴 Group from scratch (stretch)

You're given the flat list below. Build by_subject, a dict mapping each subject to a list of student names. Print it.

enrolments = [
    ("Aisyah",  "maths"),
    ("Wei Jie", "music"),
    ("Priya",   "maths"),
    ("Iman",    "art"),
    ("Aizat",   "music"),
    ("Hafiz",   "maths"),
]

Hint

by_subject = {}
for name, subject in enrolments:
    if subject not in by_subject:
        by_subject[subject] = []
    by_subject[subject].append(name)

print(by_subject)
# → {'maths': ['Aisyah', 'Priya', 'Hafiz'],
#    'music': ['Wei Jie', 'Aizat'],
#    'art':   ['Iman']}

This is the group-by recipe again. Get it into your fingers — you'll use it for the rest of Level 2.

Mini-Challenge · The Class Time-Table

8 min

Build timetable.py. You've got a flat list of (day, subject) tuples for the week. Group them by day, then answer four questions.

lessons = [
    ("Mon", "Maths"),    ("Mon", "Art"),     ("Mon", "Science"),
    ("Tue", "Maths"),    ("Tue", "PE"),      ("Tue", "English"),  ("Tue", "Music"),
    ("Wed", "Science"),  ("Wed", "Maths"),   ("Wed", "BM"),
    ("Thu", "English"),  ("Thu", "Maths"),
    ("Fri", "PE"),       ("Fri", "Art"),     ("Fri", "Maths"),    ("Fri", "BM"),
]

Your file must:

Group the lessons into by_day — a dict mapping the day to a list of subjects.
Loop the dict and print one line per day, for example Mon ( 3 ): Maths · Art · Science.
Print the day with the most lessons.
Print the day with the fewest lessons.
Print every day that has a Maths lesson (use if "Maths" in subjects).

Stretch goal. Also build by_subject from the same flat list — "which days does Maths appear?" — and print it.

Show one possible solution

# timetable.py — group lessons by day, then by subject

lessons = [
    ("Mon", "Maths"),    ("Mon", "Art"),     ("Mon", "Science"),
    ("Tue", "Maths"),    ("Tue", "PE"),      ("Tue", "English"),  ("Tue", "Music"),
    ("Wed", "Science"),  ("Wed", "Maths"),   ("Wed", "BM"),
    ("Thu", "English"),  ("Thu", "Maths"),
    ("Fri", "PE"),       ("Fri", "Art"),     ("Fri", "Maths"),    ("Fri", "BM"),
]

by_day = {}
for day, subj in lessons:
    if day not in by_day:
        by_day[day] = []
    by_day[day].append(subj)

# Print
for day, subjects in by_day.items():
    print(day, "(", len(subjects), "):", " · ".join(subjects))

# Most lessons
busiest = ""
busy_n  = 0
for day, subjects in by_day.items():
    if len(subjects) > busy_n:
        busy_n  = len(subjects)
        busiest = day
print("Busiest day:", busiest, "(", busy_n, ")")

# Fewest lessons
quietest = next(iter(by_day))   # any starting day
quiet_n  = len(by_day[quietest])
for day, subjects in by_day.items():
    if len(subjects) < quiet_n:
        quiet_n  = len(subjects)
        quietest = day
print("Quietest day:", quietest, "(", quiet_n, ")")

# Days that have Maths
print("Days with Maths:")
for day, subjects in by_day.items():
    if "Maths" in subjects:
        print(" -", day)

# Stretch — by subject
by_subject = {}
for day, subj in lessons:
    if subj not in by_subject:
        by_subject[subj] = []
    by_subject[subj].append(day)
print()
print("by_subject:", by_subject)

Non-negotiables: the by_day group-by recipe, one printed line per day of the timetable, busiest/quietest scans, and the in-check filter for Maths. The Stretch shows that the same flat list can be regrouped along a different axis; that's pivoting.
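
Pivoting also works when you only have the grouped dict and no flat list. A minimal sketch, assuming a shortened three-day by_day so the output stays readable:

```python
# A shortened timetable, invented for this sketch
by_day = {
    "Mon": ["Maths", "Art"],
    "Tue": ["Maths", "PE"],
    "Wed": ["Art"],
}

# Outer loop walks the days, inner loop walks each day's subjects.
# Inside, it is the same group-by recipe with the roles swapped.
by_subject = {}
for day, subjects in by_day.items():
    for subj in subjects:
        if subj not in by_subject:
            by_subject[subj] = []
        by_subject[subj].append(day)

print(by_subject)
# → {'Maths': ['Mon', 'Tue'], 'Art': ['Mon', 'Wed'], 'PE': ['Tue']}
```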

Recap

3 min

A dict-of-lists models buckets: one key per category, one list per bucket. Add an item by appending to the right list, checking first that the list exists. Loop with .items() to walk both the group name and the items together. The three-line "group by" recipe (check, create empty list, append) turns any flat list into buckets, and it is the same idea behind a database's GROUP BY.

Vocabulary Card

dict of lists: A dict whose values are all lists. Each key names a bucket; each list holds everything in that bucket.
group-by recipe: Walk a flat list, for each item ensure its category exists in the dict (create an empty list if not), then append.
if key not in d: The check that protects you from KeyError when adding a brand-new bucket.

Homework

4 min

Build spotify_lite.py. You start with a flat list of (song, artist) tuples — at least ten — your choice of music.

Your file must:

Group the songs into by_artist using the three-line recipe.
Print one block per artist with the song titles underneath.
Print the total number of artists and the total number of songs.
Print the artist with the most songs in the list.
Ask the user for an artist name with input() and print their songs, or "Not in playlist." if the artist isn't there. Use .get(name, ...).

Stretch. Print every artist who has more than one song in the playlist — filter the dict with if len(songs) > 1.

Sample · spotify_lite.py

# spotify_lite.py — group songs by artist

playlist = [
    ("Levitating",      "Dua Lipa"),
    ("New Rules",       "Dua Lipa"),
    ("Bad Guy",         "Billie Eilish"),
    ("Happier Than Ever","Billie Eilish"),
    ("Lovely",          "Billie Eilish"),
    ("Dynamite",        "BTS"),
    ("Butter",          "BTS"),
    ("Bohemian Rhapsody","Queen"),
    ("Don't Stop Me Now","Queen"),
    ("Cupid",           "FIFTY FIFTY"),
]

by_artist = {}
for song, artist in playlist:
    if artist not in by_artist:
        by_artist[artist] = []
    by_artist[artist].append(song)

# Print every artist's block
for artist, songs in by_artist.items():
    print(artist, "(", len(songs), "songs ):")
    for s in songs:
        print(" *", s)
    print()

print("Total artists:", len(by_artist))
print("Total songs  :", len(playlist))

# Most prolific artist
top_artist = ""
top_count  = 0
for artist, songs in by_artist.items():
    if len(songs) > top_count:
        top_count  = len(songs)
        top_artist = artist
print("Top artist   :", top_artist, "(", top_count, ")")

# User lookup
who = input("Look up artist: ")
result = by_artist.get(who, "Not in playlist.")
print(result)

# Stretch — more than one song
print()
print("Artists with multiple songs:")
for artist, songs in by_artist.items():
    if len(songs) > 1:
        print(" -", artist, "(", len(songs), ")")

Non-negotiables: the three-line group-by, the nested-loop print, two totals, a top-artist scan, and one .get(who, "Not in playlist."). Your choice of music is yours.
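
To see what .get hands back in each case, here is a small sketch without input(), so it runs the same way every time. The two-artist by_artist dict and the name "ABBA" are just example data.

```python
by_artist = {
    "BTS":   ["Dynamite", "Butter"],
    "Queen": ["Bohemian Rhapsody", "Don't Stop Me Now"],
}

# Key is there: .get returns the list stored under it
found = by_artist.get("BTS", "Not in playlist.")

# Key is missing: .get returns the fallback instead of crashing
missing = by_artist.get("ABBA", "Not in playlist.")

print(found)
# → ['Dynamite', 'Butter']
print(missing)
# → Not in playlist.

# .get only looks; it never adds the missing key to the dict
print("ABBA" in by_artist)
# → False
```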
