Learning Goals
3 min
By the end of this lesson you can:
- Build a dict-of-lists by hand: {"fruit": ["apple", "banana"], "drink": ["milo", "kopi"]}.
- Append a new item to one group with d[key].append(item).
- Group an unsorted list of pairs into a dict-of-lists with a tiny loop and the if-key-in pattern.
- Use .items() from PY-L2-06 to loop both the group name and its list at once.
Warm-Up
5 min
Pak Cik Razif's menu has three stalls. Each stall has its own short list of items.
stalls = {
    "drinks": ["milo", "kopi", "teh tarik"],
    "rice": ["nasi lemak", "nasi goreng"],
    "noodles": ["char kway teow", "mee goreng"],
}
print(stalls["drinks"])
print(stalls["noodles"][1])
print(len(stalls))
Show the answer
['milo', 'kopi', 'teh tarik']
mee goreng
3
Two indexings, this time the other way round from yesterday. First a key to pick the group, then a [i] to pick the item inside that group.
A dict-of-lists models "buckets". The key names the bucket; the value holds everything in it. Adding an item means append-ing to the right list. Listing a bucket is just d[key].
New Concept · Buckets That Grow
14 min
Compare the two shapes
Yesterday: list of dicts
[
    {"item": "milo", "stall": "D"},
    {"item": "kopi", "stall": "D"},
    {"item": "nasi", "stall": "R"},
    ...
]
Each row stands alone. Good when you need flexible filters per row.
Today: dict of lists
{
    "D": ["milo", "kopi", "teh"],
    "R": ["nasi lemak", "nasi g."],
    "N": ["char kway teow", "mee g."],
}
Each key holds many items. Good when you need every item of a category at once.
Same seven menu items, two different shapes. The right shape depends on the question you ask the data.
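To see why the question decides the shape, here is a small sketch that asks both shapes the same thing: "which items does stall D sell?" The names rows, buckets, from_rows and from_buckets are made up for this illustration, and the menu is shortened to three items.

```python
# Shape 1: list of dicts (one dict per menu item).
rows = [
    {"item": "milo", "stall": "D"},
    {"item": "kopi", "stall": "D"},
    {"item": "nasi lemak", "stall": "R"},
]

# Shape 2: dict of lists (one list per stall).
buckets = {
    "D": ["milo", "kopi"],
    "R": ["nasi lemak"],
}

# With the list of dicts we have to scan every row and filter.
from_rows = []
for row in rows:
    if row["stall"] == "D":
        from_rows.append(row["item"])

# With the dict of lists the answer is a single lookup.
from_buckets = buckets["D"]

print(from_rows)     # → ['milo', 'kopi']
print(from_buckets)  # → ['milo', 'kopi']
```

Both shapes give the same answer. The dict of lists just gets there without a loop, because the grouping work was done when the data was stored.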
Adding to a group
Append to the right list — same .append() from Level 1.
stalls["drinks"].append("horlicks")
print(stalls["drinks"])
# → ['milo', 'kopi', 'teh tarik', 'horlicks']
What if the group doesn't exist yet? Trying stalls["dessert"].append("cendol") would crash with a KeyError: there is no "dessert" key, so the lookup stalls["dessert"] fails before .append can run.
The standard fix is the if-key-in pattern:
item = "cendol"
group = "dessert"
if group not in stalls:
    stalls[group] = []  # create the empty list first
stalls[group].append(item)
print(stalls[group])
# → ['cendol']
Two steps: make the list if it isn't there, then append. We'll meet a one-line version of this trick (defaultdict) in Level 3.
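If you want to watch the crash and the fix side by side, the sketch below may help. It uses try/except to catch the error so the script keeps running; that is an assumption on our part, since this lesson does not cover try/except, and the second dessert "ais kacang" is invented for the example.

```python
stalls = {"drinks": ["milo", "kopi"]}

# Appending to a missing group fails at the lookup, before .append runs.
try:
    stalls["dessert"].append("cendol")
except KeyError as err:
    print("KeyError:", err)  # → KeyError: 'dessert'

# The if-key-in pattern creates the bucket first, then appends.
if "dessert" not in stalls:
    stalls["dessert"] = []
stalls["dessert"].append("cendol")

# The second time round the key exists, so the check is skipped
# and the item joins the existing list.
if "dessert" not in stalls:
    stalls["dessert"] = []
stalls["dessert"].append("ais kacang")

print(stalls["dessert"])  # → ['cendol', 'ais kacang']
```

The check only ever creates the list once. Every later append for that group reuses it.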
Looping a dict-of-lists
Loop with .items() from PY-L2-06 — you get the group name and its list together.
for stall, items in stalls.items():
    print(stall, "(", len(items), "items )")
    for item in items:
        print(" -", item)
Two loops — the outer one over groups, the inner over items in each group. Indentation tells the eye that the inner is "inside" the outer.
Quick stats per group
The len() of each group's list is a count of its items. The len() of the dict itself is the number of groups.
print("Stalls:", len(stalls))
for stall, items in stalls.items():
    print(stall, ":", len(items), "items")
Grouping a flat list
Often you start with a flat list of (item, category) pairs and you want to bucket them yourself. This is the "group by" pattern — the bread and butter of every spreadsheet pivot table.
raw = [
    ("milo", "drinks"),
    ("kopi", "drinks"),
    ("nasi lemak", "rice"),
    ("teh tarik", "drinks"),
    ("char kway teow", "noodles"),
    ("nasi goreng", "rice"),
]
grouped = {}
for item, category in raw:
    if category not in grouped:
        grouped[category] = []
    grouped[category].append(item)
print(grouped)
{'drinks': ['milo', 'kopi', 'teh tarik'], 'rice': ['nasi lemak', 'nasi goreng'], 'noodles': ['char kway teow']}
Three lines do the "group by": the if-key-in check, the empty-list creation, and the append. The resulting shape is exactly the dict-of-lists you'd have built by hand.
Pivot tables. Database GROUP BY. Spotify's "songs by artist". Instagram's "posts by date". Once your fingers know the three-line group-by, you can re-shape any flat list into buckets.
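The bucket name does not have to come ready-made in a pair. As a hedged sketch, the same three-line recipe can group plain strings by a key you work out yourself, here the first letter. The words list and the by_letter name are invented for this example.

```python
words = ["milo", "kopi", "mee goreng", "kaya toast", "nasi lemak"]

by_letter = {}
for word in words:
    letter = word[0]              # work out which bucket this word belongs to
    if letter not in by_letter:   # check
        by_letter[letter] = []    # create the empty list
    by_letter[letter].append(word)  # append

print(by_letter)
# → {'m': ['milo', 'mee goreng'], 'k': ['kopi', 'kaya toast'], 'n': ['nasi lemak']}
```

Only the line that picks the bucket changed. The check, create and append lines are the same as before.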
Worked Example · The Class Library
12 min
The story
You've been given a flat list of seven library books, each as a (title, genre) tuple. The librarian wants to group them by genre, count each genre, and answer two follow-up questions.
Save as library_groups.py:
Code
# library_groups.py: group by genre, then report
raw = [
    ("Harry Potter", "fantasy"),
    ("Percy Jackson", "fantasy"),
    ("Wings of Fire", "fantasy"),
    ("Diary of a Wimpy Kid", "humour"),
    ("Tom Gates", "humour"),
    ("Charlotte's Web", "classic"),
    ("The Hobbit", "fantasy"),
]

# Step 1: group by genre
by_genre = {}
for title, genre in raw:
    if genre not in by_genre:
        by_genre[genre] = []
    by_genre[genre].append(title)

# Step 2: print every group
for genre, titles in by_genre.items():
    print(genre, "(", len(titles), "titles ):")
    for t in titles:
        print(" -", t)
    print()

# Step 3: biggest genre
biggest_genre = ""
biggest_count = 0
for genre, titles in by_genre.items():
    if len(titles) > biggest_count:
        biggest_count = len(titles)
        biggest_genre = genre
print("Biggest genre:", biggest_genre, "(", biggest_count, "titles )")

# Step 4: add a new book
title, genre = "Matilda", "classic"
if genre not in by_genre:
    by_genre[genre] = []
by_genre[genre].append(title)
print("After Matilda:", by_genre)
Output
fantasy ( 4 titles ):
 - Harry Potter
 - Percy Jackson
 - Wings of Fire
 - The Hobbit

humour ( 2 titles ):
 - Diary of a Wimpy Kid
 - Tom Gates

classic ( 1 titles ):
 - Charlotte's Web

Biggest genre: fantasy ( 4 titles )
After Matilda: {'fantasy': ['Harry Potter', 'Percy Jackson', 'Wings of Fire', 'The Hobbit'], 'humour': ['Diary of a Wimpy Kid', 'Tom Gates'], 'classic': ["Charlotte's Web", 'Matilda']}
Read the diff
Step 1 is the group-by recipe — three lines. Step 2 is the nested loop. Step 3 is yesterday's "biggest" scan, but applied to len(list) not a single number. Step 4 is the add-to-a-group recipe — the same three lines as step 1, but for one new item.
If you'd kept the flat list of seven, "biggest genre" would have required two passes — one to count, one to compare. The dict-of-lists answers it in one scan because each genre's length is already there.
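For comparison, here is one way the two passes over the flat list might look. This is only a sketch: the counts dict and the use of .get(genre, 0) to start each tally at zero are our own choices, not part of library_groups.py.

```python
raw = [
    ("Harry Potter", "fantasy"),
    ("Percy Jackson", "fantasy"),
    ("Wings of Fire", "fantasy"),
    ("Diary of a Wimpy Kid", "humour"),
    ("Tom Gates", "humour"),
    ("Charlotte's Web", "classic"),
    ("The Hobbit", "fantasy"),
]

# Pass 1: count the titles per genre straight from the flat list.
counts = {}
for title, genre in raw:
    counts[genre] = counts.get(genre, 0) + 1

# Pass 2: compare the counts to find the biggest genre.
biggest_genre = ""
biggest_count = 0
for genre, n in counts.items():
    if n > biggest_count:
        biggest_count = n
        biggest_genre = genre

print(counts)                         # → {'fantasy': 4, 'humour': 2, 'classic': 1}
print(biggest_genre, biggest_count)   # → fantasy 4
```

Notice that this version knows how many titles each genre has but has thrown the titles themselves away. The dict-of-lists keeps both.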
Try It Yourself
13 min
Start with this dict-of-lists. Print one section per genre, each with a header line and the songs underneath.
songs = {
    "pop": ["Bad Guy", "Levitating"],
    "k-pop": ["Dynamite", "Spring Day", "Cupid"],
    "rock": ["Bohemian Rhapsody"],
}
Hint
for genre, titles in songs.items():
    print(genre.upper())
    for t in titles:
        print(" *", t)
    print()
Add "Numb" to "rock" and "Butter" to "k-pop". Print the new songs dict and how many songs there are in total.
Hint
songs["rock"].append("Numb")
songs["k-pop"].append("Butter")
print(songs)
total = 0
for titles in songs.values():
    total += len(titles)
print("Total songs:", total)
Notice the loop for titles in songs.values(): we only need the lists, so we use .values() directly. len(titles) counts the songs in that one genre; we sum across genres.
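As an optional alternative, the running total can also be handed to Python's built-in sum(). The sketch below first collects one length per genre into a list (lengths is a name made up here), then adds them up. Whether sum() has been introduced yet in this course is an assumption.

```python
songs = {
    "pop": ["Bad Guy", "Levitating"],
    "k-pop": ["Dynamite", "Spring Day", "Cupid", "Butter"],
    "rock": ["Bohemian Rhapsody", "Numb"],
}

# Collect one count per genre.
lengths = []
for titles in songs.values():
    lengths.append(len(titles))

print(lengths)                       # → [2, 4, 2]
print("Total songs:", sum(lengths))  # → Total songs: 8
```

The loop with total += len(titles) and this version give the same number. Use whichever reads more clearly to you.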
You're given the flat list below. Build by_subject, a dict mapping each subject to a list of student names. Print it.
enrolments = [
    ("Aisyah", "maths"),
    ("Wei Jie", "music"),
    ("Priya", "maths"),
    ("Iman", "art"),
    ("Aizat", "music"),
    ("Hafiz", "maths"),
]
Hint
by_subject = {}
for name, subject in enrolments:
    if subject not in by_subject:
        by_subject[subject] = []
    by_subject[subject].append(name)
print(by_subject)
# → {'maths': ['Aisyah', 'Priya', 'Hafiz'],
#    'music': ['Wei Jie', 'Aizat'],
#    'art': ['Iman']}
This is the group-by recipe again. Get it into your fingers — you'll use it for the rest of Level 2.
Mini-Challenge · The Class Time-Table
8 min
Build timetable.py. You've got a flat list of (day, subject) tuples for the week. Group them by day, then answer four questions.
lessons = [
    ("Mon", "Maths"), ("Mon", "Art"), ("Mon", "Science"),
    ("Tue", "Maths"), ("Tue", "PE"), ("Tue", "English"), ("Tue", "Music"),
    ("Wed", "Science"), ("Wed", "Maths"), ("Wed", "BM"),
    ("Thu", "English"), ("Thu", "Maths"),
    ("Fri", "PE"), ("Fri", "Art"), ("Fri", "Maths"), ("Fri", "BM"),
]
Your file must:
- Group the lessons into by_day, a dict mapping the day to a list of subjects.
- Loop the dict and print one block per day: Mon (3): Maths · Art · Science.
- Print the day with the most lessons.
- Print the day with the fewest lessons.
- Print every day that has a Maths lesson (use if "Maths" in subjects).
Stretch goal. Also build by_subject from the same flat list — "which days does Maths appear?" — and print it.
Show one possible solution
# timetable.py: group lessons by day, then by subject
lessons = [
    ("Mon", "Maths"), ("Mon", "Art"), ("Mon", "Science"),
    ("Tue", "Maths"), ("Tue", "PE"), ("Tue", "English"), ("Tue", "Music"),
    ("Wed", "Science"), ("Wed", "Maths"), ("Wed", "BM"),
    ("Thu", "English"), ("Thu", "Maths"),
    ("Fri", "PE"), ("Fri", "Art"), ("Fri", "Maths"), ("Fri", "BM"),
]

# Group by day with the three-line recipe
by_day = {}
for day, subj in lessons:
    if day not in by_day:
        by_day[day] = []
    by_day[day].append(subj)

# Print one block per day, e.g. "Mon (3): Maths · Art · Science".
# Joining with + avoids the extra spaces print() puts between arguments.
for day, subjects in by_day.items():
    print(day + " (" + str(len(subjects)) + "): " + " · ".join(subjects))

# Most lessons (on a tie, the first day found wins)
busiest = ""
busy_n = 0
for day, subjects in by_day.items():
    if len(subjects) > busy_n:
        busy_n = len(subjects)
        busiest = day
print("Busiest day:", busiest, "(", busy_n, ")")

# Fewest lessons
quietest = next(iter(by_day))  # any starting day
quiet_n = len(by_day[quietest])
for day, subjects in by_day.items():
    if len(subjects) < quiet_n:
        quiet_n = len(subjects)
        quietest = day
print("Quietest day:", quietest, "(", quiet_n, ")")

# Days that have Maths
print("Days with Maths:")
for day, subjects in by_day.items():
    if "Maths" in subjects:
        print(" -", day)

# Stretch: by subject
by_subject = {}
for day, subj in lessons:
    if subj not in by_subject:
        by_subject[subj] = []
    by_subject[subj].append(day)
print()
print("by_subject:", by_subject)
Non-negotiables: the by_day group-by recipe, one nested-loop print of the timetable, busiest/quietest scans, and the in-check filter for Maths. The Stretch shows that the same flat list can be regrouped along a different axis — that's pivoting.
Recap
3 min
A dict-of-lists models buckets: one key per category, one list per bucket. Add an item by appending to the right list, checking first that the list exists. Loop with .items() to walk both the group name and the items together. The three-line "group by" recipe (check, create empty list, append) turns any flat list into buckets, and it is the same idea a database's GROUP BY expresses.
Vocabulary Card
- dict of lists: A dict whose values are all lists. Each key names a bucket; each list holds everything in that bucket.
- group-by recipe: Walk a flat list; for each item, ensure its category exists in the dict (create an empty list if not), then append.
- if key not in d: The check that protects you from KeyError when adding a brand-new bucket.
Homework
4 min
Build spotify_lite.py. You start with a flat list of (song, artist) tuples, at least ten, with your choice of music.
Your file must:
- Group the songs into by_artist using the three-line recipe.
- Print one block per artist with the song titles underneath.
- Print the total number of artists and the total number of songs.
- Print the artist with the most songs in the list.
- Ask the user for an artist name with input() and print their songs, or Not in playlist. if the artist isn't there. Use .get(name, ...).
Stretch. Print every artist who has more than one song in the playlist — filter the dict with if len(songs) > 1.
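If .get(name, ...) feels unfamiliar, this short sketch shows what it hands back in each case. The two-artist by_artist dict and the names found and missing are invented for the example, and the artist names are hard-coded where your file would use input().

```python
by_artist = {
    "BTS": ["Dynamite", "Butter"],
    "Queen": ["Bohemian Rhapsody"],
}

# Key present: .get returns the list stored under that key.
found = by_artist.get("BTS", "Not in playlist.")
print(found)    # → ['Dynamite', 'Butter']

# Key absent: .get returns the fallback instead of raising KeyError.
missing = by_artist.get("Adele", "Not in playlist.")
print(missing)  # → Not in playlist.

# .get only reads; it never creates a new bucket.
print("Adele" in by_artist)  # → False
```

Unlike the if-key-in pattern, .get leaves the dict unchanged, which is what you want for a lookup that should not add artists to the playlist.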
Sample · spotify_lite.py
# spotify_lite.py: group songs by artist
playlist = [
    ("Levitating", "Dua Lipa"),
    ("New Rules", "Dua Lipa"),
    ("Bad Guy", "Billie Eilish"),
    ("Happier Than Ever", "Billie Eilish"),
    ("Lovely", "Billie Eilish"),
    ("Dynamite", "BTS"),
    ("Butter", "BTS"),
    ("Bohemian Rhapsody", "Queen"),
    ("Don't Stop Me Now", "Queen"),
    ("Cupid", "FIFTY FIFTY"),
]

by_artist = {}
for song, artist in playlist:
    if artist not in by_artist:
        by_artist[artist] = []
    by_artist[artist].append(song)

# Print every artist's block
for artist, songs in by_artist.items():
    print(artist, "(", len(songs), "songs ):")
    for s in songs:
        print(" *", s)
    print()

print("Total artists:", len(by_artist))
print("Total songs  :", len(playlist))

# Most prolific artist
top_artist = ""
top_count = 0
for artist, songs in by_artist.items():
    if len(songs) > top_count:
        top_count = len(songs)
        top_artist = artist
print("Top artist   :", top_artist, "(", top_count, ")")

# User lookup
who = input("Look up artist: ")
result = by_artist.get(who, "Not in playlist.")
print(result)

# Stretch: more than one song
print()
print("Artists with multiple songs:")
for artist, songs in by_artist.items():
    if len(songs) > 1:
        print(" -", artist, "(", len(songs), ")")
Non-negotiables: the three-line group-by, the nested-loop print, two totals, a top-artist scan, and one .get(who, "Not in playlist."). Your choice of music is yours.