Learning Goals
3 min
By the end of this lesson you can:
- Build a set with curly braces, and remember the empty-set trap: set(), not {}.
- Add and remove items with .add(), .remove() and .discard().
- Strip duplicates from a list in one line: list(set(my_list)).
- Check membership with in, faster than a list lets you.
Warm-Up
5 min
You ran a poll. Forty students named their favourite hawker dish. The raw responses look like this:
responses = ["nasi lemak", "satay", "nasi lemak", "roti canai", "satay", "nasi lemak", "cendol", "satay", "milo"]
How would you answer the question "how many different dishes were mentioned"? You could loop and track seen items, or…
unique = set(responses)
print(unique)
print(len(unique))
Show the answer
{'cendol', 'milo', 'nasi lemak', 'roti canai', 'satay'}
5
Five lines became one. The duplicates vanished, and the question was answered in a single len() call. (The dishes may print in a different order on your machine.)
A set is a collection that refuses duplicates. Anything you put in twice still appears once. That single rule turns dozens of common tasks into one-liners.
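For comparison, here is a sketch of the loop-and-track approach the warm-up hinted at, next to the set version. The variable name seen is only an illustrative choice.

```python
responses = ["nasi lemak", "satay", "nasi lemak", "roti canai", "satay",
             "nasi lemak", "cendol", "satay", "milo"]

# The long way: loop and keep a list of dishes already seen.
seen = []
for dish in responses:
    if dish not in seen:
        seen.append(dish)
print(len(seen))            # → 5

# The set way: duplicates are dropped automatically.
print(len(set(responses)))  # → 5
```

Both routes give the same count. The set version simply lets Python do the tracking.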
New Concept · The Set
14 min
Building one
Use curly braces with comma-separated items. It looks a bit like a dict, but with no colons.
vowels = {"a", "e", "i", "o", "u"}
primes = {2, 3, 5, 7, 11}
You can also build a set from any iterable using set(...):
from_list = set(["a", "b", "a", "c"])
from_str = set("hello")
print(from_list)   # → {'a', 'b', 'c'}
print(from_str)    # → {'h', 'e', 'l', 'o'} (note: only ONE 'l')
An empty pair of curly braces {} is an empty dictionary, not a set. To make an empty set, use set():
maybe = {}                # ← this is a dict!
print(type(maybe))        # → <class 'dict'>
definitely = set()        # ← this is an empty set
print(type(definitely))   # → <class 'set'>
Sets refuse duplicates
Add the same thing twice — the set silently ignores the second one. No error, no warning, no duplicate.
s = {"apple"}
s.add("apple")
s.add("banana")
s.add("apple")
print(s)        # → {'apple', 'banana'}
print(len(s))   # → 2
Adding and removing
.add(x) puts x in. .remove(x) takes it out, or raises a KeyError if x isn't there. .discard(x) takes x out if present and silently does nothing otherwise, which makes it the safer choice when you're not sure.
flavours = {"milo", "kopi", "teh"}
flavours.add("horlicks")          # ← in
flavours.remove("kopi")           # ← out (must be there)
flavours.discard("not on menu")   # ← no error, just nothing
print(flavours)                   # → {'milo', 'teh', 'horlicks'}
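To see the difference between the two removal methods, here is a small sketch. The item "bandung" is a made-up example that is not in the set, and try/except is used only to keep the script running after the error; it may not have been covered yet in this course.

```python
flavours = {"milo", "kopi", "teh"}

flavours.discard("bandung")      # not in the set: nothing happens

try:
    flavours.remove("bandung")   # not in the set: raises KeyError
except KeyError:
    print("remove() raised KeyError")

print(len(flavours))             # → 3 (nothing was actually removed)
```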
Membership — set's superpower
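x in some_set is the same syntax as with a list, but on a large collection it's much faster. A list is scanned item by item, so the check slows down as the list grows. A set uses hashing to jump almost straight to the answer, so the check takes roughly the same time however big the set is.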
forbidden = {"banana", "durian", "tempoyak"}
order = "durian"
if order in forbidden:
    print("Sorry, no", order, "on the bus.")
else:
    print("OK, you can bring", order, "on.")
The shape if x in collection: is one of the most common in Python — and it's a set's best move.
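If you want to check the speed claim yourself, the sketch below times the same membership test on a list and on a set using the standard-library timeit module. The collection size (100,000) and the repeat count (200) are arbitrary choices, and the actual timings depend on your machine, so no expected numbers are shown.

```python
import timeit

numbers_list = list(range(100_000))
numbers_set = set(numbers_list)
target = 99_999   # worst case for the list: the very last item

# Run each membership check 200 times and report the total seconds.
list_time = timeit.timeit(lambda: target in numbers_list, number=200)
set_time = timeit.timeit(lambda: target in numbers_set, number=200)

print("list:", list_time)
print("set :", set_time)
```

On most machines the set figure should come out far smaller than the list figure.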
What sets don't have
Sets are unordered — there is no set[0], no .index(), no slicing. If you need order, use a list. If you need labels, use a dict. If you only ever ask "is this in?" or "what unique items?" — set.
Use a LIST when… order or duplicates matter
Use a TUPLE when… the data is fixed
Use a DICT when… you need to look up by label
Use a SET when… duplicates are noise, membership is the question
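A quick sketch of what "no indexing" means in practice. The set colours is an illustrative example, and try/except is used only so the script keeps running after the error.

```python
colours = {"red", "green", "blue"}

try:
    print(colours[0])            # sets have no positions
except TypeError as err:
    print("TypeError:", err)

# If you need positions, convert to a list first.
ordered = sorted(colours)        # sorted() returns a list
print(ordered[0])                # → blue
```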
The most common one-liner — strip duplicates
You'll write this loads of times in your life. Worth memorising:
raw = ["a", "b", "a", "c", "b", "a"]
unique = list(set(raw))
print(unique)   # → ['a', 'b', 'c'] (order may vary)
Note: turning a list into a set loses the original order. If you need to preserve order, you'll learn a different trick in PY-L3-21.
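If you need order-preserving de-duplication before then, one widely used approach relies on dict keys, which are unique and keep insertion order in Python 3.7 and later. This is only a sketch of that one approach; it may not be the trick the later lesson teaches.

```python
raw = ["c", "a", "c", "b", "a"]

# dict.fromkeys() keeps the first occurrence of each item, in order.
unique_in_order = list(dict.fromkeys(raw))
print(unique_in_order)   # → ['c', 'a', 'b']
```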
Worked Example · The RSVP List
12 min
The story
You're organising a class party. Three forms come back — sign-ups by name. Some students filled in twice. Some withdrew. You want to print:
- The unique guest list.
- How many real guests there are.
- Whether Aisyah is coming.
- Whether anyone called Iman is on the list.
Save as rsvp.py:
Code
# rsvp.py: clean the guest list with a set
raw = ["Aisyah", "Wei Jie", "Priya", "Aisyah", "Iman",
       "Aizat", "Priya", "Wei Jie", "Iman", "Hafiz"]
guests = set(raw)

print("Guest list :", guests)
print("How many :", len(guests))

# Step 3
if "Aisyah" in guests:
    print("Aisyah is coming.")
else:
    print("No Aisyah.")

# Step 4
if "Iman" in guests:
    print("An Iman is on the list.")

# A withdrawal
guests.discard("Hafiz")  # uses discard in case he wasn't there
print("After Hafiz cancels:", guests)
Output
Guest list : {'Wei Jie', 'Priya', 'Aizat', 'Iman', 'Aisyah', 'Hafiz'}
How many : 6
Aisyah is coming.
An Iman is on the list.
After Hafiz cancels: {'Wei Jie', 'Priya', 'Aizat', 'Iman', 'Aisyah'}
What just happened
Ten raw sign-ups became six unique guests — the duplicates dropped out the moment we wrapped the list in set(...). From there every question became one short line — len(), in, discard(). The order of names in the printout isn't the order they signed up; sets don't track order, and you might see them printed in a different sequence on your machine.
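The "ten became six" arithmetic can be checked directly by comparing the two lengths. This is a small sketch, and the variable name duplicates is an illustrative choice.

```python
raw = ["Aisyah", "Wei Jie", "Priya", "Aisyah", "Iman",
       "Aizat", "Priya", "Wei Jie", "Iman", "Hafiz"]
guests = set(raw)

duplicates = len(raw) - len(guests)
print(len(raw))      # → 10 sign-ups
print(len(guests))   # → 6 unique guests
print(duplicates)    # → 4 repeated entries
```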
Why .discard() and not .remove()?
If Hafiz had already withdrawn before today's run, his name wouldn't be in the set, and .remove("Hafiz") would have crashed with a KeyError. .discard() shrugs at missing items, which is exactly what we want when cancellations come in.
Try It Yourself
13 min
Three quick exercises.
Take the word "programming". Use set() to find its unique letters, then print the count.
Hint
word = "programming"
letters = set(word)
print(letters)
print("Unique letters:", len(letters))   # → 8
You're writing a forum. Some words are banned. Use a set banned = {"spam", "junk", "scam"} and ask the user for a word with input(). Print Allowed. or Blocked.
Hint
banned = {"spam", "junk", "scam"}
word = input("Word: ").lower()
if word in banned:
    print("Blocked.")
else:
    print("Allowed.")
This is the classic "is this in the set?" question — exactly what sets exist for.
Given a list of attendance scans where some students were scanned twice, print the sorted list of unique attendees and the count.
scans = ["A123", "B456", "A123", "C789", "B456", "D012", "C789", "A123"]
Hint
scans = ["A123", "B456", "A123", "C789", "B456", "D012", "C789", "A123"]
unique = sorted(set(scans))
print(unique)   # → ['A123', 'B456', 'C789', 'D012']
print("Real attendance:", len(unique))
sorted() wraps any iterable — even a set — and gives you back an ordered list. Perfect for printing.
Mini-Challenge · The Pangram Detector
8 min
A pangram is a sentence that uses every letter of the alphabet at least once. The classic English one is "the quick brown fox jumps over the lazy dog".
Build pangram.py — a tiny tool that tells you whether a sentence is a pangram, and if not, which letters are missing.
Your file must:
- Define alphabet = set("abcdefghijklmnopqrstuvwxyz").
- Define a function check(sentence) that returns either "PANGRAM" or a sorted list of missing letters.
- Inside the function: lower-case the sentence, turn its letter characters into a set, then subtract that set from alphabet.
- Test it on at least two sentences: one pangram, one not.
Stretch goal. Loop a list of three sentences and report each one's status — like a tiny test runner.
Show one possible solution
# pangram.py: uses every letter of the alphabet?
alphabet = set("abcdefghijklmnopqrstuvwxyz")

def check(sentence):
    letters_in_sentence = set()
    for ch in sentence.lower():
        if ch.isalpha():
            letters_in_sentence.add(ch)
    missing = alphabet - letters_in_sentence
    if len(missing) == 0:
        return "PANGRAM"
    return sorted(missing)

print(check("the quick brown fox jumps over the lazy dog"))  # → PANGRAM
print(check("hello world"))
# → ['a', 'b', 'c', 'f', 'g', 'i', 'j', 'k', 'm', 'n', 'p', 'q', 's', 't', 'u', 'v', 'x', 'y', 'z']

# Stretch: run a test sheet
tests = [
    "Pack my box with five dozen liquor jugs.",
    "The five boxing wizards jump quickly.",
    "Cats and dogs.",
]
for s in tests:
    print(s, "->", check(s))
Non-negotiables: alphabet as a set of letters, a function that adds only alphabetic characters to a sentence-set, and the magic line missing = alphabet - letters_in_sentence — set subtraction, the headline trick of tomorrow's lesson.
Recap
3 min
A set stores unique items. Build it with curly braces (but use set() for an empty one), add with .add(), remove with .remove() or the gentler .discard(), and ask "is this here?" with in. Sets give you the one-line de-dupe trick list(set(lst)) and very fast membership checks. They have no order, no labels, and no indexing, so pick a list, tuple or dict if you need those.
Vocabulary Card
- set
- An unordered collection of unique items.
- set()
- The way to make an empty set. {} is an empty dict, not a set.
- .add(x) / .discard(x) / .remove(x)
- Add to a set / safely remove if present / remove (crashes if absent).
- list(set(lst))
- The one-line "strip duplicates" idiom. Loses original order.
Homework
4 min
Save a new file vocab_lab.py. Compare the unique words in two short pieces of text.
Your file must:
- Start with two long-ish strings, one paragraph each, your choice of topic.
- Split each into words using .split().
- Build two sets: words_a and words_b.
- Print the size of each.
- Print whether a specific word (your choice) appears in words_a using in.
- Print whether the same word appears in words_b.
Stretch. Find the words in words_a that are not in words_b using words_a - words_b — a sneak peek at PY-L2-09's set operations.
Sample · vocab_lab.py
# vocab_lab.py: compare word-sets of two paragraphs
text_a = "the cat sat on the mat the dog barked the cat ran"
text_b = "the dog sat by the door the cat slept"

words_a = set(text_a.split())
words_b = set(text_b.split())

print("Unique in A:", len(words_a))
print("Unique in B:", len(words_b))

target = "dog"
print(target, "in A?", target in words_a)
print(target, "in B?", target in words_b)

# Stretch: only in A
only_in_a = words_a - words_b
print("Words only in A:", sorted(only_in_a))
Non-negotiables: two set(text.split()) calls, two len() reports, and two in checks. Your two paragraphs can be about anything.
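An optional extra, not part of the non-negotiables: if your paragraphs contain capital letters or punctuation, plain .split() treats "Cat." and "cat" as different words. The sketch below shows one possible way to tidy words before they go into the set. The helper name clean_words is hypothetical, and it only strips punctuation from the ends of each word.

```python
import string

def clean_words(text):
    """Return the set of lower-cased words with edge punctuation stripped."""
    words = set()
    for raw_word in text.lower().split():
        word = raw_word.strip(string.punctuation)
        if word:                 # skip anything that was only punctuation
            words.add(word)
    return words

print(sorted(clean_words("The cat sat. The CAT ran!")))
# → ['cat', 'ran', 'sat', 'the']
```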