Learning Goals
3 min
By the end of this lesson you can:
- Cut a string into a list of pieces with .split("sep").
- Glue a list of strings back into one string with "sep".join(list).
- Trim leading/trailing whitespace with .strip().
- Locate a substring with .find("needle"), and know the difference between "not found" and "found at position 0".
Warm-Up
5 min
You used .split(", ") back in PY-L2-04 to break apart an address. Today we own that method properly.
Predict the outputs:
line = "Aisyah, 12, Kuala Lumpur"
parts = line.split(", ")
print(parts)
print(len(parts))
print(", ".join(parts))
Show the answer
['Aisyah', '12', 'Kuala Lumpur']
3
Aisyah, 12, Kuala Lumpur
.split made a list of three. .join put them back together. Used with the same separator, the two methods undo each other: break apart and weld back.
Real text arrives as one long string. To work with it you almost always have to chop it up. To save it back you almost always have to glue it together. .split and .join are how.
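A minimal sketch of that round trip, reusing the warm-up string. The variable names record, rebuilt and converted are illustrative only. It shows that joining with the same separator restores the original text, while joining with a different one converts the format.

```python
# Split on a separator, then join with the same separator.
record = "Aisyah, 12, Kuala Lumpur"
parts = record.split(", ")

rebuilt = ", ".join(parts)
print(rebuilt == record)   # → True

# Joining with a different separator changes the format instead.
converted = " | ".join(parts)
print(converted)           # → Aisyah | 12 | Kuala Lumpur
```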
New Concept · The Four Methods
14 min
1 · .split(sep) — chop into a list
Pass a separator. Get back a list of pieces — the separator itself is removed.
csv = "milo,kopi,teh tarik,horlicks"
drinks = csv.split(",")
print(drinks)  # → ['milo', 'kopi', 'teh tarik', 'horlicks']

sentence = "the quick brown fox"
words = sentence.split(" ")
print(words)   # → ['the', 'quick', 'brown', 'fox']
Call .split() with no argument and Python splits on any whitespace — runs of spaces, tabs, newlines — all in one go. That's often what you want:
messy = "  hello   world  "
print(messy.split())     # → ['hello', 'world'] (no empty strings!)
print(messy.split(" "))  # → ['', '', 'hello', '', '', 'world', '', ''] ← oof
The no-arg form is more forgiving with messy whitespace. Reach for it first; only specify a separator when you genuinely need to split on something specific.
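As a rough illustration, here is the no-argument form on a made-up string that mixes a tab, a newline and a double space. The string and the names pasted, words and pieces are assumptions for this sketch.

```python
# Hypothetical pasted text: a tab, a newline, and a double space.
pasted = "milo\tkopi\nteh  tarik"

words = pasted.split()       # any run of whitespace counts as one gap
print(words)                 # → ['milo', 'kopi', 'teh', 'tarik']

pieces = pasted.split(" ")   # only a single space counts as a gap
print(pieces)                # → ['milo\tkopi\nteh', '', 'tarik']
```

Notice that the no-argument form also cuts "teh tarik" in two. When a piece can contain spaces, that is the moment to split on a specific separator such as a comma.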
2 · "sep".join(list) — weld a list into one string
Note the surprising shape: the separator goes on the left, the list goes inside the brackets. Read it as "put this separator between each item of this list".
drinks = ["milo", "kopi", "teh tarik"]
print(", ".join(drinks))   # → milo, kopi, teh tarik
print(" - ".join(drinks))  # → milo - kopi - teh tarik
print("".join(drinks))     # → milokopiteh tarik
print("\n".join(drinks))   # → milo
                           #   kopi
                           #   teh tarik
Writing drinks.join(", ") looks natural but it's the wrong way round, and Python raises AttributeError: 'list' object has no attribute 'join'. The method belongs to the string (the separator), not to the list. Always type the separator first.
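The sketch below triggers that mistake on purpose and then shows the correct form. It wraps the bad call in try/except only so the script keeps running; that construct may not have been covered yet, and the names message and menu are illustrative.

```python
drinks = ["milo", "kopi", "teh tarik"]

try:
    drinks.join(", ")          # wrong way round: lists have no .join
except AttributeError as err:
    message = str(err)
    print("Error:", message)   # → Error: 'list' object has no attribute 'join'

menu = ", ".join(drinks)       # separator first, list inside the brackets
print(menu)                    # → milo, kopi, teh tarik
```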
3 · .strip() — trim the rubbish
Removes whitespace (spaces, tabs, newlines) from the start and end of the string. Doesn't touch the middle.
raw = " Aisyah \n"
print(raw.strip())  # → Aisyah (no extra spaces, no newline)

# Strip a specific character instead
print("###Hello###".strip("#"))  # → Hello
Two close cousins: .lstrip() trims only the left (leading) side; .rstrip() trims only the right.
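A short side-by-side sketch of the three variants, assuming a made-up padded string. It uses repr() so the remaining spaces are visible inside the quotes.

```python
padded = "  Aisyah  "

both = padded.strip()
left = padded.lstrip()
right = padded.rstrip()

print(repr(both))    # → 'Aisyah'
print(repr(left))    # → 'Aisyah  '
print(repr(right))   # → '  Aisyah'

# The middle is never touched, only the two ends.
middle = "  teh   tarik  ".strip()
print(repr(middle))  # → 'teh   tarik'
```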
4 · .find(needle) — where is it?
Returns the position of the first occurrence of needle — or -1 if it isn't there at all.
sentence = "the quick brown fox"
print(sentence.find("quick"))   # → 4
print(sentence.find("brown"))   # → 10
print(sentence.find("rabbit"))  # → -1 (not found!)
Don't test .find()'s return value with if sentence.find("x"):. A match at position 0 returns 0, which is falsy, and a miss returns -1, which is truthy, so the test gets both of those cases wrong. Always compare explicitly:
pos = sentence.find("quick")
if pos != -1:
    print("Found at index", pos)
else:
    print("Not in the sentence.")
Or, for a plain yes/no, use the operator in instead: if "quick" in sentence:.
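To see the trap in action, this sketch searches for a word that sits at the very start of the sentence and for one that is missing. The names pos and missing are illustrative.

```python
sentence = "the quick brown fox"

pos = sentence.find("the")
print(pos)                   # → 0
print(bool(pos))             # → False  (0 is falsy, yet the word is there)

missing = sentence.find("rabbit")
print(missing)               # → -1
print(bool(missing))         # → True   (-1 is truthy, yet the word is absent)

# The two safe tests give the right answer in both cases.
print(pos != -1)             # → True
print("the" in sentence)     # → True
print("rabbit" in sentence)  # → False
```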
The cheat-sheet
Question                               Method
"Break this string into pieces"        text.split(sep)
"Glue these pieces into one string"    sep.join(list_of_strings)
"Trim leading/trailing spaces"         text.strip()
"Where is X inside this string?"       text.find(x) (-1 = not found)
"Is X in here at all?"                 x in text (True / False)
Worked Example · The Comma-Separated Order Form
12 min
The story
A teacher posts orders on a sticky note as one long string — names separated by semicolons, with sloppy spacing. You need to clean each name, count them, and print a tidy comma-separated rota.
Save as orders.py:
Code
# orders.py: split, strip, join
raw = " Aisyah ; Wei Jie ; Priya ; Iman ; Aizat "

# 1: split on the semicolon
parts = raw.split(";")
print("After split :", parts)
# → [' Aisyah ', ' Wei Jie ', ' Priya ', ' Iman ', ' Aizat ']

# 2: strip every piece (loop)
clean = []
for p in parts:
    clean.append(p.strip())
print("After strip :", clean)
# → ['Aisyah', 'Wei Jie', 'Priya', 'Iman', 'Aizat']

# 3: back together, comma-separated
tidy = ", ".join(clean)
print("Tidy rota :", tidy)

# 4: quick question
print("Total people :", len(clean))
print("Is Priya in? :", "Priya" in clean)
Output
After split : [' Aisyah ', ' Wei Jie ', ' Priya ', ' Iman ', ' Aizat ']
After strip : ['Aisyah', 'Wei Jie', 'Priya', 'Iman', 'Aizat']
Tidy rota : Aisyah, Wei Jie, Priya, Iman, Aizat
Total people : 5
Is Priya in? : True
Read the diff
One string in, one string out — but in the middle it briefly became a list. split → strip → join is the most common text-cleanup pipeline you'll ever write. Memorise that shape.
If you're comfortable with list comprehensions (PY-L2-02), the loop in step 2 collapses to one line:
clean = [p.strip() for p in raw.split(";")]
tidy = ", ".join(clean)
Two lines for the whole pipeline. That's the kind of code real Python projects are full of.
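If you have already met functions, one possible way to reuse the pipeline is to wrap it. This is only a sketch: the name tidy_names, its sep parameter and the extra step that drops empty pieces are assumptions added here, not part of orders.py.

```python
def tidy_names(raw, sep=";"):
    """Split raw on sep, strip each piece, drop empties, join with ', '."""
    clean = [p.strip() for p in raw.split(sep)]
    clean = [p for p in clean if p]   # skip pieces that were only whitespace
    return ", ".join(clean)


print(tidy_names(" Aisyah ; Wei Jie ; Priya ; Iman ; Aizat "))
# → Aisyah, Wei Jie, Priya, Iman, Aizat

print(tidy_names("milo,, kopi ,", sep=","))
# → milo, kopi
```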
Try It Yourself
13 min
Ask the user for a sentence with input(). Print how many words it contains, using .split() with no argument.
Hint
text = input("Sentence: ")
words = text.split()
print("Word count:", len(words))
No-arg .split() handles whatever whitespace the user types — even if they accidentally double-space.
Take the list ["Kuala", "Lumpur", "Malaysia"]. Print it joined with spaces, then with hyphens, then with newlines.
Hint
parts = ["Kuala", "Lumpur", "Malaysia"]
print(" ".join(parts))   # → Kuala Lumpur Malaysia
print("-".join(parts))   # → Kuala-Lumpur-Malaysia
print("\n".join(parts))  # → on three lines
Given names = "Aisyah binti Hassan, Wei Jie Tan, Priya Kumar", build a list of initials — "A.", "W.", "P." — and print them joined by spaces.
Hint
names = "Aisyah binti Hassan, Wei Jie Tan, Priya Kumar"
people = [p.strip() for p in names.split(",")]
initials = [p[0] + "." for p in people]
print(" ".join(initials))  # → A. W. P.
Three pipeline steps: split the big string into people, strip each one, then build the initials list from the first character of each name. .join ties it all together for printing.
Mini-Challenge · The Tag Cleaner
8 min
You've scraped tags off the internet. They're a horrible mess: leading hashes, extra spaces, mixed case, duplicates. Build clean_tags.py that tidies them.
raw = " #PYTHON , #beginner;# Python ; #LEVEL2 ,#WORDS ; #beginner "
Your file must:
- Replace every ; with , using .replace(";", ",") (a method we met in PY-L1-23, same family).
- Split on commas.
- For each piece: strip whitespace, strip leading "#", lower-case it.
- Drop empty pieces (strings that became "" after cleaning).
- Drop duplicates using a set.
- Print the cleaned tags joined by ", " with a leading "#" on each, like #python, #beginner, #level2, #words.
Stretch goal. Print the original tag count and the cleaned count side by side, like {"raw": 6, "clean": 4}.
Show one possible solution
# clean_tags.py: tidy a messy tag string
raw = " #PYTHON , #beginner;# Python ; #LEVEL2 ,#WORDS ; #beginner "

normalised = raw.replace(";", ",")
pieces = normalised.split(",")

clean = []
for p in pieces:
    p = p.strip()
    p = p.lstrip("#")
    p = p.strip()  # strip again in case "# Python" -> " Python"
    p = p.lower()
    if p != "":
        clean.append(p)

# Drop duplicates while keeping order
seen = set()
unique = []
for t in clean:
    if t not in seen:
        unique.append(t)
        seen.add(t)

# Print joined with hash prefix
hashed = ["#" + t for t in unique]
print(", ".join(hashed))

# Stretch
print({"raw": len(pieces), "clean": len(unique)})
Non-negotiables: .replace, .split, a loop that .strip()s and .lstrip("#")s each piece, a set-based dedupe and one .join at the end. This is the most realistic mini-pipeline you've written so far.
Recap
3 min
Four string methods unlock most real text work. .split(sep) chops a string into a list. "sep".join(list) welds a list back into a string. .strip() trims unwanted whitespace from both ends. .find(needle) tells you where a substring sits, but watch the "found at 0" trap; prefer x in text for a clean yes/no. The common pipeline split → strip → join is the bread and butter of every text-cleaning script.
Vocabulary Card
- .split(sep)
  Cut a string at every sep. With no argument, splits on any run of whitespace.
- "sep".join(list)
  Glue the items of list with sep between them. The separator goes on the left of .join.
- .strip() / .lstrip() / .rstrip()
  Trim whitespace (or a chosen character) from both / left / right.
- .find(needle)
  Index of the first match, or -1 if not found.
- x in text
  Pure yes/no membership, simpler than .find() != -1 when you don't need the position.
Homework
4 min
Save csv_to_pretty.py. Given the comma-separated string below, print a tidy markdown-style table.
raw = "name,age,city\nAisyah, 12 , Kuala Lumpur \nWei Jie ,13,Penang\nPriya,11, Ipoh"
Your file must:
- Split the string on "\n" to get four lines.
- For each line, split on "," and strip each cell (using a list comprehension is fine).
- Print each row joined with " | ". Add a divider "----|-----|------" between the header and the data rows.
Stretch. Find the row containing "Penang" using in and print just that line on its own at the end.
Sample · csv_to_pretty.py
# csv_to_pretty.py: CSV-ish string to a markdown table
raw = "name,age,city\nAisyah, 12 , Kuala Lumpur \nWei Jie ,13,Penang\nPriya,11, Ipoh"

lines = raw.split("\n")
rows = []
for line in lines:
    cells = [c.strip() for c in line.split(",")]
    rows.append(cells)

# Print header
print(" | ".join(rows[0]))
print("----|-----|------")

# Print body
for row in rows[1:]:
    print(" | ".join(row))

# Stretch: find Penang line
for row in rows[1:]:
    if "Penang" in row:
        print()
        print("Penang row:", " | ".join(row))
Non-negotiables: outer .split("\n"), inner .split(",") with .strip() on each cell, and " | ".join for the printed line. Real programs parse CSV with the csv module (you'll meet it in Level 4), but this two-split shape handles simple data that has no commas inside the cells.