PY-L2-42 · Regex 101 — re.findall and re.search

Find every phone number in a wall of text. Pull out every #hashtag from a tweet. Spot the first URL in an email. That's what regular expressions do — pattern-matching for strings. Today: the two most useful regex calls and the building-block characters.

⏱ 1 hour · 🔍 Concept lesson · 📚 After PY-L2-41 · 💻 VS Code or online-python.com

Learning Goals

3 min

By the end of this lesson you can:

Import re and use re.findall(pattern, text) to pull every match into a list.
Use re.search(pattern, text) to test whether any match exists.
Write simple literal patterns and recognise the regex special characters that need escaping.
Use raw strings (r"...") for regex patterns and explain why.

Warm-Up · Find Every Phone Number

5 min

Without regex, finding a phone number in a wall of text means scanning character by character — checking for digits, dashes, the right shape. Tedious. Watch:

import re

text = """
Aisyah: 012-3456789
Wei Jie: 011-2345 678
Priya:   017-9921122
Find me at 014-1112233 or 03-12345678 ext 99.
"""

phones = re.findall(r"\d{2,3}-\d{7,8}", text)
print(phones)
# → ['012-3456789', '017-9921122', '014-1112233', '03-12345678']

One line. \d means "a digit". {2,3} means "2 or 3 of them". - means a literal dash. \d{7,8} means "7 or 8 digits". Together: a phone-shaped pattern.

Today's big idea

A regex is a tiny language for describing string shapes. Once you can describe the shape, Python finds every instance.

New Concept · Two Functions, Five Characters

14 min

The two everyday functions

import re

# 1 — findall: pulls every match into a list
re.findall(r"cat", "the cat saw a cat in the catacomb")
# → ['cat', 'cat', 'cat']

# 2 — search: returns a Match object, or None
m = re.search(r"cat", "where is the cat?")
if m:
    print("Found at index", m.start())
else:
    print("No match.")

Pick by what you want:

Every match as a list → findall.
Just yes/no → search with if m:.
First match + position → search + m.start(), m.group().
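
As a minimal sketch of the third option, the snippet below reads the matched text and its position from the Match object. The sample sentence is made up for illustration, and m.end() is a standard Match method that this lesson does not otherwise use.

```python
import re

text = "where is the cat? the other cat left."

# search stops at the first match and returns a Match object (or None).
m = re.search(r"cat", text)

if m:
    print(m.group())                 # the matched text → cat
    print(m.start())                 # index where the match begins → 13
    print(m.end())                   # index just past the match → 16
    print(text[m.start():m.end()])   # slicing with both gives → cat
```

The second "cat" is ignored: search reports only the first match, which is why findall is the better fit when you need all of them.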

Literal patterns

Most characters in a regex mean themselves. r"cat" matches the three letters c-a-t in order.
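
A small sketch of what "mean themselves" implies in practice: a literal pattern is case-sensitive and will also match inside longer words. The sample sentence is an assumption chosen to show both effects.

```python
import re

sentence = "Cat, cat, CAT and concatenate"

# A literal pattern matches exactly those characters, in that order and case.
print(re.findall(r"cat", sentence))
# → ['cat', 'cat']   (the word "cat" and the "cat" inside "concatenate")
```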

The first five special characters

Char     Meaning                Example
\d       any digit              "\d\d" matches "42"
\w       letter, digit, _       "\w+" matches "hello42"
.        any char except \n     "h.t" matches "hat", "h2t", "h@t"
*        zero or more of prev    "ab*" matches "a", "ab", "abbb"
+        one or more of prev     "ab+" matches "ab", "abbb" (not "a")
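
The table rows can be tried out directly. This is a sketch using made-up sample strings, with one findall call per special character.

```python
import re

print(re.findall(r"\d\d", "room 42, seat 7"))     # two digits in a row
# → ['42']
print(re.findall(r"\w+", "hello42 there_you!"))   # runs of word characters
# → ['hello42', 'there_you']
print(re.findall(r"h.t", "hat h2t h@t ht"))       # h, any one character, t
# → ['hat', 'h2t', 'h@t']
print(re.findall(r"ab*", "a ab abbb"))            # a, then zero or more b
# → ['a', 'ab', 'abbb']
print(re.findall(r"ab+", "a ab abbb"))            # a, then at least one b
# → ['ab', 'abbb']
```

Note that "ht" is skipped by h.t because the dot must consume exactly one character.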

Why raw strings?

Always write your regex as a raw string — r"...". Python normally treats \n as a newline; with the r prefix, it's two characters — backslash and n.

re.findall("\d+", text)      # works, but Python warns about \d
re.findall(r"\d+", text)     # ✅ the safe way

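For regex escapes like \d, \w and \b, raw strings are essential: in a normal string, Python would interpret \b as a backspace character before the regex engine ever sees it.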
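
A quick way to see the difference is to compare string lengths and then try the same pattern with and without the r prefix. This sketch uses \b (word boundary), which is only mentioned in passing in this lesson; the sample text is made up.

```python
import re

print(len("\n"))    # → 1  (a single newline character)
print(len(r"\n"))   # → 2  (a backslash followed by n)

# Without r, Python turns \b into a backspace character,
# so the pattern looks for backspaces around "cat" and finds nothing.
print(re.findall("\bcat\b", "cat catalog"))
# → []

# With r, the regex engine receives \b and treats it as a word boundary.
print(re.findall(r"\bcat\b", "cat catalog"))
# → ['cat']
```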

Counts · the curly braces

{n}      exactly n
{n,}     at least n
{n,m}    between n and m
?        zero or one of prev  (same as {0,1})
+        one or more          (same as {1,})
*        zero or more         (same as {0,})

re.findall(r"\d{4}", "Year 2026 or 26?")          # → ['2026']  (26 is too short to match)
re.findall(r"colou?r", "color and colour")          # → ['color', 'colour']
re.findall(r"go+gle", "gogle google gooogle")       # → ['gogle', 'google', 'gooogle']
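
The {n,} and {n,m} forms behave the same way. The sketch below uses a made-up string of numbers to show that a count applies to whatever digits are available: a long run can be split into several matches.

```python
import re

codes = "7 42 365 2026 31415"

print(re.findall(r"\d{3,}", codes))    # runs of at least 3 digits
# → ['365', '2026', '31415']

print(re.findall(r"\d{2,3}", codes))   # 2 or 3 digits at a time
# → ['42', '365', '202', '314', '15']
```

In the second call, 2026 yields only '202' because the leftover 6 is a single digit, while 31415 is split into '314' and '15'. A count limits how much one match takes; it does not require the match to be a whole number on its own.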

Character classes

Square brackets list characters that match a single position.

re.findall(r"[aeiou]", "education")     # → ['e', 'u', 'a', 'i', 'o']
re.findall(r"[A-Z]", "Hello World")     # → ['H', 'W']
re.findall(r"[^aeiou ]", "education")   # NOT a vowel or space — ['d', 'c', 't', 'n']

The ^ inside brackets means "not". Useful for "everything except these".
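
Classes combine naturally with the counts from the previous section. A short sketch, assuming a made-up string, showing a range with +, two ranges in one class, and a negated class:

```python
import re

plate = "WXY 1234 and pjk-88"

print(re.findall(r"[A-Z]+", plate))      # runs of upper-case letters
# → ['WXY']
print(re.findall(r"[a-z0-9]+", plate))   # lower-case letters or digits
# → ['1234', 'and', 'pjk', '88']
print(re.findall(r"[^0-9 ]+", plate))    # anything except digits and spaces
# → ['WXY', 'and', 'pjk-']
```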

The anchors · ^ and $

Outside of brackets, ^ means "start of the string", $ means "end".

re.search(r"^Aisyah", "Aisyah said hi")    # → match (starts with Aisyah)
re.search(r"^Aisyah", "Hi, Aisyah!")        # → None  (doesn't start with Aisyah)
re.search(r"\.$", "Hi.")                    # → match (ends with a literal dot)
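
Wrapping each call in bool() makes the results easier to read. The sketch below also combines both anchors so the pattern has to cover the whole string; the sample strings are illustrative.

```python
import re

line = "Aisyah said hi."

print(bool(re.search(r"^Aisyah", line)))      # → True   (starts with Aisyah)
print(bool(re.search(r"hi\.$", line)))        # → True   (ends with "hi.")
print(bool(re.search(r"^hi", line)))          # → False  ("hi" is not at the start)

# Both anchors together: the whole string must be digits.
print(bool(re.search(r"^\d+$", "2026")))      # → True
print(bool(re.search(r"^\d+$", "2026 AD")))   # → False
```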

Worked Example · Tweet Extractor

12 min

Save as tweet_parse.py:

# tweet_parse.py — find hashtags, mentions and URLs

import re

tweet = """Lovely day! Visited @aisyah and @wei_jie
for laksa #foodie #penang #malaysia. Recipe at
https://example.com/laksa and also at http://lp.gov.my!
Call 012-3456789 if you want some 😀
"""

# Hashtags — # followed by letters/digits/underscore
hashtags = re.findall(r"#\w+", tweet)
print("Hashtags:", hashtags)

# Mentions — @ followed by letters/digits/underscore
mentions = re.findall(r"@\w+", tweet)
print("Mentions:", mentions)

# Phone numbers — Malaysian style
phones = re.findall(r"\d{2,3}-\d{7,8}", tweet)
print("Phones  :", phones)

# URLs — http(s) followed by non-space characters
urls = re.findall(r"https?://\S+", tweet)
print("URLs    :", urls)

# Quick yes/no
if re.search(r"laksa", tweet):
    print("Found a laksa reference.")

Output

Hashtags: ['#foodie', '#penang', '#malaysia']
Mentions: ['@aisyah', '@wei_jie']
Phones  : ['012-3456789']
URLs    : ['https://example.com/laksa', 'http://lp.gov.my!']
Found a laksa reference.

Notice the URL match includes the trailing exclamation mark — \S+ grabs everything that's not whitespace. We'll tighten the URL pattern in PY-L2-43.

Read the diff

Four different patterns, each one a literal character followed by \w+ or a more specific shape. https? uses ? to mean "optional s" — matches both http and https. \S+ (capital S) means "not whitespace" — perfect when you don't know exactly what comes next.
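
To see why the URL pattern uses \S+ rather than \w+, here is a small comparison on made-up addresses (the example hosts are placeholders, not real sites).

```python
import re

links = "http://a.example https://b.example/path?x=1 ftp://c.example"

# \S+ keeps going until the next whitespace, so dots and slashes are included.
print(re.findall(r"https?://\S+", links))
# → ['http://a.example', 'https://b.example/path?x=1']

# \w+ stops at the first character that is not a letter, digit or underscore.
print(re.findall(r"https?://\w+", links))
# → ['http://a', 'https://b']
```

The ftp:// address is skipped by both patterns because https? only allows http or https before the colon.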

Try It Yourself

13 min

01 🟢 Every word

Use re.findall(r"\w+", text) to extract every word from a sentence. Count them.

Hint

import re
text = "Hello, world! It's 2026 already."
words = re.findall(r"\w+", text)
print(words)               # → ['Hello', 'world', 'It', 's', '2026', 'already']
print("Count:", len(words))

Note that \w doesn't include apostrophes, so It's splits into It and s. We'll tighten with a custom character class tomorrow.

02 🟡 Find every email

From the text below, extract every email address. Aim for a pattern like \w+@\w+\.\w+ — "word, at-sign, word, dot, word".

text = "Contact us: aisyah@example.com or weijie123@school.edu.my, not aisyah.com"

Hint

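emails = re.findall(r"\w+@\w+\.\w+", text)
print(emails)
# → ['aisyah@example.com', 'weijie123@school.edu']

The backslash dot \. matches a literal dot. Without escaping, . means "any character", which is too generous here. Notice that the second address is cut short: the pattern allows only one dot after the @, so .my is left behind. Swapping the domain part for a character class, r"\w+@[\w.]+", lets the domain contain any number of dots and returns 'weijie123@school.edu.my' in full.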

03 🔴 Validate an IC number (stretch)

Malaysian IC numbers look like YYMMDD-PB-#### — six digits, dash, two digits, dash, four digits. Write a function is_ic(text) that returns True/False.

Hint

import re

def is_ic(text):
    return bool(re.search(r"^\d{6}-\d{2}-\d{4}$", text.strip()))

print(is_ic("140812-14-3456"))     # → True
print(is_ic("14081214-3456"))      # → False
print(is_ic("140812-14-3456 extra"))  # → False  (because of $ anchor)

^ and $ are the anchors: together they force the pattern to consume the entire string. Without them, is_ic would return True for any text that merely contains a valid IC.
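
The effect of the anchors can be checked side by side. In this sketch, contains_ic and is_ic_fullmatch are hypothetical helper names, and re.fullmatch is a standard-library alternative to writing ^ and $ by hand; it is not otherwise covered in this lesson.

```python
import re

IC_PATTERN = r"\d{6}-\d{2}-\d{4}"   # same shape as above, without anchors

def contains_ic(text):
    """True if an IC-shaped run appears anywhere in text."""
    return bool(re.search(IC_PATTERN, text))

def is_ic_fullmatch(text):
    """True only if the whole (stripped) string is IC-shaped."""
    return bool(re.fullmatch(IC_PATTERN, text.strip()))

print(contains_ic("My IC is 140812-14-3456, thanks"))       # → True
print(is_ic_fullmatch("My IC is 140812-14-3456, thanks"))   # → False
print(is_ic_fullmatch(" 140812-14-3456 "))                  # → True
```

Use the unanchored form to find an IC inside a longer message and the anchored form to validate a single input field.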

Mini-Challenge · The Chat-Log Stats Tool

8 min

Build chat_stats.py. Given a chat log of the format below, print stats:

[14:32] @aisyah: hey are we still meeting? #lunch
[14:33] @wei_jie: yes, at 12:30! my number's 012-3456789 if you need.
[14:35] @priya: ok cool. found a place: https://maps.example.com/x
[14:35] @aisyah: thanks #lunch #penang

Print:

How many messages (lines that contain :).
Every unique participant (use set).
Every hashtag used.
Every URL.
Every phone number.

Show one possible solution

# chat_stats.py — extract structured info from a chat log

import re

log = """[14:32] @aisyah: hey are we still meeting? #lunch
[14:33] @wei_jie: yes, at 12:30! my number's 012-3456789 if you need.
[14:35] @priya: ok cool. found a place: https://maps.example.com/x
[14:35] @aisyah: thanks #lunch #penang"""

mentions = re.findall(r"@\w+", log)
hashtags = re.findall(r"#\w+", log)
phones   = re.findall(r"\d{2,3}-\d{7,8}", log)
urls     = re.findall(r"https?://\S+", log)

print(f"Messages   : {len(log.splitlines())}")
print(f"Participants: {set(mentions)}")
print(f"Hashtags   : {hashtags}")
print(f"Phones     : {phones}")
print(f"URLs       : {urls}")

Non-negotiables: four re.findall calls with appropriate patterns, plus set() on the mentions to find unique people. Your chat-log parser is one regex away from being shippable.
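
One possible refinement, sketched under the assumption that a log may contain blank lines: count only the lines that contain a colon, as the task describes, and sort the sets so the output order is predictable. The blank line in this sample log was added for illustration.

```python
import re

log = """[14:32] @aisyah: hey are we still meeting? #lunch
[14:33] @wei_jie: yes, at 12:30! my number's 012-3456789 if you need.

[14:35] @priya: ok cool. found a place: https://maps.example.com/x
[14:35] @aisyah: thanks #lunch #penang"""

# Keep only lines that contain a colon, so the blank line is not counted.
messages = [line for line in log.splitlines() if ":" in line]

# set() removes duplicates; sorted() gives a stable, readable order.
participants = sorted(set(re.findall(r"@\w+", log)))
unique_hashtags = sorted(set(re.findall(r"#\w+", log)))

print(f"Messages    : {len(messages)}")    # → 4
print(f"Participants: {participants}")     # → ['@aisyah', '@priya', '@wei_jie']
print(f"Hashtags    : {unique_hashtags}")  # → ['#lunch', '#penang']
```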

Recap

3 min

Two functions cover most use cases. re.findall returns all matches as a list. re.search returns a Match object (or None) for the first match; use if m: for yes/no. Always use raw strings: r"...". \d is a digit, \w is a word character, . is any character except a newline, * is zero-or-more, + is one-or-more. Square brackets list a character set; outside brackets, ^ and $ anchor to start and end.

Vocabulary Card

regex: A short string describing a pattern of other strings.
re.findall(p, t): Every match as a list.
re.search(p, t): The first match — returns a Match object or None.
raw string: r"..." — Python doesn't process backslash escapes. Essential for regex.
\d / \w / .: Digit / word character / any non-newline.
* / + / ?: Zero-or-more / one-or-more / optional.

Homework

4 min

Build password_check.py. Given a password string, report:

Has at least 8 characters?
Has at least one digit? (Use re.search(r"\d", ...).)
Has at least one upper-case letter? (r"[A-Z]")
Has at least one lower-case letter? (r"[a-z]")
Has at least one symbol? (r"[!@#$%^&*]")

Print each check as a ✓ or ✗ and a final "strong / weak" verdict (strong = all five passed).

Sample · password_check.py

# password_check.py: one length check plus four regex tests

import re

def check(pwd):
    tests = [
        ("length >= 8",     len(pwd) >= 8),
        ("has a digit",     bool(re.search(r"\d", pwd))),
        ("has UPPER",       bool(re.search(r"[A-Z]", pwd))),
        ("has lower",       bool(re.search(r"[a-z]", pwd))),
        ("has symbol",      bool(re.search(r"[!@#$%^&*]", pwd))),
    ]
    for name, ok in tests:
        print(f"  {'✓' if ok else '✗'}  {name}")
    if all(ok for _, ok in tests):
        print("  STRONG")
    else:
        print("  weak")

check(input("Password: "))

Non-negotiables: five separate checks (one len() test and four regex tests), a tick/cross display, and a final strong/weak verdict. bool(re.search(...)) converts the Match-or-None to a clean True/False.
