PY-L7-27 · Web Automation: Playwright (Modern Alternative)

Learning Goals

3 min

By the end of this lesson you can:

Install Playwright and its bundled browsers.
Drive a page with the sync API: navigate, click, fill, read.
Use auto-waiting locators so that explicit waits become the exception rather than the rule.
Decide between Playwright and Selenium for a given task.

Warm-Up · What Selenium Made You Do

5 min

Recall the Selenium dance: import By, WebDriverWait, expected_conditions; wrap every interaction in an explicit wait; remember to quit(). Powerful, but verbose — and forgetting a wait makes tests flaky.

pip install playwright
playwright install        # downloads bundled Chromium/Firefox/WebKit

Today's big idea

Playwright's locators auto-wait: when you act on a locator, Playwright first waits for the element to be ready for that action (for example, a click waits until the element is visible, stable, and enabled). No WebDriverWait boilerplate, and far fewer timing-related flaky failures. It also downloads its own browser builds (no driver-version headaches) and offers a context-manager API that cleans up for you. Same concepts as Selenium, far less ceremony.

New Concept · The Playwright API

14 min

Launch with a context manager

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()

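The with block starts Playwright and shuts it down when the block exits; browser.close() simply releases the browser a little earlier. Browsers launch headless by default (headless=True); pass headless=False to watch the run while developing. Three browser engines are built in: p.chromium, p.firefox, p.webkit.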
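Because the three engines share one API, the same steps can run against each of them in turn. The sketch below is one way to do that; titles_by_engine, ENGINES, and main are illustrative names, not part of Playwright, and the Playwright import sits inside main so the helper can be read and tested without a browser installed.

```python
ENGINES = ("chromium", "firefox", "webkit")  # attribute names on the Playwright object


def titles_by_engine(p, url: str, engines=ENGINES) -> dict[str, str]:
    """Open `url` in each engine and return {engine name: page title}.

    `p` is the object yielded by `with sync_playwright() as p`.
    """
    for name in engines:
        if name not in ENGINES:
            raise ValueError(f"unknown engine: {name!r}")

    titles = {}
    for name in engines:
        browser = getattr(p, name).launch()  # headless by default
        try:
            page = browser.new_page()
            page.goto(url)
            titles[name] = page.title()
        finally:
            browser.close()                  # release this engine before the next one
    return titles


def main() -> None:
    # Imported here (an assumption of this sketch) so the helper above
    # stays importable on machines without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        for name, title in titles_by_engine(p, "https://example.com").items():
            print(f"{name}: {title}")
```

Calling main() launches real browsers, so it needs the playwright install step from the warm-up to have been run.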

Locators that auto-wait

# build a locator — nothing happens yet
heading = page.locator("h1")
search  = page.get_by_role("textbox", name="Search")
button  = page.get_by_role("button", name="Submit")

# acting on it auto-waits for the element to be ready:
button.click()                  # waits until clickable, then clicks
search.fill("playwright")       # waits, clears, types
print(heading.inner_text())     # waits until present, then reads

A locator is a recipe for finding an element, re-evaluated each time you use it. The action (click, fill) waits for the element to be actionable — so timing bugs largely disappear.
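The "recipe" idea can be modelled in a few lines of plain Python. This is a toy, not Playwright's implementation: ToyLocator and the dictionary standing in for the page are invented for illustration. It shows only the one property that matters here, that the lookup runs when the locator is used, not when it is built.

```python
from typing import Optional


class ToyLocator:
    """Toy stand-in for a locator: it stores a selector, not an element."""

    def __init__(self, dom: dict[str, str], selector: str) -> None:
        self._dom = dom            # shared, mutable "page"
        self._selector = selector  # the recipe; nothing is looked up yet

    def resolve(self) -> Optional[str]:
        # The lookup happens now, against the page as it is at this moment.
        return self._dom.get(self._selector)


dom: dict[str, str] = {}         # the "page" starts empty
heading = ToyLocator(dom, "h1")  # building the locator finds nothing yet
before = heading.resolve()       # no <h1> on the page so far

dom["h1"] = "Welcome"            # the page renders its heading later
after = heading.resolve()        # same locator object, fresh lookup

dom["h1"] = "Dashboard"          # the page re-renders
latest = heading.resolve()

print(before, after, latest)     # None Welcome Dashboard
```

A Selenium element handle, by contrast, points at one specific element and can go stale when the page re-renders; a locator just runs its query again.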

User-facing selectors (recommended)

page.get_by_role("button", name="Login")     # by accessible role + name
page.get_by_text("Add to cart")              # by visible text
page.get_by_label("Email")                   # form field by its label
page.get_by_placeholder("Search…")           # by placeholder
page.locator("#results .item")               # CSS, when needed

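Playwright nudges you toward selectors a user would recognise (role, text, label). These survive markup changes better than long CSS or XPath paths (the same point Lesson 23 made), and because they depend on accessible roles and labels, a selector that is hard to write often points to an accessibility gap in the page.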

Explicit assertions and waits (when you do need them)

from playwright.sync_api import expect

expect(page.locator(".result")).to_be_visible()      # auto-retries up to a timeout
expect(page.locator(".count")).to_have_text("3 found")

page.wait_for_url("**/dashboard")                     # wait for navigation
page.wait_for_load_state("networkidle")               # wait for network to settle

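expect(...) retries the assertion until it passes or the timeout expires (5 seconds by default), which suits "wait until the result appears" checks. Raw waits are rarely needed; Playwright's documentation discourages relying on "networkidle" in tests and recommends assertions like these instead.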
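The retry behaviour is easy to picture as a polling loop. The sketch below is a conceptual model in plain Python, not Playwright's code: expect_eventually, its parameters, and the fake page state are all made up for illustration.

```python
import time
from typing import Callable


def expect_eventually(check: Callable[[], None],
                      timeout: float = 1.0,
                      interval: float = 0.02) -> int:
    """Re-run `check` until it stops raising AssertionError.

    Returns the number of attempts; re-raises the last failure on timeout.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            check()
            return attempts
        except AssertionError:
            if time.monotonic() >= deadline:
                raise              # out of time: surface the last failure
            time.sleep(interval)   # give the "page" a moment, then retry


# Fake page state: the counter text only reads "3 found" on the third look.
reads = iter(["", "1 found", "3 found"])
state = {"count_text": ""}


def count_shows_three() -> None:
    state["count_text"] = next(reads, "3 found")
    assert state["count_text"] == "3 found", f"got {state['count_text']!r}"


attempts = expect_eventually(count_shows_three)
print(attempts)  # 3
```

The design point carries over to the real expect(...): the test states the condition it wants, and the retry loop absorbs the timing.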

Handy extras

page.screenshot(path="shot.png", full_page=True)
page.pdf(path="page.pdf")                  # save the page as PDF (Chromium only)
content = page.content()                   # rendered HTML → BeautifulSoup
page.goto(url, wait_until="domcontentloaded")
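page.content() returns the rendered HTML as an ordinary string, so any HTML parser can take over from there. The snippet above mentions BeautifulSoup; the sketch below swaps in the standard library's html.parser to stay dependency-free. QuoteParser is a hypothetical helper, and the sample HTML is invented data that reuses the .quote, .text, and .author class names from the worked example.

```python
from html.parser import HTMLParser
from typing import Optional


class QuoteParser(HTMLParser):
    """Collect the text of elements whose class is "text" or "author"."""

    FIELDS = ("text", "author")

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[dict[str, str]] = []
        self._field: Optional[str] = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "quote" in classes:
            self.rows.append({})            # a new quote card starts a new row
        for field in self.FIELDS:
            if field in classes and self.rows:
                self._field = field

    def handle_data(self, data):
        if self._field is not None:
            row = self.rows[-1]
            row[self._field] = row.get(self._field, "") + data

    def handle_endtag(self, tag):
        self._field = None                  # stop collecting at any closing tag


# Stand-in for `page.content()`; made-up sample markup.
html = """
<div class="quote"><span class="text">Stay curious.</span>
  <small class="author">A. Writer</small></div>
<div class="quote"><span class="text">Ship it.</span>
  <small class="author">B. Coder</small></div>
"""

parser = QuoteParser()
parser.feed(html)
print(parser.rows)
```

In a real script the only change is parser.feed(page.content()). This parser is deliberately minimal and assumes the text and author elements contain no nested tags.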

Worked Example · Same Scrape, Less Code

12 min

Goal: the same JS-rendered quote scrape from Lesson 26, now in Playwright — notice how the auto-waiting removes the explicit-wait boilerplate.

from playwright.sync_api import sync_playwright
import csv, logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("playwright")

def scrape_quotes() -> list[dict]:
    quotes = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://quotes.toscrape.com/js/")

        # the quotes are rendered by JavaScript, so wait for the first card to appear
        cards = page.locator(".quote")
        cards.first.wait_for()              # needed: count() below does not auto-wait
        count = cards.count()
        log.info("found %d quotes", count)

        for i in range(count):
            card = cards.nth(i)
            quotes.append({
                "text": card.locator(".text").inner_text(),
                "author": card.locator(".author").inner_text(),
            })
        browser.close()
    return quotes

rows = scrape_quotes()
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["text", "author"])
    w.writeheader(); w.writerows(rows)
log.info("wrote %d quotes", len(rows))

INFO found 10 quotes
INFO wrote 10 quotes

Read the code

Compare with Lesson 26: no By, no WebDriverWait/expected_conditions imports, no try/finally (leaving the with block shuts Playwright down). cards.first.wait_for() is the only explicit wait. It is needed here because count() reports whatever matches right now without waiting; actions such as click(), fill(), and inner_text() would wait on their own. cards.nth(i) picks the i-th match. Same result, noticeably less ceremony, which is why many teams now default to Playwright for new browser automation.
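The count()/nth(i) loop can also be written with locator.all(), which returns one locator per matched element. Like count(), all() does not wait, so the explicit wait_for() stays. The following is a sketch of that variant; extract_quotes and main are illustrative names, and the Playwright import is placed inside main so the extraction logic can be exercised without launching a browser.

```python
def extract_quotes(page) -> list[dict]:
    """Read every quote card from a page that has already been navigated to."""
    cards = page.locator(".quote")
    cards.first.wait_for()       # all() returns immediately, so wait for the first card
    return [
        {
            "text": card.locator(".text").inner_text(),
            "author": card.locator(".author").inner_text(),
        }
        for card in cards.all()  # one locator per matched element
    ]


def main() -> None:
    # Imported here (an assumption of this sketch) so extract_quotes
    # can be imported on machines without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://quotes.toscrape.com/js/")
        for row in extract_quotes(page):
            print(row["author"], "|", row["text"])
        browser.close()
```

Keeping the extraction in a function that only receives a page also makes it reusable across pages of a paginated crawl.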

Playwright vs. Selenium — choosing

Playwright: new projects, flaky-test pain, modern apps, want bundled browsers and async support. Selenium: existing test suites, the widest language/grid ecosystem, or a tool/integration that mandates it. Both control real browsers; the concepts you learned in Lesson 26 transfer directly. Pick one and go — don't agonise.

Try It Yourself

13 min

01 🟢 Title & screenshot

Open a site headless, print its title, and save a full-page screenshot — all inside a with sync_playwright() block. No manual cleanup needed.

02 🟡 Fill and submit by role

On a search or login page, locate fields with get_by_label/get_by_role, fill them, click the submit button, and assert the result with expect(...).to_be_visible().

Hint

from playwright.sync_api import expect
page.get_by_label("Username").fill("demo")
page.get_by_label("Password").fill("secret")
page.get_by_role("button", name="Login").click()
expect(page.get_by_text("Welcome")).to_be_visible()

03 🔴 Render to PDF

Open a content page, wait for it to settle (wait_for_load_state("networkidle")), and save it as a PDF with page.pdf(...). Combine with Lesson 21 ideas — a webpage archived as a document.

Hint

page.goto("https://example.com")
page.wait_for_load_state("networkidle")
page.pdf(path="example.pdf", format="A4")

Mini-Challenge · The Multi-Page Crawler

8 min

Write a Playwright crawler that scrapes a paginated practice site (e.g. quotes.toscrape.com), clicking the "Next" button with a locator until it's no longer visible, collecting all items across pages into one CSV. Let auto-waiting handle the page transitions.

Show a sample solution

from playwright.sync_api import sync_playwright
import csv

def crawl() -> list[dict]:
    rows = []
    with sync_playwright() as p:
        page = p.chromium.launch(headless=True).new_page()
        page.goto("https://quotes.toscrape.com/")
        while True:
            cards = page.locator(".quote")
            for i in range(cards.count()):
                c = cards.nth(i)
                rows.append({
                    "text": c.locator(".text").inner_text(),
                    "author": c.locator(".author").inner_text(),
                })
            nxt = page.locator("li.next a")
            if nxt.count() == 0:
                break
            nxt.click()                     # auto-waits until the link is clickable
            page.wait_for_load_state("domcontentloaded")  # next page loaded before we count again
    return rows

data = crawl()
with open("all_quotes.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["text", "author"])
    w.writeheader(); w.writerows(data)
print(f"crawled {len(data)} quotes")

Non-negotiables: locator-driven pagination, stops when Next disappears, all pages into one CSV.

Recap

3 min

Playwright is the modern take on browser automation: with sync_playwright() manages the browser, bundled engines avoid driver headaches, and locators auto-wait for elements to be ready — so flaky timing bugs mostly vanish and you rarely write an explicit wait. Prefer user-facing selectors (get_by_role, get_by_text, get_by_label), and use expect(...) for retrying assertions. The concepts are identical to Selenium (Lesson 26) with far less ceremony. Choose Playwright for new work and flaky-test relief; Selenium for legacy suites and the widest ecosystem.

Vocabulary Card

locator: A re-evaluated recipe for an element; actions on it auto-wait.
auto-waiting: Playwright waits for an element to be actionable before acting.
get_by_role: Selects elements by accessible role and name, the way a user or screen reader perceives them.
expect(): An assertion that retries until it passes or times out.

Homework

4 min

Port your Lesson 26 Selenium homework to Playwright. Then write a short note comparing the two: lines of code, how each handled waiting, and which you'd choose for a new project and why. Bonus: add a feature Playwright makes easy (PDF export, multi-browser run, or get_by_role selectors).

Sample · port + comparison

from playwright.sync_api import sync_playwright
import csv

with sync_playwright() as p:
    page = p.chromium.launch(headless=True).new_page()
    page.goto("https://quotes.toscrape.com/js/")
    cards = page.locator(".quote")
    cards.first.wait_for()
    rows = [{
        "text": cards.nth(i).locator(".text").inner_text(),
        "author": cards.nth(i).locator(".author").inner_text(),
    } for i in range(cards.count())]
    page.screenshot(path="page.png", full_page=True)

with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["text", "author"])
    w.writeheader(); w.writerows(rows)

Comparison:
- Playwright: ~15 lines, no By/WebDriverWait/EC imports, no
  try/finally (with-block closes the browser), waiting is automatic.
- Selenium: ~25 lines, manual explicit waits, manual quit().
- New project → Playwright (less flakiness, bundled browsers).
- Legacy suite / Selenium Grid → stay on Selenium.

Non-negotiables: a working port, a Playwright-only feature, and a concrete comparison.
