Learning Goals
3 min
By the end of this lesson you can:
- Launch and quit a Chrome browser controlled by Selenium.
- Find elements with By locators and interact (click, type).
- Use explicit waits instead of sleep for dynamic content.
- Know when to reach for a browser vs. when requests is enough.
Warm-Up · When requests Isn't Enough
5 min
You fetch a page with requests and the data you want… isn't there. The HTML has an empty <div id="app"> because the content is rendered by JavaScript after load. requests only sees the initial HTML; it doesn't run JS.
    pip install selenium   # Selenium 4.6+ auto-manages the driver
Selenium controls a real browser, so it runs JavaScript, handles logins, and sees exactly what a human sees. The trade-off: it's slower and heavier than requests. Rule of thumb — try requests/API first; reach for a browser only when the content genuinely requires one. And the #1 skill is waiting properly, because the browser and your code run at different speeds.
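To make the empty-container problem concrete, here is a minimal sketch using only the standard library. The INITIAL_HTML string is an invented stand-in for what a plain HTTP fetch might return from a JS-rendered page, and AppTextCollector and app_text are hypothetical helper names, not part of requests or Selenium.

```python
from html.parser import HTMLParser

# Invented response body: roughly what a plain HTTP fetch could return
# for a page whose content is filled in later by JavaScript.
INITIAL_HTML = """
<html>
  <body>
    <div id="app"></div>
    <script src="/bundle.js"></script>
  </body>
</html>
"""

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class AppTextCollector(HTMLParser):
    """Collect the text found inside <div id="app">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while the parser is inside the app container
        self.chunks = []    # text fragments found inside the container

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == "app":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def app_text(html):
    """Return the visible text inside the app container ('' if it is empty)."""
    parser = AppTextCollector()
    parser.feed(html)
    return " ".join(parser.chunks)


print(repr(app_text(INITIAL_HTML)))  # prints '' because nothing was rendered yet
```

An empty string here is the signal that the data arrives via JavaScript. At that point it is worth checking for an API first, and reaching for a browser only if there isn't one.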
New Concept · Driving Chrome
14 min
Launch and quit
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")   # run without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://example.com")
        print(driver.title)
    finally:
        driver.quit()                        # ALWAYS quit: frees the browser process
Selenium 4.6 and later include Selenium Manager, which downloads and manages ChromeDriver for you. --headless=new runs Chrome invisibly (essential for servers). Always quit() in a finally: an orphaned browser process keeps consuming memory.
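If you launch browsers in several places, the try/finally pattern can be wrapped once in a context manager. This is a sketch, not part of Selenium: browser_session is a hypothetical helper name, and FakeDriver is a stand-in so the example runs without a browser. With Selenium you would pass something like lambda: webdriver.Chrome(options=options).

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Yield a driver and guarantee quit() runs afterwards.

    make_driver is any zero-argument callable returning an object with a
    quit() method, e.g. lambda: webdriver.Chrome(options=options).
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()


# Stand-in driver (an assumption for this demo) so no browser is needed.
class FakeDriver:
    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeDriver()
try:
    with browser_session(lambda: fake) as driver:
        raise RuntimeError("simulated failure mid-scrape")
except RuntimeError:
    pass

print(fake.closed)  # True: quit() ran even though the body raised
```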
Finding elements
    from selenium.webdriver.common.by import By

    driver.find_element(By.ID, "search")                        # one element
    driver.find_element(By.CSS_SELECTOR, ".price")
    driver.find_element(By.XPATH, "//button[text()='Login']")
    driver.find_elements(By.TAG_NAME, "a")                      # many (plural) → list
Same robust-selector advice as scraping (Lesson 23): prefer IDs and stable attributes. find_element returns one element (and raises NoSuchElementException if it is absent); find_elements returns a list (empty if none match).
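Because find_elements returns an empty list rather than raising, it is a convenient way to look up something that may legitimately be missing. The sketch below assumes a hypothetical helper, find_optional, and a FakePage stand-in so it runs without a browser. The string "css selector" is the value behind By.CSS_SELECTOR.

```python
def find_optional(scope, by, value):
    """Return the first matching element, or None when nothing matches.

    scope is a driver or an element: anything with find_elements(by, value).
    """
    matches = scope.find_elements(by, value)
    return matches[0] if matches else None


# Stand-in page (an assumption for this demo). A real call would look like
# find_optional(driver, By.CSS_SELECTOR, ".price").
class FakePage:
    def __init__(self, elements):
        self._elements = elements  # maps (by, value) -> list of matches

    def find_elements(self, by, value):
        return self._elements.get((by, value), [])


page = FakePage({("css selector", ".price"): ["$9.99", "$4.50"]})
print(find_optional(page, "css selector", ".price"))   # $9.99
print(find_optional(page, "css selector", ".banner"))  # None
```

This suits optional page furniture such as banners and badges. For elements the script cannot continue without, the exception from find_element is the more useful failure.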
Interacting
    box = driver.find_element(By.ID, "search")
    box.send_keys("python automation")                       # type into a field
    box.submit()                                             # or find a button and .click()
    driver.find_element(By.ID, "login-btn").click()
    print(driver.find_element(By.CSS_SELECTOR, "h1").text)   # read text
Explicit waits — the crucial skill
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    wait = WebDriverWait(driver, 10)   # wait up to 10 seconds

    # wait until an element is present/clickable BEFORE using it
    result = wait.until(EC.presence_of_element_located((By.ID, "results")))
    button = wait.until(EC.element_to_be_clickable((By.ID, "submit")))
    button.click()
Avoid time.sleep to wait for elements
sleep(5) is fragile: too short and the element isn't ready (crash); too long and you waste time on every run. Explicit waits poll until a specific condition is true (element present, clickable, visible), then continue immediately, which makes them both fast and reliable. If the condition never becomes true within the timeout, wait.until raises TimeoutException. This single habit fixes most flaky browser automation.
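The polling idea can be shown in a few lines of plain Python. This is a simplified illustration of the concept, not Selenium's actual implementation: wait_until and element_ready are hypothetical names, and the "element" is simulated so the sketch runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result          # continue as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)           # short pause, then check again


# Simulated "element" that only becomes available on the third check.
attempts = {"count": 0}


def element_ready():
    attempts["count"] += 1
    return "results" if attempts["count"] >= 3 else None


print(wait_until(element_ready, timeout=2, poll=0.01))  # results
print(attempts["count"])                                # 3
```

Compared with a fixed sleep, the loop returns on the first successful check, so a fast page costs almost nothing. A slow page still gets the full timeout before the script gives up.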
Reading state
    driver.current_url               # where we ended up (after redirects)
    driver.page_source               # the fully-rendered HTML (feed to BeautifulSoup!)
    element.get_attribute("href")    # an attribute value
    element.is_displayed()           # visibility check
A powerful combo: let Selenium render the page, then hand driver.page_source to BeautifulSoup (Lesson 23) for the parsing you already know.
Worked Example · Search and Scrape a JS Page
12 min
Goal: open a JS-rendered site, wait for the results to load, and extract them, handling the timing correctly. (Practice on quotes.toscrape.com/js, whose content is JS-rendered on purpose.)
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import logging

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    log = logging.getLogger("selenium")


    def scrape_js_quotes() -> list[dict]:
        options = Options()
        options.add_argument("--headless=new")
        driver = webdriver.Chrome(options=options)
        quotes = []
        try:
            driver.get("https://quotes.toscrape.com/js/")
            wait = WebDriverWait(driver, 10)
            # wait for the JS to render at least one .quote
            wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".quote")))
            log.info("page rendered")

            for card in driver.find_elements(By.CSS_SELECTOR, ".quote"):
                text = card.find_element(By.CSS_SELECTOR, ".text").text
                author = card.find_element(By.CSS_SELECTOR, ".author").text
                quotes.append({"text": text, "author": author})
            log.info("extracted %d quotes", len(quotes))
        finally:
            driver.quit()
        return quotes


    for q in scrape_js_quotes()[:3]:
        print(f"{q['author']}: {q['text'][:50]}…")
    INFO page rendered
    INFO extracted 10 quotes
    Albert Einstein: "The world as we have created it is a pr…
    J.K. Rowling: "It is our choices, Harry, that show what w…
    Albert Einstein: "There are only two ways to live your li…
Read the code
The crucial line is the wait.until(...presence_of_element_located...) call. Without it, the script could read the page before the JavaScript rendered the quotes and find nothing. With it, the code waits only as long as needed (to within the polling interval, 0.5 s by default), then proceeds. The try/finally guarantees the browser closes even if extraction fails. requests couldn't handle this page at all; the browser runs the JS that builds it. That's precisely when Selenium earns its weight.
Try It Yourself
13 min
Launch headless Chrome, open a site, print its title and current URL, and quit cleanly in a finally. Confirm no browser process lingers.
On a search page, find the search box, type a query, submit, wait for results, and print the heading or count. Use an explicit wait, not sleep.
Hint
    box = driver.find_element(By.NAME, "q")
    box.send_keys("selenium")
    box.submit()
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".result")))
    print(len(driver.find_elements(By.CSS_SELECTOR, ".result")), "results")
Let Selenium render a JS page, grab driver.page_source, and parse it with BeautifulSoup to extract data — combining today's lesson with Lesson 23. Compare to scraping the same URL with plain requests (it should come back empty).
Hint
    from bs4 import BeautifulSoup

    driver.get("https://quotes.toscrape.com/js/")
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".quote")))
    soup = BeautifulSoup(driver.page_source, "html.parser")
    print(len(soup.select(".quote")), "quotes via BeautifulSoup")
Mini-Challenge · The Screenshot Bot
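8 min
Write snapshot(urls) that visits each URL in headless Chrome, waits for the body to load, and saves a screenshot named after the domain and timestamp. (In Chrome, save_screenshot captures the visible viewport rather than the whole scrollable page, so set a window size.) Useful for visual monitoring of pages over time.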
Show a sample solution
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from datetime import datetime
    from urllib.parse import urlparse


    def snapshot(urls: list[str]) -> None:
        options = Options()
        options.add_argument("--headless=new")
        options.add_argument("--window-size=1280,1024")
        driver = webdriver.Chrome(options=options)
        try:
            for url in urls:
                driver.get(url)
                WebDriverWait(driver, 10).until(
                    EC.presence_of_element_located((By.TAG_NAME, "body")))
                host = urlparse(url).netloc.replace(".", "_")
                stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
                name = f"{host}-{stamp}.png"
                driver.save_screenshot(name)
                print("saved", name)
        finally:
            driver.quit()


    snapshot(["https://example.com", "https://quotes.toscrape.com"])
Non-negotiables: headless, explicit wait, per-domain timestamped screenshot, quit in finally.
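The file-naming step is easy to get subtly wrong, for example when a URL includes a port, and it can be checked without launching a browser. The sketch below pulls it into a pure function. screenshot_name is a hypothetical helper, and replacing every character other than letters, digits, and hyphens is one possible sanitising choice, not a requirement of Selenium.

```python
import re
from datetime import datetime
from urllib.parse import urlparse


def screenshot_name(url, when):
    """Build '<host>-<YYYYmmdd-HHMMSS>.png' from a URL and a datetime."""
    host = urlparse(url).netloc or "unknown"
    # Replace anything that is not a letter, digit, or hyphen (dots, the ':'
    # before a port, '@' in credentials) so the name is safe on most filesystems.
    safe_host = re.sub(r"[^A-Za-z0-9-]", "_", host)
    return f"{safe_host}-{when.strftime('%Y%m%d-%H%M%S')}.png"


fixed = datetime(2024, 1, 2, 3, 4, 5)  # fixed timestamp so the output is repeatable
print(screenshot_name("https://example.com", fixed))
# example_com-20240102-030405.png
print(screenshot_name("http://localhost:8000/page", fixed))
# localhost_8000-20240102-030405.png
```

Passing the timestamp in, rather than calling datetime.now() inside the function, keeps the result deterministic and easy to test.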
Recap
3 min
Selenium drives a real Chrome, so it handles JavaScript, logins, and clicks that requests can't. Launch with webdriver.Chrome(options=...) (use --headless=new on servers), always quit() in a finally. Find elements with By locators (robust selectors!), interact with send_keys/click/.text, and, most importantly, use explicit waits (WebDriverWait().until(EC...)) instead of sleep to handle timing reliably. Render then hand page_source to BeautifulSoup for parsing. Reach for the browser only when the page truly needs one; otherwise prefer the lighter, faster API/requests path.
Vocabulary Card
- WebDriver: The object that controls the browser from your code.
- headless: Running the browser without a visible window (for servers/automation).
- locator (By): How you identify an element: ID, CSS selector, XPath, etc.
- explicit wait: Polling until a condition is met, instead of a fixed sleep.
Homework
4 min
Build jobwatch.py (using a JS-rendered practice site or any site you're permitted to automate): open the page, wait for listings to render, extract title + a second field into a CSV, and screenshot the page. Use explicit waits throughout, quit in a finally, and log each step. Add a comment explaining why this page needs Selenium rather than requests.
Sample · jobwatch.py (core)
    # This site renders listings with JavaScript, so requests would
    # return an empty container. Selenium runs the JS and sees them.
    import csv
    import logging

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    log = logging.getLogger("jobwatch")

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    rows = []
    try:
        driver.get("https://quotes.toscrape.com/js/")
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".quote")))
        log.info("rendered")
        for card in driver.find_elements(By.CSS_SELECTOR, ".quote"):
            rows.append({
                "text": card.find_element(By.CSS_SELECTOR, ".text").text,
                "author": card.find_element(By.CSS_SELECTOR, ".author").text,
            })
        driver.save_screenshot("page.png")
        log.info("captured %d rows + screenshot", len(rows))
    finally:
        driver.quit()

    with open("jobs.csv", "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=["text", "author"])
        w.writeheader()
        w.writerows(rows)
Non-negotiables: explicit waits, CSV + screenshot, quit in finally, logging, a why-Selenium comment.
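The CSV step does not depend on the browser, so it can be pulled out and checked on its own. A minimal sketch, assuming a hypothetical helper rows_to_csv and invented sample rows. It writes to an in-memory buffer so it touches neither the disk nor the network.

```python
import csv
import io


def rows_to_csv(rows, fieldnames, out):
    """Write a header plus one line per row dict to the open text file out."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Invented sample rows standing in for scraped data.
sample = [
    {"text": "First quote", "author": "A. Author"},
    {"text": 'Has, comma and "quotes"', "author": "B. Writer"},
]

buffer = io.StringIO(newline="")   # in-memory stand-in for an open file
rows_to_csv(sample, ["text", "author"], buffer)
print(buffer.getvalue())
```

In jobwatch.py the same function could be called with the file object from open("jobs.csv", "w", newline="", encoding="utf-8"). The csv module quotes fields that contain commas or quotation marks, so quote text survives the round trip.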