PY-L7-28 · Project: Auto-Fill Form Bot

The Brief

3 min

Build formbot.py <data.csv> <form-url> that, for each row in the CSV, opens a web form, fills the fields from the row, submits it, confirms success, and records the outcome — producing a results.csv of which rows succeeded and which failed and why.

Read input rows (Lesson 15) and map columns to form fields.
Fill & submit with Playwright locators (Lesson 27), waiting for confirmation.
Handle a failure on one row without sinking the rest (Lesson 13 logging).
Write a results report and a screenshot of any failure.

⚠️ Only automate forms you're allowed to

Automate your own forms, sanctioned internal tools, or explicit practice sandboxes (e.g. httpbin's form, or a test form you build). Never auto-submit to third-party sites without permission, never create spam, and never bypass CAPTCHAs or rate limits meant to stop bots. Automation is a force multiplier — point it only where you have the right to.

Design the Flow

5 min

for each row in data.csv:
   1. open the form URL (fresh load each time = clean form)
   2. fill each field by its label/name from the row's values
   3. click submit
   4. wait for a success indicator (text, URL change, element)
   5. record success/failure (+ screenshot on failure)
write results.csv  ·  log a summary

Robustness first

Real forms break in real ways: a field renamed, a slow response, a validation error. The bot must treat each row as an independent transaction — one bad row gets logged and skipped, the batch carries on. That isolation is the whole point of a good bot.
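The isolation idea can be sketched without a browser. In this minimal illustration, run_batch and fake_submit are hypothetical names, and fake_submit merely stands in for a real form submission; it is not part of the bot's final code.

```python
from typing import Callable

def run_batch(rows: list[dict], submit: Callable[[dict], None]) -> list[dict]:
    """Attempt every row; a failure is recorded, never propagated."""
    results = []
    for row in rows:
        try:
            submit(row)
            status, reason = "ok", "ok"
        except Exception as e:  # one bad row must not stop the batch
            status, reason = "failed", f"{type(e).__name__}: {e}"
        results.append({**row, "status": status, "reason": reason})
    return results

def fake_submit(row: dict) -> None:
    # Hypothetical validation standing in for a real form submission.
    if not row.get("email"):
        raise ValueError("email is required")

rows = [{"email": "a@example.com"}, {"email": ""}, {"email": "c@example.com"}]
for result in run_batch(rows, fake_submit):
    print(result["status"], result["reason"])
```

The second row is recorded as failed, and the third row still runs.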

Build It · The Core Bot

14 min

A field-mapping config

Decouple your CSV columns from the form's field names so the bot adapts to any form via config, not code edits:

# maps a CSV column → how to locate the form field
FIELD_MAP = {
    "name":    {"by": "label", "target": "Full name"},
    "email":   {"by": "label", "target": "Email"},
    "message": {"by": "label", "target": "Message"},
}

SUBMIT = {"by": "role", "target": "Submit"}
SUCCESS_TEXT = "Thank you"        # what appears after a good submit
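Because the map is plain data, it can be checked against the CSV header before any browser starts. The sketch below is one possible pre-flight check; missing_columns is a hypothetical helper, and the in-memory CSV is made-up sample data standing in for data.csv.

```python
import csv
import io

FIELD_MAP = {
    "name":    {"by": "label", "target": "Full name"},
    "email":   {"by": "label", "target": "Email"},
    "message": {"by": "label", "target": "Message"},
}

def missing_columns(fieldnames, field_map: dict) -> list[str]:
    """Return the FIELD_MAP keys that the CSV header does not provide."""
    present = set(fieldnames or [])  # fieldnames is None for an empty file
    return [column for column in field_map if column not in present]

# An in-memory CSV stands in for data.csv; its header lacks "message".
sample = io.StringIO("name,email\nAda,ada@example.com\n")
reader = csv.DictReader(sample)
print(missing_columns(reader.fieldnames, FIELD_MAP))  # → ['message']
```

Failing fast on a missing column is cheaper than discovering it as an identical failure on every row.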

Filling one field

from playwright.sync_api import Page

def locate(page: Page, spec: dict):
    by, target = spec["by"], spec["target"]
    if by == "label":
        return page.get_by_label(target)
    if by == "role":
        return page.get_by_role("button", name=target)
    if by == "css":
        return page.locator(target)
    raise ValueError(f"unknown locator type: {by}")

def fill_row(page: Page, row: dict) -> None:
    for column, spec in FIELD_MAP.items():
        value = row.get(column) or ""    # short CSV rows yield None; fill() needs a str
        locate(page, spec).fill(value)

The locate helper supports several selector strategies; fill_row just walks the map. Adding a field is a config line, not new code.
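Since locate and fill_row only call a handful of methods, they can be exercised without launching a browser. The following is a rough sketch: StubPage and StubLocator are hypothetical stand-ins that record calls, the two-entry FIELD_MAP is a shortened example, and the type hints on page are dropped so the snippet runs without Playwright installed.

```python
FIELD_MAP = {
    "name":  {"by": "label", "target": "Full name"},
    "email": {"by": "css", "target": "#email"},
}

class StubLocator:
    """Records fill() calls instead of touching a browser."""
    def __init__(self, log: list, key: str):
        self.log, self.key = log, key

    def fill(self, value: str) -> None:
        self.log.append((self.key, value))

class StubPage:
    """Hypothetical stand-in exposing only the methods locate() uses."""
    def __init__(self):
        self.log = []

    def get_by_label(self, text):
        return StubLocator(self.log, f"label:{text}")

    def get_by_role(self, role, name=None):
        return StubLocator(self.log, f"{role}:{name}")

    def locator(self, selector):
        return StubLocator(self.log, f"css:{selector}")

def locate(page, spec: dict):
    by, target = spec["by"], spec["target"]
    if by == "label":
        return page.get_by_label(target)
    if by == "role":
        return page.get_by_role("button", name=target)
    if by == "css":
        return page.locator(target)
    raise ValueError(f"unknown locator type: {by}")

def fill_row(page, row: dict) -> None:
    for column, spec in FIELD_MAP.items():
        locate(page, spec).fill(row.get(column, ""))

page = StubPage()
fill_row(page, {"name": "Ada", "email": "ada@example.com"})
print(page.log)
```

The recorded log shows which locator each column was routed to, which makes a typo in the map visible in seconds.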

Submitting one row, safely

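from playwright.sync_api import Page, expect, TimeoutError as PWTimeout

def submit_row(page: Page, row: dict, url: str) -> tuple[bool, str]:
    try:
        page.goto(url)
        fill_row(page, row)
        locate(page, SUBMIT).click()
        expect(page.get_by_text(SUCCESS_TEXT)).to_be_visible(timeout=10000)
        return True, "ok"
    except AssertionError:               # expect() raises this if the text never shows
        return False, "no success confirmation (timeout)"
    except PWTimeout:                    # goto/fill/click ran out of time
        return False, "timed out while loading, filling, or clicking"
    except Exception as e:               # any other failure for this row
        return False, f"{type(e).__name__}: {e}"

Each row is a try/except transaction returning (success, reason). expect() raises AssertionError when the success text has not appeared within 10 seconds, so that case is caught explicitly and reported as a missing confirmation. A Playwright timeout during navigation, filling, or clicking is a failure too, as is any other exception. Crucially, the function returns rather than raises, so the batch loop keeps going.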

Build It · The Batch Runner

12 min

Wire the pieces into a runner that processes the whole CSV, screenshots failures, and writes a results report.

import csv, argparse, logging
from pathlib import Path
from playwright.sync_api import sync_playwright

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("formbot")

def run(data_path: str, url: str, out: str = "results.csv") -> None:
    with open(data_path, newline="", encoding="utf-8") as f:   # close the file when done
        rows = list(csv.DictReader(f))
    log.info("loaded %d rows", len(rows))
    Path("failures").mkdir(exist_ok=True)
    results = []

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        for i, row in enumerate(rows, start=1):
            ok, reason = submit_row(page, row, url)
            if not ok:
                page.screenshot(path=f"failures/row-{i}.png")
                log.warning("row %d FAILED: %s", i, reason)
            else:
                log.info("row %d ok", i)
            results.append({**row, "status": "ok" if ok else "failed",
                            "reason": reason})
        browser.close()

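    if not results:                      # empty CSV: results[0] would raise IndexError
        log.warning("no rows to process")
        return
    fields = list(results[0].keys())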
    with open(out, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields)
        w.writeheader(); w.writerows(results)

    ok_count = sum(r["status"] == "ok" for r in results)
    log.info("done: %d ok, %d failed → %s",
             ok_count, len(results) - ok_count, out)

if __name__ == "__main__":
    ap = argparse.ArgumentParser(description="Fill a web form from a CSV.")
    ap.add_argument("data"); ap.add_argument("url")
    ap.add_argument("--out", default="results.csv")
    a = ap.parse_args()
    run(a.data, a.url, a.out)

INFO loaded 25 rows
INFO row 1 ok
INFO row 2 ok
WARNING row 3 FAILED: no success confirmation (timeout)
INFO row 4 ok
...
INFO done: 23 ok, 2 failed → results.csv

Read the result

The runner reuses one browser and page across rows, which is faster than relaunching, and re-navigates for each row so every submission starts from a freshly loaded form. Cookies and local storage do persist between rows; if your form depends on them, open a new browser context per row instead. Failures get a screenshot in failures/ so a human can see exactly what the form looked like when it broke, which is invaluable for debugging. The results.csv echoes every input row with a status and reason, so you know precisely which 2 of 25 need attention. This is a genuine data-entry automation: 25 forms submitted and audited without a single manual keystroke.
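To pull out just the rows that need attention, the report can be read back with the same csv module. This is a small sketch; failed_rows is a hypothetical helper, and it assumes results.csv has the status and reason columns the runner writes.

```python
import csv

def failed_rows(results_path: str) -> list[tuple[int, str]]:
    """Return (row number, reason) for every row whose status is 'failed'."""
    failures = []
    with open(results_path, newline="", encoding="utf-8") as f:
        for i, r in enumerate(csv.DictReader(f), start=1):
            if r.get("status") == "failed":
                failures.append((i, r.get("reason", "")))
    return failures
```

Calling failed_rows("results.csv") after a run gives row numbers that line up with the failures/row-N.png screenshots.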

Build It Yourself

13 min

Make your own practice form first (a tiny HTML file with name/email/message + a submit that shows "Thank you"), or use httpbin's form at httpbin.org/forms/post. Point the bot at that — never a real third-party form.
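If you would rather generate the practice form than type it, the sketch below writes one to disk. It is only one possible layout: write_practice_form and the file name test_form.html are illustrative, and the labels, button text, and success message are chosen to match the FIELD_MAP, SUBMIT, and SUCCESS_TEXT values used in this lesson.

```python
from pathlib import Path

# Labels and button text match FIELD_MAP, SUBMIT and SUCCESS_TEXT.
FORM_HTML = """<!doctype html>
<html lang="en">
<meta charset="utf-8">
<title>Practice form</title>
<form id="contact">
  <label>Full name <input name="name" required></label><br>
  <label>Email <input name="email" type="email" required></label><br>
  <label>Message <textarea name="message" required></textarea></label><br>
  <button type="submit">Submit</button>
</form>
<p id="done" hidden>Thank you</p>
<script>
  document.getElementById("contact").addEventListener("submit", (event) => {
    event.preventDefault();                          // stay on the page
    document.getElementById("done").hidden = false;  // reveal the success text
  });
</script>
</html>
"""

def write_practice_form(path: str = "test_form.html") -> str:
    """Write the form to disk and return a file:// URL for the bot."""
    target = Path(path).resolve()
    target.write_text(FORM_HTML, encoding="utf-8")
    return target.as_uri()
```

Every field is marked required, so a row with a blank value never reaches the "Thank you" message, which gives you a ready-made failing row for exercise 02.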

01 🟢 One row, by hand

Get the bot submitting a single hard-coded row to your practice form and confirming success. Watch it run with headless=False.

02 🟡 Loop the CSV

Feed it a 5-row CSV and produce results.csv. Deliberately put one bad row (e.g. missing a required field) and confirm it's logged as failed while the others succeed.

03 🔴 Dropdowns & checkboxes

Extend locate/fill_row to handle a <select> dropdown (select_option) and a checkbox (check/uncheck), driven by extra entries in FIELD_MAP.

Hint

# in FIELD_MAP:
#   "plan": {"by": "label", "target": "Plan", "type": "select"},
#   "agree": {"by": "label", "target": "I agree", "type": "checkbox"},
def fill_field(loc, value, spec):
    kind = spec.get("type", "text")
    if kind == "select":
        loc.select_option(value)
    elif kind == "checkbox":
        if value.strip().lower() in ("yes", "true", "1"):
            loc.check()
        else:
            loc.uncheck()
    else:
        loc.fill(value)
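CSV cells for a checkbox tend to arrive in mixed case, padded with spaces, or missing entirely. A small normalizer keeps that logic in one place; is_checked is a hypothetical helper, and the accepted spellings are the same three used in the hint.

```python
TRUTHY = {"yes", "true", "1"}

def is_checked(value) -> bool:
    """Interpret a CSV cell as a checkbox state; blanks and None mean unchecked."""
    return str(value or "").strip().lower() in TRUTHY

for cell in ["Yes", " TRUE ", "0", "", None]:
    print(repr(cell), is_checked(cell))
```

The first two cells come out checked and the remaining three unchecked.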

Stretch · Resume & Retry

8 min

Make the bot resumable: if it's interrupted, a re-run should skip rows already marked ok in results.csv and only attempt the unfinished/failed ones. Add a --retry-failed flag that re-attempts previously failed rows. This is real production hygiene (and a preview of Lesson 45).

Show the key additions

import csv
from pathlib import Path

def load_done(out: str) -> set[int]:
    if not Path(out).exists():
        return set()
    done = set()
    # Assumes results.csv has one line per input row, in input order.
    with open(out, newline="", encoding="utf-8") as f:
        for i, r in enumerate(csv.DictReader(f), start=1):
            if r.get("status") == "ok":
                done.add(i)
    return done

# in the runner:
done = load_done(out)
for i, row in enumerate(rows, start=1):
    if i in done:
        log.info("row %d already done — skipping", i)
        continue
    # …submit as before…

Non-negotiables: skips already-ok rows on re-run, --retry-failed re-attempts failures, results stay consistent.
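Keeping results consistent mostly means that a re-run must not drop the rows it skipped. One way to sketch that is to merge the earlier outcomes with the new ones, keyed by row number. Everything here is an assumption for illustration: merge_results, the "row" column, and the "pending" status are hypothetical and not part of the bot as written.

```python
def merge_results(previous: dict[int, dict], attempted: dict[int, dict],
                  total_rows: int) -> list[dict]:
    """Combine an earlier run with this run, one entry per input row.

    A row attempted in this run replaces its earlier entry; a skipped row
    keeps the earlier entry; a row never attempted is marked 'pending'.
    """
    merged = []
    for i in range(1, total_rows + 1):
        entry = attempted.get(i) or previous.get(i) or {
            "status": "pending", "reason": "not attempted"}
        merged.append({"row": i, **entry})
    return merged

previous = {1: {"status": "ok", "reason": "ok"},
            2: {"status": "failed", "reason": "no success confirmation (timeout)"}}
attempted = {2: {"status": "ok", "reason": "ok"}}   # row 2 retried and passed
for line in merge_results(previous, attempted, total_rows=3):
    print(line)
```

Writing the merged list back out keeps results.csv at exactly one line per input row, which is what the index-based load_done relies on.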

Recap

3 min

You built a real form bot by composing the level's skills: CSV input (15), Playwright filling and submitting with auto-waiting locators (27), and per-row error handling with logging and failure screenshots (13). The architecture matters most — a field-map config keeps it adaptable, and treating each row as an isolated transaction (return success/reason, never crash the batch) makes it robust. Always within ethical bounds: your own forms, sanctioned tools, or sandboxes only. Resume/retry turns it from a script into something production-grade.

Vocabulary Card

field map: Config linking data columns to form-field locators, decoupling data from code.
transaction isolation: Each row succeeds or fails independently without affecting others.
success indicator: The signal (text/URL/element) confirming a submit worked.
resumability: Re-running skips completed work and only attempts what's left.

Homework

4 min

Finish the form bot against a practice form you control. Include: a field-map config, per-row error handling with failure screenshots, a results.csv report, and at least one stretch feature (dropdown/checkbox support or resume/retry). Write two sentences naming the form you targeted and confirming you have the right to automate it.

Sample · what a complete submission shows

Target: a local test_form.html I wrote (name/email/message,
        submit shows "Thank you"). I own it — safe to automate.

Run:    python formbot.py contacts.csv file:///.../test_form.html
        INFO loaded 5 rows
        INFO row 1-4 ok
        WARNING row 5 FAILED: no success confirmation (timeout)
        INFO done: 4 ok, 1 failed → results.csv
Files:  results.csv (status+reason per row),
        failures/row-5.png (screenshot of the broken submit).
Stretch: --retry-failed re-attempts only the failed row.

Non-negotiables: field map, per-row handling + screenshots, results report, a stretch feature, an ethics statement.
