PY-L8-33 · A07 — XSS: Defence (Sanitisation & CSP)

Learning Goals

3 min

By the end of this lesson you can:

Apply context-aware output escaping, the primary XSS defence.
Sanitise (allow-list) HTML when you must render rich user content.
Set a Content Security Policy that blocks inline/untrusted scripts.
Add HttpOnly/Secure cookies so even a successful XSS can't read the session cookie.

Warm-Up · Escape, Don't Blacklist

5 min

Attacker input:  <script>alert(1)</script>
ESCAPED output:  &lt;script&gt;alert(1)&lt;/script&gt;
The browser now DISPLAYS the text "<script>alert(1)</script>" — and
runs nothing. The < and > are shown as characters, not parsed as tags.

Today's big idea

The fix mirrors SQLi: stop input from crossing into the code context. For HTML that means escaping the special characters (< > & " ') so they render as text, not markup. In the HTML body and in quoted attribute values this neutralises script tags, event handlers, and any other injected tag, because none of it is ever parsed as markup. Other contexts (JavaScript, URLs) need their own escaping, covered under Layer 1. Modern frameworks auto-escape by default; XSS appears mainly when you opt out. Layer a CSP and HttpOnly cookies on top and you have defence in depth.

New Concept · The Defence Layers

14 min

Layer 1 (primary): context-aware output escaping

import html

# escape on OUTPUT, for the HTML context:
safe = html.escape("<script>alert(1)</script>")
print(safe)        # &lt;script&gt;alert(1)&lt;/script&gt;  → renders as text

# In Jinja/Flask, auto-escaping does this for you:
#   {{ comment }}        ← auto-escaped (SAFE)
#   {{ comment | safe }} ← escaping DISABLED (DANGEROUS — avoid)

"Context-aware" matters: HTML body, HTML attribute, JavaScript, and URL contexts need different escaping. Inserting data into a JS string or a URL needs JS/URL escaping, not just HTML escaping. The safest rule: don't put user data into script/style/URL contexts at all; if you must, use a library that escapes per context.

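The per-context difference can be sketched with the standard library alone. The helper names below (for_html, for_url_component, for_js_string) are illustrative assumptions, not part of any framework; in a real Flask app the template engine's auto-escaping and its tojson filter would normally do this work for you.

```python
import html
import json
from urllib.parse import quote

def for_html(value: str) -> str:
    """HTML body or quoted attribute value: escape < > & " '."""
    return html.escape(value, quote=True)

def for_url_component(value: str) -> str:
    """A single URL path or query component: percent-encode reserved characters."""
    return quote(value, safe="")

def for_js_string(value: str) -> str:
    """A JS string literal placed inside an inline <script> block.

    json.dumps adds the surrounding quotes and escapes quotes, backslashes
    and newlines; <, > and & are then hex-escaped so that "</script>"
    inside the data cannot end the script block.
    """
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

payload = '"></script><script>alert(1)</script>'
print(for_html(payload))            # entities, safe in body/attribute
print(for_url_component(payload))   # percent-encoded, safe as one URL component
print(for_js_string(payload))       # quoted JS literal with no raw < or >
```

The same payload needs three different encodings; using the HTML one inside a script block or a URL would leave it exploitable.
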
Layer 2: sanitise HTML when you must allow it

# If users need RICH text (bold, links), you can't escape everything.
# Instead, SANITISE: parse the HTML and keep only an ALLOW-LIST of safe tags.
import bleach    # pip install bleach (allow-list HTML sanitiser)

dirty = '<b>hi</b><script>alert(1)</script><a href="javascript:evil()">x</a>'
clean = bleach.clean(dirty,
                     tags=["b", "i", "a", "p"],          # only these tags
                     attributes={"a": ["href"]},          # only href on <a>
                     protocols=["http", "https"],         # no javascript: URLs
                     strip=True)                          # remove disallowed tags (default: escape them)
print(clean)   # <b>hi</b>alert(1)<a>x</a>  → script tag removed, bad href stripped

Escaping turns everything into text, which is wrong if you genuinely need formatting. Sanitisation keeps a strict allow-list of safe tags/attributes and removes or escapes the rest. Always allow-list (never blacklist), and use a maintained library: hand-rolled HTML filtering is notoriously bypassable. The bleach project has announced its deprecation, so for new code prefer a maintained alternative such as nh3.
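To see why hand-rolled filtering fails, here is a minimal sketch of a blacklist filter and two inputs that get past it. The function naive_blacklist is a deliberately weak, hypothetical example written for this illustration; it is not taken from any library.

```python
import html
import re

def naive_blacklist(s: str) -> str:
    """Hand-rolled filter (do NOT use): delete <script>...</script> blocks in one pass."""
    return re.sub(r"<script.*?>.*?</script>", "", s, flags=re.I | re.S)

# Bypass 1: no <script> tag at all, the code lives in an event handler.
event_handler = "<img src=x onerror=alert(1)>"

# Bypass 2: removing the inner <script></script> pairs reassembles a real tag.
nested = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>"

print(naive_blacklist(event_handler))  # unchanged, still executable
print(naive_blacklist(nested))         # <script>alert(1)</script>
print(html.escape(nested))             # every < and > becomes an entity
```

Escaping (or a real allow-list sanitiser) has no such gap, because it never tries to recognise "bad" patterns in the first place.
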

Layer 3: Content Security Policy (CSP)

# A CSP header tells the browser what it's allowed to run/load.
# This one blocks inline scripts and only allows scripts from your own origin:
@app.after_request
def set_csp(resp):
    resp.headers["Content-Security-Policy"] = (
        "default-src 'self'; "
        "script-src 'self'; "          # NO inline <script>, no eval
        "object-src 'none'; "
        "base-uri 'self'")
    return resp

A CSP is a powerful second line: even if a payload slips into the page, a strict policy stops the browser from executing inline scripts or loading code from attacker domains. It doesn't replace escaping (it's a safety net), and it requires moving inline scripts to files — but it dramatically reduces XSS impact.
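As policies grow, building the header from a structure is less error-prone than concatenating strings by hand. The sketch below assumes a hypothetical helper build_csp and a POLICY dict (both invented for this example); the result is the same policy as the header shown above.

```python
def build_csp(directives: dict) -> str:
    """Join {directive: [sources]} into a Content-Security-Policy header value."""
    parts = []
    for name, sources in directives.items():
        parts.append(name + " " + " ".join(sources))
    return "; ".join(parts)

POLICY = {
    "default-src": ["'self'"],
    "script-src": ["'self'"],   # no 'unsafe-inline', so inline <script> is refused
    "object-src": ["'none'"],
    "base-uri": ["'self'"],
}

header_value = build_csp(POLICY)
print(header_value)
# default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'self'
```

A new policy can be sent under the Content-Security-Policy-Report-Only header first, so the browser reports violations without blocking anything while you move inline scripts into files.
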

Layer 4: HttpOnly + Secure cookies

# Mark session cookies HttpOnly so JavaScript CAN'T read them.
# (token is the session identifier issued by your login logic, not shown.)
@app.route("/login", methods=["POST"])
def login():
    resp = make_response("ok")
    resp.set_cookie("session", token,
                    httponly=True,   # JS can't read it → XSS can't steal it
                    secure=True,     # only sent over HTTPS
                    samesite="Lax")  # CSRF mitigation too
    return resp

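HttpOnly means document.cookie can't see the session cookie, so even a successful XSS can't exfiltrate it (the #1 XSS payload, Lesson 32). It does not stop an injected script from acting as the user while the page is open, which is why escaping stays the primary fix. Secure keeps the cookie off plaintext HTTP; SameSite helps against CSRF. Defence in depth: assume one layer might fail.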
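What the browser actually receives is a Set-Cookie header carrying these flags. A minimal sketch using the standard library's http.cookies shows that header without needing a running Flask app; the cookie value is a placeholder.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "opaque-session-id"   # placeholder value for illustration
cookie["session"]["httponly"] = True      # hidden from document.cookie
cookie["session"]["secure"] = True        # only sent over HTTPS
cookie["session"]["samesite"] = "Lax"     # not sent on most cross-site requests
cookie["session"]["path"] = "/"

header_value = cookie["session"].OutputString()
print("Set-Cookie:", header_value)
```

In the browser's developer tools the cookie appears under the site's storage with the HttpOnly flag set, yet is missing from document.cookie in the console.
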

The priority order

1. ESCAPE OUTPUT      (primary — use the framework's auto-escaping)
2. SANITISE rich HTML  (only when you must allow formatting; allow-list)
3. CSP                 (safety net if a payload slips through)
4. HttpOnly cookies    (so XSS can't steal sessions even if it runs)
+ validate input at the boundary (defence in depth, not a substitute)
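
The last line of that list can be sketched as a small boundary check. The function name validate_comment and the 500-character limit are assumptions chosen for illustration; adjust them to your own rules.

```python
import unicodedata

MAX_COMMENT_LENGTH = 500  # assumed limit for illustration

def validate_comment(text: str) -> str:
    """Reject input that is empty, too long, or contains control characters."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("comment is empty")
    if len(cleaned) > MAX_COMMENT_LENGTH:
        raise ValueError("comment is too long")
    for ch in cleaned:
        # "Cc" is the Unicode category for control characters
        if unicodedata.category(ch) == "Cc" and ch not in "\n\t":
            raise ValueError("comment contains control characters")
    return cleaned

# Validation accepts this payload unchanged: it is a well-formed comment.
print(validate_comment("<script>alert(1)</script>"))
```

The payload passes validation untouched, which is exactly why validation is an extra layer and output escaping remains the fix.
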

Worked Example · Fix the Vulnerable Guestbook

12 min

Goal: take Lesson 32's XSS-vulnerable guestbook and apply all four layers — then re-post the payload and watch it render as harmless text.

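from flask import Flask, request
import html
# import bleach   # only needed for the rich-text variant shown below

app = Flask(__name__)
# Layer 4: Flask's session cookie is HttpOnly + Secure + SameSite
app.config.update(SESSION_COOKIE_HTTPONLY=True,
                  SESSION_COOKIE_SECURE=True,
                  SESSION_COOKIE_SAMESITE="Lax")
comments = []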

# Layer 3: a strict CSP on every response
@app.after_request
def csp(resp):
    resp.headers["Content-Security-Policy"] = "default-src 'self'; script-src 'self'"
    return resp

@app.post("/comment")
def add():
    raw = request.form["text"]
    # Layer 2: if allowing rich text, sanitise to an allow-list;
    # if plain text only, escape entirely (Layer 1). Here: plain text.
    comments.append(raw)        # store raw; escape on OUTPUT (best practice)
    return index()

@app.get("/")
def index():
    # Layer 1: ESCAPE on output — the primary fix
    rendered = "".join(f"<div>{html.escape(c)}</div>" for c in comments)
    return f"<!doctype html><h1>Guestbook</h1>{rendered}"

# (If rich text were needed:)
#   safe = bleach.clean(raw, tags=["b","i","a"], attributes={"a":["href"]})
#   ...and render the 'safe' result without re-escaping it.

# Re-post the Lesson 32 payload:
<script>alert('XSS')</script>

# Now the page SOURCE contains the escaped text (html.escape also escapes quotes):
<div>&lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;</div>
# → the browser displays "<script>alert('XSS')</script>" and runs NOTHING.
# Even if escaping were missed, the CSP (script-src 'self') blocks the
# inline script, and HttpOnly cookies mean it couldn't steal the session.

Read the code

The decisive change is html.escape(c) on output: the attacker's <script> becomes &lt;script&gt;, which the browser renders as visible text, not a tag, so the attack from Lesson 32 is dead. The other layers are insurance: the CSP would block the inline script even if you forgot to escape one field, and HttpOnly cookies mean a slipped-through payload still can't read the session cookie. Note the best-practice choice to store raw and escape on output: the stored data stays unmodified, and you escape correctly for each context where it is used. For rich text, swap escaping for bleach sanitisation.
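
The decisive change can also be checked without a browser. This is a minimal sketch: render_comment mirrors the rendering rule in index(), while the payload list and the helper contains_raw_markup are assumptions made for this illustration.

```python
import html

PAYLOADS = [
    "<script>alert('XSS')</script>",
    '<img src=x onerror="alert(1)">',
    "<svg onload=alert(1)>",
]

def render_comment(comment: str) -> str:
    """Same rule as index(): escape on output, then wrap in a <div>."""
    return f"<div>{html.escape(comment)}</div>"

def contains_raw_markup(rendered: str) -> bool:
    """True if anything inside the wrapper <div> could still be parsed as a tag."""
    inner = rendered[len("<div>"):-len("</div>")]
    return "<" in inner or ">" in inner

for payload in PAYLOADS:
    rendered = render_comment(payload)
    status = "UNSAFE" if contains_raw_markup(rendered) else "text only"
    print(status, rendered)
```

Every payload should report "text only"; a check like this fits naturally into a unit test so the escaping cannot be removed unnoticed.
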

Try It Yourself

13 min

01 🟢 Escape and verify

Add html.escape to your Lesson 32 demo's output and re-run every payload (script tag, onerror, etc.). Confirm they all now render as text. View source to see the escaped entities.

02 🟡 Sanitise rich text

Allow users to post some HTML (bold, links) but use bleach.clean with a strict allow-list. Confirm <b> survives but <script> and javascript: hrefs are stripped.

Hint

import bleach
clean = bleach.clean(user_html, tags=["b","i","em","strong","a","p","br"],
                     attributes={"a": ["href","title"]},
                     protocols=["http","https","mailto"])

03 🔴 Add CSP + HttpOnly and prove the layers

Add a strict CSP and HttpOnly session cookies. Then deliberately re-introduce an unescaped field and confirm the CSP still blocks the inline script (check the browser console for the CSP violation), and that document.cookie can't see the session. This proves defence in depth.

Hint

# With script-src 'self', an injected <script>alert(1)</script> is
# REFUSED by the browser with a console error such as (Chrome's wording):
#   "Refused to execute inline script because it violates the following
#    Content Security Policy directive: "script-src 'self'"..."
# And document.cookie won't include the HttpOnly session cookie.
# → two independent layers each stop the attack.

Mini-Challenge · A Reusable Secure-Render Helper

8 min

Build a small module that does the right thing by default: a render_text(s) that escapes, a render_rich(s) that sanitises with an allow-list, and a Flask after_request that sets a strict CSP + secure-cookie defaults. The goal: make the safe path the easy path, so developers don't hand-roll dangerous output.

Show a sample solution

import html, bleach
from markupsafe import Markup

RICH_TAGS = ["b", "i", "em", "strong", "a", "p", "br", "ul", "li", "code"]
RICH_ATTRS = {"a": ["href", "title"]}

def render_text(s: str) -> str:
    """Plain text → fully escaped (no HTML allowed)."""
    return html.escape(s)

def render_rich(s: str):
    """Rich text → sanitised to a safe allow-list."""
    return Markup(bleach.clean(s, tags=RICH_TAGS, attributes=RICH_ATTRS,
                               protocols=["http", "https", "mailto"]))

def install_security(app):
    @app.after_request
    def harden(resp):
        resp.headers.setdefault("Content-Security-Policy",
                                "default-src 'self'; script-src 'self'")
        resp.headers.setdefault("X-Content-Type-Options", "nosniff")
        return resp
    # also: configure app so session cookies are HttpOnly + Secure + SameSite
    app.config.update(SESSION_COOKIE_HTTPONLY=True,
                      SESSION_COOKIE_SECURE=True,
                      SESSION_COOKIE_SAMESITE="Lax")
    return app

Non-negotiables: escape-by-default text render, allow-list rich render, CSP + HttpOnly/Secure/SameSite cookie defaults — the safe path made easy.

Recap

3 min

Beat XSS with layers, in order: (1) escape output per context (the primary fix — let the framework auto-escape; avoid | safe/Markup/innerHTML); (2) sanitise with an allow-list library (bleach/nh3) only when you must allow rich HTML; (3) a strict CSP as a safety net that blocks inline/untrusted scripts; and (4) HttpOnly + Secure + SameSite cookies so a slipped-through payload still can't steal the session. Never blacklist (vectors are endless) — escaping removes the code/data confusion entirely, just like parameterisation did for SQLi. Make the safe path the default path.

Vocabulary Card

output escaping: Converting HTML-special characters to entities so input renders as text.
sanitisation: Allow-listing safe HTML tags/attributes and stripping the rest.
Content Security Policy: A header restricting what scripts/resources the browser may run/load.
HttpOnly cookie: A cookie JavaScript can't read — so XSS can't steal the session.

Homework

4 min

Fully remediate your Lesson 32 XSS demo with all four layers and prove every prior payload now fails (rendered as text; CSP blocks; cookie unreadable). Build the reusable secure-render helper. Write a before/after note: the primary fix in one line, why escaping beats blacklisting, and what each extra layer buys you if the primary fix is ever missed.

Sample · XSS defence note

Primary fix (one line): escape user data on output —
  f"<div>{html.escape(c)}</div>"  (or just let Jinja auto-escape {{ c }}).
Now <script> becomes &lt;script&gt; → shown as text, never executed.

Why escaping beats blacklisting: XSS has dozens of vectors
(<script>, onerror=, <svg onload>, javascript: URLs, ...). A blacklist
can't catch them all and breaks legit input. Escaping removes the
code/data confusion entirely — there's nothing to "filter," the data
just can't be parsed as markup.

What each extra layer buys (if escaping is ever missed on one field):
- bleach sanitisation: safe rich text without opening an XSS hole.
- CSP (script-src 'self'): browser refuses the injected inline script.
- HttpOnly cookies: even a running payload can't read/steal the session.
Defence in depth: I assume one layer will fail someday.

Non-negotiables: all payloads neutralised after the four layers, the reusable helper, and a clear primary-fix + escaping-vs-blacklist + layer-value explanation.
