PY-L8-12 · Why Hashing Matters: Password Storage

Learning Goals

3 min

By the end of this lesson you can:

Explain why plain text and plain fast-hashes both fail for passwords.
Describe rainbow-table attacks and how a salt defeats them.
State why a password hash must be slow and tunable.
Verify a password in constant time to avoid timing leaks.

Warm-Up · The Breach That Teaches Everyone

5 min

Databases get stolen. It's not if but when. The question that decides whether it's a disaster or a non-event: what was in the password column?

Stored as...              Attacker who steals the DB gets...
plain text                EVERY password instantly. Catastrophe.
MD5 / SHA-256 (unsalted)  cracks most in minutes via rainbow tables.
SHA-256 + salt            better, but fast hashes still fall to GPUs.
bcrypt/argon2 + salt      cracking is slow & expensive → most survive.

Today's big idea

You never store the password — you store something derived from it that lets you check a login but can't be reversed. Three properties make that derivation safe: it's salted (unique per user, defeats precomputation), slow (so guessing is expensive), and verified in constant time (so the comparison leaks nothing). Miss any one and you're the next headline.

New Concept · Salt, Slow, Constant-Time

14 min

Why plain text is indefensible

If you store password = "hunter2", then anyone who reads the database — a hacker, a rogue employee, a leaked backup — has every user's actual password. And since people reuse passwords, you've also compromised their email, bank, everything. There is never a reason to store a recoverable password.

Why plain hashing isn't enough either

import hashlib
# tempting but WRONG:
stored = hashlib.sha256("hunter2".encode()).hexdigest()
# Problem 1: identical passwords → identical hashes (visible in the DB)
# Problem 2: precomputed "rainbow tables" map common-password hashes → instant crack
# Problem 3: SHA-256 is FAST — a GPU tries billions of guesses/second

Attackers don't reverse the hash; they guess: hash a dictionary of common passwords and compare. Because the hash is deterministic and fast, and everyone's "password123" hashes the same, most fall quickly. Rainbow tables make it instant.

Defence 1: Salt — a unique random value per password

import hashlib, os
# a salt makes each hash unique, even for identical passwords
salt = os.urandom(16)                     # 16 random bytes, unique per user
hashed = hashlib.sha256(salt + "hunter2".encode()).hexdigest()
# store BOTH the salt and the hash. The salt isn't secret — it's anti-precomputation.

Two users with the same password now have different hashes (the salt differs).
Rainbow tables are useless — they'd need a separate table per salt.
The salt is stored alongside the hash; it doesn't need to be secret, just unique and random.

Defence 2: Slow — make each guess expensive

The counter-intuitive rule: password hashing should be SLOW

For files you want hashing fast. For passwords you want it deliberately slow — because the attacker has to run it for every guess. A purpose-built password hash (bcrypt, scrypt, argon2) has a tunable "work factor": set it so one hash takes ~250ms. A legitimate login barely notices; an attacker doing billions of guesses is throttled to a crawl. As hardware improves, you raise the work factor.

Defence 3: Constant-time comparison

import hmac
# WRONG: == returns early on the first differing byte → timing leak
if stored == provided_hash:  ...

# RIGHT: compares in constant time regardless of where they differ
if hmac.compare_digest(stored, provided_hash):  ...

A regular == on bytes can return faster when strings differ early — measurable over many tries, leaking how much of a guess is correct (a timing attack). hmac.compare_digest always takes the same time. (bcrypt's own checkpw already does this for you — next lesson.)

The takeaway hierarchy

plain text          ✗✗✗  never
fast hash, no salt  ✗✗   rainbow-tabled in seconds
fast hash + salt    ✗    GPU-cracked (still too fast)
SLOW hash + salt    ✓    bcrypt / argon2 — the only acceptable answer
+ constant-time check ✓  no timing leaks on verify

Worked Example · Demonstrate Why Salt & Speed Matter

12 min

Goal: a safe, local demonstration (against passwords you invent) of how unsalted fast hashes fall and how salting + slowness defend — making the principles undeniable. No real credentials anywhere.

import hashlib, os, time

# --- demo 1: unsalted fast hashes are crackable by a dictionary ---
def naive_store(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()   # WRONG approach

# an attacker's offline dictionary attack (against MY OWN demo data):
COMMON = ["123456", "password", "qwerty", "hunter2", "letmein"]

def crack(stolen_hash: str) -> str | None:
    for guess in COMMON:                       # in reality, millions of words
        if hashlib.sha256(guess.encode()).hexdigest() == stolen_hash:
            return guess
    return None

stolen = naive_store("hunter2")
print("unsalted SHA-256 cracked →", crack(stolen))   # found instantly

# --- demo 2: salt makes identical passwords look different ---
def salted_store(password: str):
    salt = os.urandom(16)
    h = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt.hex(), h

s1 = salted_store("hunter2")
s2 = salted_store("hunter2")
print("same password, different stored hashes:", s1[1] != s2[1])  # True

# --- demo 3: 'slow' is the real defence (illustrative timing) ---
def slow_hash(password: str, salt: bytes, rounds: int = 200_000) -> str:
    # PBKDF2 = repeat the hash many times to make each guess expensive
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds).hex()

t = time.perf_counter()
slow_hash("hunter2", os.urandom(16))
print(f"one slow hash took {time.perf_counter()-t:.3f}s "
      f"→ a billion guesses now takes a LONG time")

unsalted SHA-256 cracked → hunter2
same password, different stored hashes: True
one slow hash took 0.085s → a billion guesses now takes a LONG time

Read the code

Three demos prove the three defences, all against passwords you invented (never anyone's real ones). Demo 1 cracks an unsalted SHA-256 instantly with a tiny dictionary — that's real attacks at small scale. Demo 2 shows salt making identical passwords store differently, killing rainbow tables. Demo 3 uses PBKDF2 (a slow hash via many rounds) so each guess costs ~85ms instead of nanoseconds — multiply by billions of guesses and cracking becomes infeasible. Don't roll your own for production — use bcrypt/argon2 (next lesson). This is just to make why undeniable.

Try It Yourself

13 min

All exercises use passwords you make up — never real credentials, yours or anyone's.

01 🟢 Salt collision demo

Hash the same made-up password three times with a fresh salt each time, and three times without a salt. Confirm the unsalted hashes are identical and the salted ones all differ.

02 🟡 Tune the work factor

Time PBKDF2 at 10k, 100k, and 500k rounds. Find the rounds that take ~250ms on your machine — that's a reasonable work factor. Explain why you'd increase it over the years.

Hint

import hashlib, os, time
for rounds in (10_000, 100_000, 500_000):
    salt = os.urandom(16); t = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"demo", salt, rounds)
    print(f"{rounds:>7} rounds: {time.perf_counter()-t:.3f}s")
# Raise rounds as CPUs get faster, to keep each guess ~costly.

03 🔴 Constant-time check

Write a verify function using hmac.compare_digest. Explain in comments the timing attack that == would enable and why constant-time comparison closes it.

Hint

import hmac, hashlib, os

def make(pw, rounds=200_000):
    salt = os.urandom(16)
    h = hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, rounds)
    return salt, h

def verify(pw, salt, expected, rounds=200_000):
    h = hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, rounds)
    return hmac.compare_digest(h, expected)   # constant time
# '==' on bytes can return early at the first mismatch; timing the
# response over many tries leaks how many leading bytes were right.

Mini-Challenge · A (Correct) Mini Auth Store

8 min

Build a tiny local user store with register(user, password) and login(user, password) that stores only salt + slow-hash (PBKDF2), never the password, and verifies in constant time. Test that two users with the same password get different stored values and that login succeeds only with the right password. (Next lesson swaps PBKDF2 for bcrypt — the production choice.)

Show a sample solution

import hashlib, os, hmac, json
from pathlib import Path

DB = Path("users.json")
ROUNDS = 200_000

def _load(): return json.loads(DB.read_text()) if DB.exists() else {}
def _save(d): DB.write_text(json.dumps(d))

def register(user: str, password: str) -> None:
    users = _load()
    if user in users:
        raise ValueError("user exists")
    salt = os.urandom(16)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ROUNDS)
    users[user] = {"salt": salt.hex(), "hash": h.hex()}   # NO plaintext
    _save(users)

def login(user: str, password: str) -> bool:
    users = _load()
    rec = users.get(user)
    if not rec:
        # still do work to avoid leaking whether the user exists (timing)
        hashlib.pbkdf2_hmac("sha256", password.encode(), os.urandom(16), ROUNDS)
        return False
    salt = bytes.fromhex(rec["salt"])
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ROUNDS)
    return hmac.compare_digest(h.hex(), rec["hash"])

register("aisha", "demo-pw-1"); register("ben", "demo-pw-1")
print("different stored hashes:", _load()["aisha"] != _load()["ben"])  # True
print("right pw:", login("aisha", "demo-pw-1"))   # True
print("wrong pw:", login("aisha", "nope"))        # False

Non-negotiables: salt + slow hash, no plaintext stored, constant-time verify, identical passwords store differently, and constant work even for unknown users.

Recap

3 min

Never store passwords recoverably. Plain text hands an attacker everything; plain fast hashes fall to rainbow tables and GPUs. The correct recipe has three parts: salt (unique random per password — defeats precomputation and hides identical passwords), slow (a tunable work factor so each guess is expensive — the opposite of file hashing), and constant-time verification (hmac.compare_digest, to avoid timing leaks). We demonstrated each with PBKDF2 against invented passwords — but for production, don't roll your own: use bcrypt or argon2, which is exactly the next lesson.

Vocabulary Card

salt: A unique random value added per password to defeat precomputation.
rainbow table: A precomputed map of hashes → passwords; salting makes it useless.
work factor: A tunable cost making each password hash deliberately slow.
timing attack: Inferring secrets from how long a comparison takes; constant-time defeats it.

Homework

4 min

Build the correct mini auth store and prove all four properties (no plaintext, salted, slow, constant-time). Then write a one-page explainer, for a non-technical manager, of why "we hash passwords" isn't enough and what salting + slow hashing add — using the breach framing ("when the DB is stolen, what does the attacker get?").

Sample · manager explainer

"We hash passwords" — manager explainer

Assume our database WILL be stolen one day. The only question is
what the thief gets from the password column.

- Plain text: every customer's password, instantly. They reuse
  passwords, so we've also handed over their email and bank logins.
- Just hashing (SHA-256): attackers don't reverse it — they guess.
  They hash millions of common passwords and match. Identical
  passwords look identical in our DB. Most crack in minutes.
- + SALT (unique random per user): identical passwords now look
  different, and precomputed "rainbow tables" become useless.
- + SLOW hashing (bcrypt): each guess takes ~0.25s instead of
  nanoseconds, so trying billions becomes years of compute. Real
  logins don't notice; attackers are stopped.

Bottom line: salt + slow hashing turns a catastrophic breach into a
manageable one. It's a few lines of code and an industry standard.

Non-negotiables: working four-property store, and a clear breach-framed explanation a non-technical reader understands.

Stored as... Attacker who steals the DB gets... plain text EVERY password instantly. Catastrophe. MD5 / SHA-256 (unsalted) cracks most in minutes via rainbow tables. SHA-256 + salt better, but fast hashes still fall to GPUs. bcrypt/argon2 + salt cracking is slow & expensive → most survive.

import hashlib # tempting but WRONG: stored = hashlib.sha256("hunter2".encode()).hexdigest() # Problem 1: identical passwords → identical hashes (visible in the DB) # Problem 2: precomputed "rainbow tables" map common-password hashes → instant crack # Problem 3: SHA-256 is FAST — a GPU tries billions of guesses/second

import hashlib, os # a salt makes each hash unique, even for identical passwords salt = os.urandom(16) # 16 random bytes, unique per user hashed = hashlib.sha256(salt + "hunter2".encode()).hexdigest() # store BOTH the salt and the hash. The salt isn't secret — it's anti-precomputation.

import hmac # WRONG: == returns early on the first differing byte → timing leak if stored == provided_hash: ... # RIGHT: compares in constant time regardless of where they differ if hmac.compare_digest(stored, provided_hash): ...

plain text ✗✗✗ never fast hash, no salt ✗✗ rainbow-tabled in seconds fast hash + salt ✗ GPU-cracked (still too fast) SLOW hash + salt ✓ bcrypt / argon2 — the only acceptable answer + constant-time check ✓ no timing leaks on verify

import hashlib, os, time # --- demo 1: unsalted fast hashes are crackable by a dictionary --- def naive_store(password: str) -> str: return hashlib.sha256(password.encode()).hexdigest() # WRONG approach # an attacker's offline dictionary attack (against MY OWN demo data): COMMON = ["123456", "password", "qwerty", "hunter2", "letmein"] def crack(stolen_hash: str) -> str | None: for guess in COMMON: # in reality, millions of words if hashlib.sha256(guess.encode()).hexdigest() == stolen_hash: return guess return None stolen = naive_store("hunter2") print("unsalted SHA-256 cracked →", crack(stolen)) # found instantly # --- demo 2: salt makes identical passwords look different --- def salted_store(password: str): salt = os.urandom(16) h = hashlib.sha256(salt + password.encode()).hexdigest() return salt.hex(), h s1 = salted_store("hunter2") s2 = salted_store("hunter2") print("same password, different stored hashes:", s1[1] != s2[1]) # True # --- demo 3: 'slow' is the real defence (illustrative timing) --- def slow_hash(password: str, salt: bytes, rounds: int = 200_000) -> str: # PBKDF2 = repeat the hash many times to make each guess expensive return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds).hex() t = time.perf_counter() slow_hash("hunter2", os.urandom(16)) print(f"one slow hash took {time.perf_counter()-t:.3f}s " f"→ a billion guesses now takes a LONG time")

import hashlib, os, time for rounds in (10_000, 100_000, 500_000): salt = os.urandom(16); t = time.perf_counter() hashlib.pbkdf2_hmac("sha256", b"demo", salt, rounds) print(f"{rounds:>7} rounds: {time.perf_counter()-t:.3f}s") # Raise rounds as CPUs get faster, to keep each guess ~costly.

import hmac, hashlib, os def make(pw, rounds=200_000): salt = os.urandom(16) h = hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, rounds) return salt, h def verify(pw, salt, expected, rounds=200_000): h = hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, rounds) return hmac.compare_digest(h, expected) # constant time # '==' on bytes can return early at the first mismatch; timing the # response over many tries leaks how many leading bytes were right.

import hashlib, os, hmac, json from pathlib import Path DB = Path("users.json") ROUNDS = 200_000 def _load(): return json.loads(DB.read_text()) if DB.exists() else {} def _save(d): DB.write_text(json.dumps(d)) def register(user: str, password: str) -> None: users = _load() if user in users: raise ValueError("user exists") salt = os.urandom(16) h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ROUNDS) users[user] = {"salt": salt.hex(), "hash": h.hex()} # NO plaintext _save(users) def login(user: str, password: str) -> bool: users = _load() rec = users.get(user) if not rec: # still do work to avoid leaking whether the user exists (timing) hashlib.pbkdf2_hmac("sha256", password.encode(), os.urandom(16), ROUNDS) return False salt = bytes.fromhex(rec["salt"]) h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ROUNDS) return hmac.compare_digest(h.hex(), rec["hash"]) register("aisha", "demo-pw-1"); register("ben", "demo-pw-1") print("different stored hashes:", _load()["aisha"] != _load()["ben"]) # True print("right pw:", login("aisha", "demo-pw-1")) # True print("wrong pw:", login("aisha", "nope")) # False