PY-L8-21 · File Integrity Monitoring with Checksums

Learning Goals

3 min

By the end of this lesson you can:

Build a baseline of file hashes (Lesson 11) for a directory tree.
Detect modified, added, and deleted files against the baseline.
Protect the baseline itself from tampering (sign or store it safely).
Explain where FIM fits in detection and its limits.

Warm-Up · Did Something Change That Shouldn't?

5 min

Many attacks leave a trace in the filesystem: a backdoor added to a web app, a config quietly altered, a system binary swapped for a trojaned one. Most of these files should never change between deployments — so any change is a red flag worth investigating.

Today's big idea

File Integrity Monitoring (FIM) records a trusted baseline (the hash of every file in a watched set), then periodically re-hashes and compares. Because SHA-256 has the avalanche effect (Lesson 11), even a one-byte change produces a totally different hash, so a content change to a watched file cannot go unnoticed at the next check. It's a simple, powerful detective control: it won't prevent a change, but as long as the baseline and the checker themselves can be trusted, it will tell you about it.
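To make the avalanche effect concrete, here is a minimal sketch that hashes two byte strings differing in a single character and counts how many of the 256 digest bits disagree. The two sample strings and the helper name `sha256_bits` are made up for illustration.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as one 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

original = b"allow_admin = false\n"   # made-up config line
tampered = b"allow_admin = False\n"   # a single character changed

# XOR leaves a 1 bit wherever the two digests disagree.
differing_bits = bin(sha256_bits(original) ^ sha256_bits(tampered)).count("1")

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(tampered).hexdigest())
print(f"{differing_bits} of 256 bits differ")
```

For a well-behaved hash you should expect roughly half of the bits to flip, which is why comparing digests is enough to spot even a tiny edit.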

New Concept · Baseline, Compare, Protect

14 min

1. Build a baseline

import hashlib, json
from pathlib import Path
from datetime import datetime

def hash_file(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_baseline(folder: str) -> dict:
    files = {str(p.relative_to(folder)): hash_file(p)
             for p in Path(folder).rglob("*") if p.is_file()}
    return {"folder": folder, "created": datetime.now().isoformat(),
            "files": files}

The baseline is a map of relative-path → SHA-256, captured at a known-good moment (e.g. right after a clean deployment). This is the "truth" everything is compared against.
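As a rough illustration of that map, the sketch below builds a baseline for a throwaway directory, saves it as JSON, and reads it back. The file names, file contents, and the location of `fim_baseline.json` are assumptions chosen for the demo; the hashing helper repeats the lesson's `hash_file` so the block runs on its own.

```python
import hashlib, json, tempfile
from pathlib import Path

def hash_file(path: Path) -> str:
    # Same chunked SHA-256 as in the lesson, so large files never sit fully in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "app"                      # the watched folder (demo only)
    (root / "static").mkdir(parents=True)
    (root / "index.php").write_bytes(b"<?php echo 'hi';\n")
    (root / "static" / "site.css").write_bytes(b"body { margin: 0 }\n")

    # relative path -> SHA-256; as_posix() keeps "/" separators on every OS
    files = {p.relative_to(root).as_posix(): hash_file(p)
             for p in root.rglob("*") if p.is_file()}

    # Assumption: the baseline is written OUTSIDE the watched folder.
    db = Path(tmp) / "fim_baseline.json"
    db.write_text(json.dumps({"folder": str(root), "files": files},
                             indent=2, sort_keys=True))
    reloaded = json.loads(db.read_text())

print(sorted(reloaded["files"]))   # → ['index.php', 'static/site.css']
```

Keeping the keys relative to the watched folder means the same baseline still works if the app is checked from a different mount point.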

2. Compare against the baseline

def check(folder: str, baseline: dict) -> dict:
    old = baseline["files"]
    current = {str(p.relative_to(folder)): hash_file(p)
               for p in Path(folder).rglob("*") if p.is_file()}

    modified = [f for f in old if f in current and old[f] != current[f]]
    added    = [f for f in current if f not in old]
    deleted  = [f for f in old if f not in current]
    return {"modified": modified, "added": added, "deleted": deleted,
            "clean": not (modified or added or deleted)}

Three categories of change, each a different signal: modified (existing file's content changed — a backdoor inserted?), added (a new file appeared — a dropped webshell?), deleted (a file vanished — logs wiped?). All three matter to an investigator.
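The three categories can be exercised end to end on a scratch directory. This is only a sketch: `snapshot` and `diff` are illustrative helper names (they mirror the comparison in `check` above), and the files are invented for the demo.

```python
import hashlib, tempfile
from pathlib import Path

def snapshot(root: Path) -> dict:
    """Map relative path -> SHA-256 (files here are tiny, so read_bytes is fine)."""
    return {p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in root.rglob("*") if p.is_file()}

def diff(old: dict, current: dict) -> dict:
    """Sort every path into modified / added / deleted, as check() does."""
    return {
        "modified": sorted(f for f in old if f in current and old[f] != current[f]),
        "added":    sorted(f for f in current if f not in old),
        "deleted":  sorted(f for f in old if f not in current),
    }

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.php").write_text("<?php echo 'home';\n")
    (root / "config.ini").write_text("debug = off\n")
    (root / "access.txt").write_text("visits\n")
    baseline = snapshot(root)                       # known-good moment

    (root / "index.php").write_text("<?php echo 'home'; // tampered\n")  # modify
    (root / "uploads").mkdir()
    (root / "uploads" / "shell.php").write_text("<?php // dropped file\n")  # add
    (root / "access.txt").unlink()                  # delete
    report = diff(baseline, snapshot(root))

print(report)
# → {'modified': ['index.php'], 'added': ['uploads/shell.php'], 'deleted': ['access.txt']}
```

Note that config.ini appears nowhere in the report: an untouched file hashes to the same value and stays silent.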

3. Protect the baseline itself

⚠️ A baseline an attacker can edit is worthless

If the baseline file sits next to the monitored files, an attacker who tampers with a file will simply update the baseline too — and your FIM reports "all clean." Defences: store the baseline off the host (a separate secure server), make it append-only/read-only, and sign it (Lesson 17) so any change to the baseline itself is detectable. The integrity of your integrity-check is paramount.

# sign the baseline so tampering with IT is caught (Lesson 17):
import json
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives import hashes

def sign_baseline(baseline: dict, private_key) -> bytes:
    data = json.dumps(baseline, sort_keys=True).encode()
    return private_key.sign(data, padding.PSS(
        mgf=padding.MGF1(hashes.SHA256()),
        salt_length=padding.PSS.MAX_LENGTH), hashes.SHA256())
# keep the private key OFF the monitored host; verify before trusting the baseline.
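Signing only helps if the checker verifies before it trusts. Below is a minimal sketch of the verification side, assuming the same RSA-PSS parameters as `sign_baseline` and the `cryptography` package from Lesson 17. The helper names `canonical`, `_pss`, and `verify_baseline` are illustrative, and the throwaway key is generated in-process purely for the demo; in practice the private key would never be on the monitored host.

```python
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

def canonical(baseline: dict) -> bytes:
    # Sign and verify the SAME byte string: sorted keys give a stable serialisation.
    return json.dumps(baseline, sort_keys=True).encode()

def _pss():
    # Same padding settings as sign_baseline in the lesson.
    return padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                       salt_length=padding.PSS.MAX_LENGTH)

def verify_baseline(baseline: dict, signature: bytes, public_key) -> bool:
    """Return True only if the signature matches this exact baseline."""
    try:
        public_key.verify(signature, canonical(baseline), _pss(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False

# Demo only: a throwaway key pair and a made-up baseline.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
baseline = {"folder": "/var/www/app", "files": {"index.php": "ab" * 32}}
signature = private_key.sign(canonical(baseline), _pss(), hashes.SHA256())

ok_before = verify_baseline(baseline, signature, private_key.public_key())
baseline["files"]["index.php"] = "cd" * 32     # attacker rewrites the recorded hash
ok_after = verify_baseline(baseline, signature, private_key.public_key())
print(ok_before, ok_after)   # → True False
```

Only the public key needs to be available to the checker, which is what makes this safer than a shared secret stored next to the monitored files.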

Where FIM fits — and its limits

Strength: detects any content change reliably; great for files that shouldn't change (binaries, config, deployed web apps).
Limit: it's a detective control, not preventive — it tells you after the change. Pair it with prevention.
Limit: noisy on files that legitimately change (logs, caches) — exclude those or you'll drown in false positives.
Run it on a schedule (Level 7) and alert on changes (Lesson 34 / 45).

Worked Example · A FIM CLI

12 min

Goal: a fim.py with baseline and check commands, exclusions for files that legitimately change, and a clear change report — the kind of tool you'd schedule against a deployed app.

import argparse, hashlib, json, logging, fnmatch
from pathlib import Path
from datetime import datetime

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("fim")

EXCLUDE = ["*.log", "*.tmp", "__pycache__/*", "*.pyc", ".git/*"]  # legit churn

def _included(rel: str) -> bool:
    return not any(fnmatch.fnmatch(rel, pat) for pat in EXCLUDE)

def _hash(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def _snapshot(folder: str) -> dict:
    out = {}
    for p in Path(folder).rglob("*"):
        if p.is_file():
            rel = str(p.relative_to(folder)).replace("\\", "/")
            if _included(rel):
                out[rel] = _hash(p)
    return out

def cmd_baseline(a):
    data = {"folder": a.folder, "created": datetime.now().isoformat(),
            "files": _snapshot(a.folder)}
    Path(a.db).write_text(json.dumps(data, indent=2))
    log.info("baselined %d files → %s", len(data["files"]), a.db)
    log.info("⚠️ store %s somewhere the monitored host can't modify", a.db)

def cmd_check(a):
    base = json.loads(Path(a.db).read_text())
    old, current = base["files"], _snapshot(a.folder)
    modified = sorted(f for f in old if f in current and old[f] != current[f])
    added    = sorted(f for f in current if f not in old)
    deleted  = sorted(f for f in old if f not in current)

    if not (modified or added or deleted):
        log.info("✓ integrity OK — no changes since %s", base["created"])
        return
    for f in modified: log.warning("MODIFIED  %s", f)
    for f in added:    log.warning("ADDED     %s", f)
    for f in deleted:  log.warning("DELETED   %s", f)
    log.warning("%d change(s) — investigate.", len(modified)+len(added)+len(deleted))

if __name__ == "__main__":
    p = argparse.ArgumentParser(description="File Integrity Monitor.")
    sub = p.add_subparsers(dest="cmd", required=True)
    for name, fn in [("baseline", cmd_baseline), ("check", cmd_check)]:
        sp = sub.add_parser(name)
        sp.add_argument("folder")
        sp.add_argument("--db", default="fim_baseline.json")
        sp.set_defaults(func=fn)
    args = p.parse_args()
    args.func(args)

$ python fim.py baseline /var/www/app
INFO baselined 214 files → fim_baseline.json
INFO ⚠️ store fim_baseline.json somewhere the monitored host can't modify

# ...an attacker drops a webshell and edits index.php...
$ python fim.py check /var/www/app
WARNING MODIFIED  index.php
WARNING ADDED     uploads/shell.php
WARNING 2 change(s) — investigate.

Read the result

The monitor caught both a modified file (the altered index.php) and an added one (the dropped shell.php), exactly the footprints of a web compromise. SHA-256's avalanche effect means even a one-character backdoor edit changes the hash, so no content change to a watched file goes unreported. The EXCLUDE patterns keep legitimately-changing files (logs, caches) from flooding the report with noise, but every excluded pattern is also a blind spot, so keep the list short. The honest caveat is in the baseline message: store the baseline where the monitored host can't reach it, or an attacker rewrites your source of truth. Schedule check (Level 7) + alert (Lesson 34) and you have continuous tamper detection.
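It is worth checking what the EXCLUDE patterns actually match, because `fnmatch` treats `*` as "any characters, including `/`" and matches the whole relative path. The sketch below reuses the worked example's patterns against a few made-up paths; the sample paths are assumptions for illustration.

```python
import fnmatch

EXCLUDE = ["*.log", "*.tmp", "__pycache__/*", "*.pyc", ".git/*"]  # as in fim.py

def included(rel: str) -> bool:
    """True if the relative path is still watched (matches no EXCLUDE pattern)."""
    return not any(fnmatch.fnmatch(rel, pat) for pat in EXCLUDE)

samples = [
    "index.php",                 # watched
    "logs/2024/app.log",         # excluded: "*.log" also matches nested paths
    ".git/objects/ab/cdef",      # excluded: ".git/*" covers everything under top-level .git
    "vendor/lib/.git/config",    # watched: ".git/*" only matches at the start of the path
    "uploads/shell.php.log",     # excluded: anything ending in .log is invisible to FIM
]
for rel in samples:
    print(f"{rel:28} {'watched' if included(rel) else 'EXCLUDED'}")
```

The last sample is the trade-off in miniature: a broad pattern silences noise, but it also hides any file an attacker chooses to name that way, so narrower patterns (or a specific log directory) are usually the safer choice.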

Try It Yourself

13 min

01 🟢 Baseline & detect

Baseline a test folder, then modify one file, add one, and delete one. Run check and confirm all three changes are reported in the right categories.

02 🟡 Tame the noise

Add a log file to your watched folder and confirm it would normally flag every run. Add it to EXCLUDE and confirm the noise disappears while real changes still show. Explain the false-positive trade-off.

03 🔴 Sign the baseline

Sign the baseline file with an RSA key (Lesson 17) and verify the signature before trusting it in check. Demonstrate that editing the baseline (as an attacker would) makes verification fail, so the FIM refuses a tampered baseline.

Hint

# in cmd_check: load baseline + its .sig, verify with the public key
# BEFORE comparing. If verification fails:
#   log.error("baseline signature invalid — it may have been tampered!")
#   return
# This protects the integrity of the integrity-checker itself.

Mini-Challenge · Continuous FIM with Alerts

8 min

Combine with Level 7: run check on a schedule (or via watchdog, Lesson 37) and, when changes are detected, fire an alert (Lesson 34's notifier) with the list of changed files and a severity (added/modified in a sensitive path = critical). This is a real intrusion-detection capability.

Show the integration sketch

import schedule, time, json
from pathlib import Path
from fim import _snapshot      # the worked example above, saved as fim.py
# alert(msg, level) is the Lesson 34 notifier; import it from wherever you saved it.

SENSITIVE = ("index.php", ".env", "wp-config.php", "/bin/")

def monitored_check(folder, db):
    base = json.loads(Path(db).read_text())
    old, cur = base["files"], _snapshot(folder)
    changes = ([("MODIFIED", f) for f in old if f in cur and old[f] != cur[f]]
               + [("ADDED", f) for f in cur if f not in old]
               + [("DELETED", f) for f in old if f not in cur])
    if changes:
        # prefix "/" so a top-level "bin/tool" still matches the "/bin/" marker
        critical = any(any(s in "/" + f for s in SENSITIVE) for _, f in changes)
        level = "critical" if critical else "warning"
        # only the first 10 paths go in the message; the total is always stated
        msg = (f"FIM: {len(changes)} change(s): "
               + ", ".join(f"{k} {f}" for k, f in changes[:10]))
        alert(msg, level)          # Lesson 34 notifier

schedule.every(10).minutes.do(monitored_check, "/var/www/app", "fim_baseline.json")
while True:
    schedule.run_pending()
    time.sleep(1)

Non-negotiables: scheduled/triggered checks, an alert on change, and a severity bump for sensitive paths.
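The severity rule can be tried in isolation before wiring up a real notifier. The sketch below is one possible reading of "added/modified in a sensitive path = critical"; `severity` is an illustrative helper, and `alert` here is a hypothetical stand-in that only logs, not the Lesson 34 notifier itself.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("fim.alert")

SENSITIVE = ("index.php", ".env", "wp-config.php", "/bin/")

def severity(changes) -> str:
    """'critical' if an ADDED or MODIFIED path contains a sensitive marker."""
    for kind, rel in changes:
        # Prefix "/" so a top-level "bin/tool" still matches the "/bin/" marker.
        if kind in ("ADDED", "MODIFIED") and any(s in "/" + rel for s in SENSITIVE):
            return "critical"
    return "warning"

def alert(msg: str, level: str) -> None:
    """Hypothetical stand-in for the real notifier: writes to the log only."""
    log.log(logging.CRITICAL if level == "critical" else logging.WARNING, msg)

changes = [("MODIFIED", "index.php"), ("ADDED", "uploads/shell.php")]
alert("FIM changes: " + ", ".join(f"{k} {f}" for k, f in changes), severity(changes))
```

Whether a deleted sensitive file should also count as critical is a policy choice; this sketch leaves it at warning, and you may well decide otherwise.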

Recap

3 min

File Integrity Monitoring records a trusted baseline of SHA-256 hashes, then re-hashes and compares to detect modified, added, and deleted files, catching tampering down to a single byte (avalanche effect). Exclude files that legitimately change to avoid false-positive noise. Crucially, protect the baseline: store it off-host and/or sign it (Lesson 17), or an attacker just rewrites your source of truth. FIM is a detective control (it tells you after a change), so schedule it (Level 7) and alert on findings (Lesson 34). The same baseline-and-compare idea underlies tools such as Tripwire and AIDE, and it is a core part of host-based intrusion detection.

Vocabulary Card

file integrity monitoring: Detecting unauthorized file changes by comparing hashes to a baseline.
baseline: The trusted, known-good snapshot of file hashes.
detective control: A measure that detects (rather than prevents) a problem.
false positive: A flagged "change" that's actually legitimate (e.g. a log file) — exclude it.

Homework

4 min

Build the full FIM CLI with exclusions and a signed baseline. Run it against a copy of a small project: baseline it, simulate an attack (modify + add + delete files), and produce the change report. Wire it to a scheduled check with an alert on change. Write a note: how you protect the baseline, and FIM's one key limitation with how you compensate.

Sample · FIM notes

Protecting the baseline: I sign fim_baseline.json with an RSA key
whose PRIVATE key lives off the monitored host. 'check' verifies the
signature first; if the baseline was edited, verification fails and
the tool refuses to run (so an attacker can't silence it by rewriting
the baseline). I also store a copy on a separate, read-only server.

Key limitation: FIM is DETECTIVE — it tells me a file changed AFTER
it happened, not before. So I compensate with: (1) prevention
elsewhere (least privilege, read-only deploys), and (2) fast
detection — scheduled checks every 10 min + a critical alert when a
sensitive path (index.php, .env) changes, so the window between
compromise and response is small.

Demo: modified index.php, added uploads/shell.php, deleted a config
→ all three flagged in the right categories; alert fired 'critical'.

Non-negotiables: working baseline/check with exclusions + signed baseline, a simulated-attack report, scheduled+alerting integration, and the protect-the-baseline + detective-limit notes.
