Learning Goals
3 min
By the end of this lesson you can:
- Build a baseline of file hashes (Lesson 11) for a directory tree.
- Detect modified, added, and deleted files against the baseline.
- Protect the baseline itself from tampering (sign or store it safely).
- Explain where FIM fits in detection and its limits.
Warm-Up · Did Something Change That Shouldn't?
5 min
Many attacks leave a trace in the filesystem: a backdoor added to a web app, a config quietly altered, a system binary swapped for a trojaned one. Most of these files should never change between deployments, so any change is a red flag worth investigating.
File Integrity Monitoring (FIM) records a trusted baseline (the hash of every file in a watched set), then periodically re-hashes and compares. Because SHA-256 has the avalanche effect (Lesson 11), even a one-byte change produces a totally different hash, so a content change to a watched file cannot hide from the next check. It's a simple, powerful detective control: it won't prevent a change, but as long as the baseline is trustworthy and the file is in the watched set, the next check will report it.
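To make the avalanche effect concrete, here is a minimal sketch that hashes two inputs differing in exactly one byte and counts how many hex digits of the digest differ. The sample strings and the helper name `sha256_hex` are illustrative choices, not part of the lesson's tool.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


before = sha256_hex(b"debug=0\n")
after = sha256_hex(b"debug=1\n")  # exactly one byte differs from the line above

# Compare the two digests position by position.
differing = sum(a != b for a, b in zip(before, after))

print(before)
print(after)
print(f"{differing} of {len(before)} hex digits differ")
```

Running it shows two digests with no visible relationship, which is why comparing hashes is enough to notice a content change of any size.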
New Concept · Baseline, Compare, Protect
14 min
1. Build a baseline
```python
import hashlib, json
from pathlib import Path
from datetime import datetime


def hash_file(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_baseline(folder: str) -> dict:
    files = {str(p.relative_to(folder)): hash_file(p)
             for p in Path(folder).rglob("*") if p.is_file()}
    return {"folder": folder,
            "created": datetime.now().isoformat(),
            "files": files}
```
The baseline is a map of relative-path → SHA-256, captured at a known-good moment (e.g. right after a clean deployment). This is the "truth" everything is compared against.
2. Compare against the baseline
```python
def check(folder: str, baseline: dict) -> dict:
    old = baseline["files"]
    current = {str(p.relative_to(folder)): hash_file(p)
               for p in Path(folder).rglob("*") if p.is_file()}
    modified = [f for f in old if f in current and old[f] != current[f]]
    added = [f for f in current if f not in old]
    deleted = [f for f in old if f not in current]
    return {"modified": modified, "added": added, "deleted": deleted,
            "clean": not (modified or added or deleted)}
```
Three categories of change, each a different signal: modified (existing file's content changed — a backdoor inserted?), added (a new file appeared — a dropped webshell?), deleted (a file vanished — logs wiped?). All three matter to an investigator.
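The three categories can be exercised end to end in a throwaway directory. The sketch below is self-contained and uses hypothetical helper names (`snapshot`, `diff`) and made-up file contents; it reads each file whole, which is fine for a small demo but not a replacement for chunked hashing on large files.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(folder: Path) -> dict:
    """Map each file's relative POSIX path to the SHA-256 of its bytes."""
    return {
        p.relative_to(folder).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in folder.rglob("*") if p.is_file()
    }


def diff(old: dict, new: dict) -> dict:
    """Classify changes using set operations on the path keys."""
    return {
        "modified": sorted(f for f in old.keys() & new.keys() if old[f] != new[f]),
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.php").write_text("<?php echo 'hi';")
    (root / "config.ini").write_text("debug=0")
    (root / "old.txt").write_text("remove me")
    baseline = snapshot(root)  # the known-good state

    (root / "index.php").write_text("<?php echo 'hi'; /* changed */")  # modify
    (root / "shell.php").write_text("new file")                         # add
    (root / "old.txt").unlink()                                         # delete

    report = diff(baseline, snapshot(root))

print(report)
# {'modified': ['index.php'], 'added': ['shell.php'], 'deleted': ['old.txt']}
```

The untouched `config.ini` appears in no category, which is the behaviour you want from a quiet, clean file.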
3. Protect the baseline itself
If the baseline file sits next to the monitored files, an attacker who tampers with a file will simply update the baseline too — and your FIM reports "all clean." Defences: store the baseline off the host (a separate secure server), make it append-only/read-only, and sign it (Lesson 17) so any change to the baseline itself is detectable. The integrity of your integrity-check is paramount.
```python
# sign the baseline so tampering with IT is caught (Lesson 17):
import json
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives import hashes


def sign_baseline(baseline: dict, private_key) -> bytes:
    data = json.dumps(baseline, sort_keys=True).encode()
    return private_key.sign(
        data,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256())

# keep the private key OFF the monitored host; verify before trusting the baseline.
```
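Signing only helps if the checker verifies the signature before it trusts the baseline. The following is a minimal sketch of that round trip, assuming the third-party `cryptography` package and the same RSA-PSS parameters used for signing. The names `canonical` and `verify_baseline`, the throwaway key generated in the demo, and the placeholder hash strings are all illustrative; in practice the private key would live off the monitored host.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Same PSS parameters for signing and verifying.
PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)


def canonical(baseline: dict) -> bytes:
    """Serialize deterministically so signer and verifier see identical bytes."""
    return json.dumps(baseline, sort_keys=True).encode()


def verify_baseline(baseline: dict, signature: bytes, public_key) -> bool:
    """Return True only if the signature matches the baseline's current content."""
    try:
        public_key.verify(signature, canonical(baseline), PSS, hashes.SHA256())
        return True
    except InvalidSignature:
        return False


# Demo with a throwaway key pair and placeholder (not real) hash values.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()
baseline = {"folder": "/var/www/app", "created": "demo",
            "files": {"index.php": "ab" * 32}}
signature = private_key.sign(canonical(baseline), PSS, hashes.SHA256())

ok_before = verify_baseline(baseline, signature, public_key)
baseline["files"]["index.php"] = "cd" * 32  # attacker rewrites the recorded hash
ok_after = verify_baseline(baseline, signature, public_key)

print(ok_before, ok_after)  # True False
```

Only the public key needs to be present where the check runs, so an attacker on the monitored host can read it but cannot produce a new valid signature for an edited baseline.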
Where FIM fits — and its limits
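- Strength: reliably detects any content change to a watched file; great for files that shouldn't change (binaries, config, deployed web apps). A hash-only comparison does not see metadata changes such as permissions or ownership.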
- Limit: it's a detective control, not preventive — it tells you after the change. Pair it with prevention.
- Limit: noisy on files that legitimately change (logs, caches) — exclude those or you'll drown in false positives.
- Run it on a schedule (Level 7) and alert on changes (Lesson 34 / 45).
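Exclusion patterns are where the false-positive trade-off becomes concrete, so it is worth checking what a pattern really matches. This sketch uses the standard library's `fnmatch`, whose `*` also matches `/`; the pattern lists, the sample paths, and the helper `is_excluded` are illustrative assumptions.

```python
import fnmatch

EXCLUDE = ["*.log", "__pycache__/*"]


def is_excluded(rel_path: str, patterns=EXCLUDE) -> bool:
    """True if the relative path matches any exclusion pattern."""
    return any(fnmatch.fnmatch(rel_path, pat) for pat in patterns)


samples = [
    "app.log",                # excluded by *.log
    "logs/2024/app.log",      # also excluded: * matches across "/"
    "__pycache__/a.pyc",      # excluded by __pycache__/*
    "pkg/__pycache__/a.pyc",  # NOT excluded: the pattern is anchored at the start
    "index.php",              # monitored
]
results = {s: is_excluded(s) for s in samples}
for path, excluded in results.items():
    print(f"{'skip ' if excluded else 'watch'} {path}")

# Adding a second pattern covers nested cache directories as well.
nested_ok = is_excluded("pkg/__pycache__/a.pyc", EXCLUDE + ["*/__pycache__/*"])
```

Every pattern is also a blind spot: a file an attacker names `uploads/notes.log` would be skipped by `*.log`, so keep the list as narrow as the noise allows.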
Worked Example · A FIM CLI
12 min
Goal: a fim.py with baseline and check commands, exclusions for files that legitimately change, and a clear change report: the kind of tool you'd schedule against a deployed app.
```python
import argparse, hashlib, json, logging, fnmatch
from pathlib import Path
from datetime import datetime

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("fim")

EXCLUDE = ["*.log", "*.tmp", "__pycache__/*", "*.pyc", ".git/*"]  # legit churn


def _included(rel: str) -> bool:
    return not any(fnmatch.fnmatch(rel, pat) for pat in EXCLUDE)


def _hash(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def _snapshot(folder: str) -> dict:
    out = {}
    for p in Path(folder).rglob("*"):
        if p.is_file():
            rel = str(p.relative_to(folder)).replace("\\", "/")
            if _included(rel):
                out[rel] = _hash(p)
    return out


def cmd_baseline(a):
    data = {"folder": a.folder,
            "created": datetime.now().isoformat(),
            "files": _snapshot(a.folder)}
    Path(a.db).write_text(json.dumps(data, indent=2))
    log.info("baselined %d files → %s", len(data["files"]), a.db)
    log.info("⚠️ store %s somewhere the monitored host can't modify", a.db)


def cmd_check(a):
    base = json.loads(Path(a.db).read_text())
    old, current = base["files"], _snapshot(a.folder)
    modified = sorted(f for f in old if f in current and old[f] != current[f])
    added = sorted(f for f in current if f not in old)
    deleted = sorted(f for f in old if f not in current)
    if not (modified or added or deleted):
        log.info("✓ integrity OK — no changes since %s", base["created"])
        return
    for f in modified:
        log.warning("MODIFIED %s", f)
    for f in added:
        log.warning("ADDED %s", f)
    for f in deleted:
        log.warning("DELETED %s", f)
    log.warning("%d change(s) — investigate.",
                len(modified) + len(added) + len(deleted))


if __name__ == "__main__":
    p = argparse.ArgumentParser(description="File Integrity Monitor.")
    sub = p.add_subparsers(dest="cmd", required=True)
    for name, fn in [("baseline", cmd_baseline), ("check", cmd_check)]:
        sp = sub.add_parser(name)
        sp.add_argument("folder")
        sp.add_argument("--db", default="fim_baseline.json")
        sp.set_defaults(func=fn)
    args = p.parse_args()
    args.func(args)
```
```
$ python fim.py baseline /var/www/app
INFO baselined 214 files → fim_baseline.json
INFO ⚠️ store fim_baseline.json somewhere the monitored host can't modify

# ...an attacker drops a webshell and edits index.php...

$ python fim.py check /var/www/app
WARNING MODIFIED index.php
WARNING ADDED uploads/shell.php
WARNING 2 change(s) — investigate.
```
Read the result
The monitor caught both a modified file (the altered index.php) and an added one (the dropped shell.php), exactly the footprints of a web compromise. SHA-256's avalanche effect means even a one-character backdoor edit changes the hash, so no content change to a monitored file slips through. The EXCLUDE patterns keep legitimately-changing files (logs, caches) from flooding the report with noise, at the cost of a blind spot: anything matching a pattern is never checked. The honest caveat is in the baseline message: store the baseline where the monitored host can't reach it, or an attacker rewrites your source of truth. Schedule check (Level 7) + alert (Lesson 34) and you have continuous tamper detection.
Try It Yourself
13 min
Baseline a test folder, then modify one file, add one, and delete one. Run check and confirm all three changes are reported in the right categories.
Add a log file to your watched folder and confirm it would normally flag every run. Add it to EXCLUDE and confirm the noise disappears while real changes still show. Explain the false-positive trade-off.
Sign the baseline file with an RSA key (Lesson 17) and verify the signature before trusting it in check. Demonstrate that editing the baseline (as an attacker would) makes verification fail, so the FIM refuses a tampered baseline.
Hint
```python
# in cmd_check: load baseline + its .sig, verify with the public key
# BEFORE comparing. If verification fails:
#     log.error("baseline signature invalid — it may have been tampered!")
#     return
# This protects the integrity of the integrity-checker itself.
```
Mini-Challenge · Continuous FIM with Alerts
8 min
Combine with Level 7: run check on a schedule (or via watchdog, Lesson 37) and, when changes are detected, fire an alert (Lesson 34's notifier) with the list of changed files and a severity (added/modified in a sensitive path = critical). This is a real intrusion-detection capability.
Show the integration sketch
```python
import json
import time
from pathlib import Path

import schedule  # third-party package: pip install schedule

from fim import _snapshot  # the worked example above, saved as fim.py
# alert(msg, level) is the Lesson 34 notifier; import it from wherever you saved it.

SENSITIVE = ("index.php", ".env", "wp-config.php", "/bin/")


def monitored_check(folder, db):
    base = json.loads(Path(db).read_text())
    old, cur = base["files"], _snapshot(folder)
    changes = ([("MODIFIED", f) for f in old if f in cur and old[f] != cur[f]]
               + [("ADDED", f) for f in cur if f not in old]
               + [("DELETED", f) for f in old if f not in cur])
    if changes:
        # Substring match: any change whose path contains a sensitive name is critical.
        critical = any(any(s in f for s in SENSITIVE) for _, f in changes)
        level = "critical" if critical else "warning"
        # Only the first 10 changes go into the message to keep alerts short.
        msg = "FIM changes: " + ", ".join(f"{k} {f}" for k, f in changes[:10])
        alert(msg, level)  # Lesson 34 notifier


schedule.every(10).minutes.do(monitored_check, "/var/www/app", "fim_baseline.json")
while True:
    schedule.run_pending()
    time.sleep(1)
```
Non-negotiables: scheduled/triggered checks, an alert on change, and a severity bump for sensitive paths.
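The severity bump can also be written as a small, testable function. The sketch below is one possible variant, not the lesson's required design: it matches on path components instead of substrings, because baseline paths are relative (a top-level `bin/ls` contains no `/bin/`) and a substring test would also flag names like `index.php.bak`. The sets `SENSITIVE_NAMES` and `SENSITIVE_DIRS` and the function `severity` are hypothetical.

```python
from pathlib import PurePosixPath

SENSITIVE_NAMES = {"index.php", ".env", "wp-config.php"}  # exact file names
SENSITIVE_DIRS = {"bin"}                                   # any directory component


def severity(rel_path: str) -> str:
    """Return 'critical' for a change in a sensitive location, else 'warning'."""
    path = PurePosixPath(rel_path)
    in_sensitive_dir = bool(SENSITIVE_DIRS & set(path.parts[:-1]))
    if path.name in SENSITIVE_NAMES or in_sensitive_dir:
        return "critical"
    return "warning"


for p in ["index.php", "bin/ls", "usr/bin/ls", "uploads/shell.php", "old/index.php.bak"]:
    print(f"{severity(p):8} {p}")
```

Whether an unexpected file in `uploads/` should be a warning or critical is a policy choice; the point is that the rule lives in one function you can unit-test before wiring it to the alert.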
Recap
3 min
File Integrity Monitoring records a trusted baseline of SHA-256 hashes, then re-hashes and compares to detect modified, added, and deleted files, catching tampering down to a single byte (avalanche effect). Exclude files that legitimately change to avoid false-positive noise. Crucially, protect the baseline: store it off-host and/or sign it (Lesson 17), or an attacker just rewrites your source of truth. FIM is a detective control (it tells you after a change), so schedule it (Level 7) and alert on findings (Lesson 34). The same baseline-and-compare idea underlies tools such as Tripwire and AIDE, and it is a core part of intrusion detection.
Vocabulary Card
- file integrity monitoring
- Detecting unauthorized file changes by comparing hashes to a baseline.
- baseline
- The trusted, known-good snapshot of file hashes.
- detective control
- A measure that detects (rather than prevents) a problem.
- false positive
- A flagged "change" that's actually legitimate (e.g. a log file) — exclude it.
Homework
4 min
Build the full FIM CLI with exclusions and a signed baseline. Run it against a copy of a small project: baseline it, simulate an attack (modify + add + delete files), and produce the change report. Wire it to a scheduled check with an alert on change. Write a note: how you protect the baseline, and FIM's one key limitation with how you compensate.
Sample · FIM notes
Protecting the baseline: I sign fim_baseline.json with an RSA key whose PRIVATE key lives off the monitored host. 'check' verifies the signature first; if the baseline was edited, verification fails and the tool refuses to run (so an attacker can't silence it by rewriting the baseline). I also store a copy on a separate, read-only server.

Key limitation: FIM is DETECTIVE: it tells me a file changed AFTER it happened, not before. So I compensate with: (1) prevention elsewhere (least privilege, read-only deploys), and (2) fast detection: scheduled checks every 10 min + a critical alert when a sensitive path (index.php, .env) changes, so the window between compromise and response is small.

Demo: modified index.php, added uploads/shell.php, deleted a config → all three flagged in the right categories; alert fired 'critical'.
Non-negotiables: working baseline/check with exclusions + signed baseline, a simulated-attack report, scheduled+alerting integration, and the protect-the-baseline + detective-limit notes.