Learning Goals
3 min
By the end of this lesson you can:
- Parse log lines into structured records (regex + your Level 7 skills).
- Detect brute-force attempts (many failures from one source).
- Spot anomalies: odd hours, new IPs, impossible travel, rare events.
- Correlate events across time/sources to reconstruct an attack.
Warm-Up · The Attack Is in the Logs
5 min
After almost every breach, investigators find that the attack was already in the logs, but nobody was looking. A brute-force login shows up as dozens of failures from one IP; a compromise shows up as a successful login from a never-seen location at 3 a.m. The evidence exists; the skill is surfacing it.
Log analysis turns raw text into security signal in three moves: parse lines into structured fields, detect known-bad patterns (brute force, error spikes), and correlate events to tell a story (failures → success → data access). Your Level 7 skills (regex, Counter, datetime, file reading) are exactly the tools, now pointed at finding attackers.
New Concept · Parse, Detect, Correlate
14 min
1. Parse log lines into records
import re
from datetime import datetime

# example auth log line:
# 2026-05-28 03:14:07 sshd: Failed password for admin from 203.0.113.9 port 52344
LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"sshd: (?P<result>Failed|Accepted) password for (?P<user>\S+) "
    r"from (?P<ip>\d+\.\d+\.\d+\.\d+)")

def parse(line: str) -> dict | None:
    m = LINE.search(line)
    if not m:
        return None
    return {"ts": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
            "result": m["result"], "user": m["user"], "ip": m["ip"]}
A regex extracts the fields that matter — timestamp, outcome, user, source IP — turning a wall of text into queryable records (the JSON/CSV mindset from Level 7, applied to logs).
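To see what "queryable records" means in practice, here is a minimal sketch that runs the same pattern over three sample lines. The second and third lines are made up for illustration; lines that don't match come back as None and are dropped.

```python
import re
from datetime import datetime

# Same pattern as in the lesson: timestamp, outcome, user, source IP.
LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"sshd: (?P<result>Failed|Accepted) password for (?P<user>\S+) "
    r"from (?P<ip>\d+\.\d+\.\d+\.\d+)")

def parse(line):
    """Return a record dict, or None when the line is not an sshd password event."""
    m = LINE.search(line)
    if not m:
        return None
    return {"ts": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
            "result": m["result"], "user": m["user"], "ip": m["ip"]}

# Illustrative sample lines (the last two are invented for this sketch).
SAMPLE = [
    "2026-05-28 03:14:07 sshd: Failed password for admin from 203.0.113.9 port 52344",
    "2026-05-28 09:02:11 sshd: Accepted password for alice from 198.51.100.7 port 40022",
    "2026-05-28 09:05:00 cron: job started",  # not an auth event, so it is skipped
]

records = [r for r in map(parse, SAMPLE) if r is not None]
for r in records:
    print(r["ts"].isoformat(), r["result"], r["user"], r["ip"])
```

One limitation worth knowing: OpenSSH writes "Failed password for invalid user NAME from ..." when the account doesn't exist, and this pattern skips those lines. Extending the regex to cover them is a reasonable next step.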
2. Detect: brute-force (the classic)
from collections import defaultdict

def find_bruteforce(records, threshold=5, window_seconds=60):
    """Flag IPs with many failed logins in a short window."""
    fails = defaultdict(list)
    for r in records:
        if r["result"] == "Failed":
            fails[r["ip"]].append(r["ts"])

    flagged = {}
    for ip, times in fails.items():
        times.sort()
        # sliding window: any 'threshold' fails within 'window_seconds'?
        for i in range(len(times) - threshold + 1):
            if (times[i + threshold - 1] - times[i]).total_seconds() <= window_seconds:
                flagged[ip] = len(times)
                break
    return flagged
Brute force = many failures from one source fast. A sliding window over sorted timestamps catches "5 failures in 60 seconds" — the signature of an automated password-guessing attack. (This is also what tools like fail2ban detect to auto-ban IPs.)
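When choosing a threshold, it can help to measure the densest burst each source actually produced. The sketch below is one way to do that with a two-pointer sliding window; the helper name max_fails_in_window and the sample timestamps are assumptions made for illustration, not part of the lesson's tool.

```python
from datetime import datetime, timedelta

def max_fails_in_window(times, window_seconds=60):
    """Largest number of timestamps that fit inside any window of the given length."""
    times = sorted(times)
    best = 0
    left = 0
    for right in range(len(times)):
        # Advance the left edge until the window spans at most window_seconds.
        while (times[right] - times[left]).total_seconds() > window_seconds:
            left += 1
        best = max(best, right - left + 1)
    return best

start = datetime(2026, 5, 28, 3, 9, 0)
# A user who mistyped twice, five minutes apart.
typo_user = [start, start + timedelta(minutes=5)]
# An automated burst: 50 failures, one every 0.6 seconds (about 30 seconds in total).
burst = [start + timedelta(seconds=0.6 * i) for i in range(50)]

print(max_fails_in_window(typo_user))  # 1
print(max_fails_in_window(burst))      # 50
```

A threshold anywhere between those two peaks separates the cases; where exactly to put it depends on how noisy your legitimate users are.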
3. Detect: anomalies
- odd hours: a successful login at 3 a.m. for a 9-to-5 user
- new source: first-ever login from a brand-new IP/country
- rare event: something that almost never happens (a privilege change)
- impossible travel: login from KL, then London 10 minutes later
- volume spike: 100x the normal error rate in a minute
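Impossible travel is the one item in this list that needs arithmetic, so here is a rough sketch. It assumes each login record already carries lat and lon fields from some IP geolocation lookup (not shown, and hypothetical here), and it treats anything faster than about 900 km/h as impossible; both the field names and the speed limit are illustrative choices.

```python
import math
from datetime import datetime

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points on Earth."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(prev, curr, max_kmh=900.0):
    """True if getting from prev to curr would need more than max_kmh."""
    hours = (curr["ts"] - prev["ts"]).total_seconds() / 3600
    km = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    if hours <= 0:
        return km > 0  # same instant, different place
    return km / hours > max_kmh

# Approximate city coordinates, used only for this example.
kl = {"ts": datetime(2026, 5, 28, 3, 0), "lat": 3.139, "lon": 101.687}
london = {"ts": datetime(2026, 5, 28, 3, 10), "lat": 51.507, "lon": -0.128}

print(round(haversine_km(kl["lat"], kl["lon"], london["lat"], london["lon"])), "km")
print(impossible_travel(kl, london))
```

IP geolocation is imprecise and VPNs move users around instantly, so a hit here is a weak signal to combine with others rather than proof on its own.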
Anomalies need a sense of normal (a baseline, like Lesson 23). "Successful login from a new IP at an unusual hour" combines several weak signals into a strong one.
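One simple way to combine weak signals is to add up points per signal and alert above a cutoff. The sketch below assumes you have already built two baselines per user (known source IPs and usual login hours); the function name score_login, the weights, and the sample data are all illustrative.

```python
from datetime import datetime

def score_login(record, known_ips, usual_hours):
    """Score one successful login against per-user baselines. Weights are illustrative."""
    score, reasons = 0, []
    if record["ip"] not in known_ips.get(record["user"], set()):
        score += 2
        reasons.append("new source IP")
    if record["ts"].hour not in usual_hours.get(record["user"], set()):
        score += 1
        reasons.append("unusual hour")
    return score, reasons

# Hypothetical baselines learned from earlier logs.
known_ips = {"admin": {"198.51.100.7"}}
usual_hours = {"admin": set(range(9, 18))}  # 09:00 to 17:59

normal = {"user": "admin", "ip": "198.51.100.7", "ts": datetime(2026, 5, 28, 10, 30)}
suspect = {"user": "admin", "ip": "203.0.113.9", "ts": datetime(2026, 5, 28, 3, 14, 7)}

for rec in (normal, suspect):
    score, reasons = score_login(rec, known_ips, usual_hours)
    print(rec["ip"], score, reasons)
```

Either signal alone is common enough to ignore (people travel, people work late); both together on the same login is worth a look.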
4. Correlate: tell the story
def correlate_compromise(records, ip):
    """Did failures from an IP turn into a SUCCESS? That's a likely breach."""
    events = sorted([r for r in records if r["ip"] == ip], key=lambda r: r["ts"])
    fails_before = 0
    for e in events:
        if e["result"] == "Failed":
            fails_before += 1
        elif e["result"] == "Accepted" and fails_before:
            # first success that comes AFTER at least one failure
            return (f"⚠️ {ip}: {fails_before} failures THEN a successful login as "
                    f"'{e['user']}' at {e['ts']}: likely cracked.")
    return None
The most damning pattern: many failures followed by a success from the same IP — the brute force worked. Correlation across events (and across log sources — auth + web + firewall) reconstructs the attack timeline, the heart of incident response.
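Order is what makes this pattern meaningful: a success followed by a typo is ordinary, while failures followed by a success is not. Here is a small sketch that applies the idea to every IP at once and returns structured findings; the function name failures_then_success and the synthetic records are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def failures_then_success(records):
    """Map each IP to (failures_before, success_record) when a success follows failures."""
    by_ip = defaultdict(list)
    for r in records:
        by_ip[r["ip"]].append(r)
    findings = {}
    for ip, events in by_ip.items():
        events.sort(key=lambda e: e["ts"])
        fails = 0
        for e in events:
            if e["result"] == "Failed":
                fails += 1
            elif e["result"] == "Accepted" and fails:
                findings[ip] = (fails, e)  # first success preceded by failures
                break
    return findings

t0 = datetime(2026, 5, 28, 3, 9, 0)
# Attacker: six failures one second apart, then a success five minutes in.
records = [{"ts": t0 + timedelta(seconds=i), "result": "Failed",
            "user": "admin", "ip": "203.0.113.9"} for i in range(6)]
records.append({"ts": t0 + timedelta(minutes=5), "result": "Accepted",
                "user": "admin", "ip": "203.0.113.9"})
# Legitimate user: success first, one typo an hour later. Not flagged.
records.append({"ts": t0, "result": "Accepted", "user": "alice", "ip": "198.51.100.7"})
records.append({"ts": t0 + timedelta(hours=1), "result": "Failed",
                "user": "alice", "ip": "198.51.100.7"})

findings = failures_then_success(records)
for ip, (n, hit) in findings.items():
    print(f"{ip}: {n} failures, then success as {hit['user']} at {hit['ts']:%H:%M:%S}")
```

Only 203.0.113.9 is reported, with six failures before the 03:14:00 success. A single typo followed by a correct password would also be reported, so in practice you would pair this with a minimum failure count.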
Worked Example · A Log Triage Tool
12 min
Goal: read an auth log, surface brute-force IPs, off-hours successful logins, and likely-cracked accounts, turning thousands of lines into a short, actionable report.
import logging
import re
from collections import defaultdict
from datetime import datetime
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("logtriage")

LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) sshd: "
    r"(?P<result>Failed|Accepted) password for (?P<user>\S+) "
    r"from (?P<ip>\d+\.\d+\.\d+\.\d+)")

def parse_log(path: str) -> list[dict]:
    records = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        m = LINE.search(line)
        if m:
            records.append({"ts": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
                            "result": m["result"], "user": m["user"], "ip": m["ip"]})
    return records

def triage(path: str) -> None:
    records = parse_log(path)
    log.info("parsed %d auth events\n", len(records))

    # 1. brute-force: >=5 failures in 60s from one IP
    fails = defaultdict(list)
    for r in records:
        if r["result"] == "Failed":
            fails[r["ip"]].append(r["ts"])
    bruteforce = []
    for ip, times in fails.items():
        times.sort()
        for i in range(len(times) - 4):
            if (times[i + 4] - times[i]).total_seconds() <= 60:
                bruteforce.append((ip, len(times)))
                break
    for ip, n in sorted(bruteforce, key=lambda x: -x[1]):
        log.info("🔴 BRUTE FORCE  %s - %d failed logins", ip, n)

    # 2. off-hours successful logins (00:00-05:00)
    for r in records:
        if r["result"] == "Accepted" and r["ts"].hour < 5:
            log.info("🟡 OFF-HOURS  %s as '%s' at %s",
                     r["ip"], r["user"], r["ts"].strftime("%H:%M"))

    # 3. correlation: failures THEN success from same IP = likely cracked
    by_ip = defaultdict(list)
    for r in records:
        by_ip[r["ip"]].append(r)
    for ip, events in by_ip.items():
        events.sort(key=lambda e: e["ts"])
        seen_fail = False
        for e in events:
            if e["result"] == "Failed":
                seen_fail = True
            elif seen_fail:  # first success that follows a failure
                log.info("🔴 LIKELY CRACKED  %s → success as '%s'", ip, e["user"])
                break

triage("auth.log")

parsed 10342 auth events

🔴 BRUTE FORCE  203.0.113.9 - 1184 failed logins
🟡 OFF-HOURS  203.0.113.9 as 'admin' at 03:14
🔴 LIKELY CRACKED  203.0.113.9 → success as 'admin'
Read the code
Ten thousand lines collapse to three findings that tell a complete story: one IP made over a thousand failed attempts (brute force), then logged in successfully as admin (likely cracked), at 3 a.m. (off-hours) — a textbook compromise, surfaced automatically. Each detection reuses Level 7 fundamentals (regex parse, defaultdict grouping, datetime comparisons) pointed at security. This is exactly what a SIEM does at scale, and it's the engine behind the Intrusion Detective challenge next lesson.
Try It Yourself
13 min
Make a small synthetic auth.log (or use your own system's, which you're authorised to read) with a mix of normal logins and an obvious brute-force burst.
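If you'd rather generate the file than type it, the sketch below is one possible generator. Everything in it is invented for practice: the user names, the IPs (documentation ranges), the 30% typo rate, and the shape of the attack burst. It emits lines in the same format the lesson's regex expects.

```python
import random
from datetime import datetime, timedelta

def make_auth_log(seed=0):
    """Return synthetic auth-log lines: normal logins plus one brute-force burst."""
    rng = random.Random(seed)  # seeded so the file is reproducible
    day = datetime(2026, 5, 28)
    events = []
    # Normal activity: daytime logins, sometimes with one typo 20 seconds earlier.
    for user, ip in [("alice", "198.51.100.7"), ("bob", "198.51.100.8")]:
        for _ in range(5):
            ts = day + timedelta(hours=rng.randint(9, 16), minutes=rng.randint(0, 59))
            if rng.random() < 0.3:
                events.append((ts - timedelta(seconds=20), "Failed", user, ip))
            events.append((ts, "Accepted", user, ip))
    # Attack: 50 failures in about 30 seconds, then one success five minutes in.
    start = day + timedelta(hours=3, minutes=9)
    for i in range(50):
        events.append((start + timedelta(seconds=0.6 * i), "Failed", "admin", "203.0.113.9"))
    events.append((start + timedelta(minutes=5), "Accepted", "admin", "203.0.113.9"))
    events.sort()
    return [f"{ts:%Y-%m-%d %H:%M:%S} sshd: {result} password for {user} "
            f"from {ip} port {rng.randint(1024, 65535)}"
            for ts, result, user, ip in events]

lines = make_auth_log()
print(len(lines), "lines, first:", lines[0])
# To save it:
# from pathlib import Path
# Path("auth.log").write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Because the attack is planted by you, you know the right answer in advance, which makes it easy to check whether your detector finds it.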
Parse the log and report: total events, failures vs. successes, and the top 5 source IPs by event count. Use Counter.
Implement the sliding-window brute-force detector and tune the threshold/window. Confirm it flags the burst but not a user who simply mistyped twice.
Hint
# normal: 2 failures over 5 minutes → NOT flagged
# attack: 50 failures in 30 seconds → flagged
# tune (threshold, window) so legit fat-finger users aren't caught.
From the log, learn each user's usual login hours, then flag a successful login well outside that range as anomalous. This is baseline-driven detection (Lesson 23's idea applied to logs).
Hint
from collections import defaultdict

hours = defaultdict(set)
for r in records:
    if r["result"] == "Accepted":
        hours[r["user"]].add(r["ts"].hour)
# then: flag a success whose hour isn't in that user's usual set
Mini-Challenge · Multi-Source Correlation
8 min
Real attacks span multiple logs. Given an auth log and a web-access log, correlate by IP and time to reconstruct a kill chain: a brute-force in auth, then suspicious requests (e.g. to /admin or shell.php) in the web log from the same IP shortly after. Output an ordered timeline.
Sample solution
def build_timeline(auth_records, web_records, ip):
    events = []
    for r in auth_records:
        if r["ip"] == ip:
            events.append((r["ts"], "AUTH", f"{r['result']} login as {r['user']}"))
    for r in web_records:
        if r["ip"] == ip:
            events.append((r["ts"], "WEB", f"{r['method']} {r['path']} → {r['status']}"))
    events.sort()
    print(f"Attack timeline for {ip}:")
    for ts, source, detail in events:
        print(f"  {ts:%H:%M:%S} [{source}] {detail}")

# e.g.: many AUTH Failed → AUTH Accepted → WEB GET /admin →
#       WEB POST /upload shell.php = full intrusion story
Non-negotiables: correlate two log sources by IP, build a time-ordered timeline, and show the failures→success→malicious-request chain.
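The timeline needs web records with ts, ip, method, path, and status fields, which means parsing the web log too. Here is a hedged sketch that assumes the access log is in Common Log Format; your server's format may differ, and the sample lines are invented. It also assumes both logs are written in the same timezone, so it drops the offset to keep the timestamps comparable with the auth records.

```python
import re
from datetime import datetime

# Assumed format (Common Log Format), e.g.:
# 203.0.113.9 - - [28/May/2026:03:16:02 +0000] "GET /admin HTTP/1.1" 200 512
WEB_LINE = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) \S+ \S+ '
    r'\[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3})')

def parse_web(line):
    """Return a web record dict, or None when the line doesn't match."""
    m = WEB_LINE.search(line)
    if not m:
        return None
    # %b matches English month abbreviations under the default locale.
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {"ts": ts.replace(tzinfo=None),  # assumption: same timezone as the auth log
            "ip": m["ip"], "method": m["method"],
            "path": m["path"], "status": int(m["status"])}

SAMPLE = [
    '203.0.113.9 - - [28/May/2026:03:16:02 +0000] "GET /admin HTTP/1.1" 200 512',
    '203.0.113.9 - - [28/May/2026:03:18:40 +0000] "POST /uploads/shell.php HTTP/1.1" 200 64',
]
web_records = [r for r in map(parse_web, SAMPLE) if r is not None]
for r in web_records:
    print(r["ts"], r["method"], r["path"], r["status"])
```

If the two logs really are in different timezones, convert both to UTC before sorting instead of dropping the offset.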
Recap
3 min
Security log analysis is parse → detect → correlate. Parse lines into structured records with regex (your Level 7 toolkit). Detect known-bad patterns such as brute force (many failures from one source in a window, the fail2ban signature), and anomalies against a baseline (off-hours, new IPs, impossible travel, rare events). Correlate events across time and sources to reconstruct the attack story (failures → success → malicious requests). The attack is almost always already in the logs; the value is surfacing it. This is what a SIEM automates, and it is the foundation of the next lesson's challenge.
Vocabulary Card
- log analysis: turning raw log text into security signal (parse, detect, correlate).
- brute force: many rapid login attempts from one source, detected by windowed failure counts.
- correlation: linking events across time/sources to reconstruct an attack.
- SIEM: Security Information & Event Management, tooling that does this at scale.
Homework
4 min
Build a log-triage tool that parses an auth log and reports brute-force IPs, off-hours successes, and likely-cracked accounts. Generate (or obtain, with authorization) a log containing an embedded attack and confirm your tool finds it. Bonus: correlate with a second log source for a timeline. Write a short incident summary: what happened, when, from where, and which account was compromised.
Sample · incident summary from logs
INCIDENT SUMMARY (from auth.log + access.log)
What: SSH brute-force leading to account compromise and webshell.
When: 2026-05-28, 03:09–03:21 (off-hours).
Where: source 203.0.113.9 (single IP).
Timeline (correlated):
  03:09 1,184 Failed SSH logins for 'admin' from 203.0.113.9
  03:14 Accepted password for 'admin' ← brute force succeeded
  03:16 [web] GET /admin 200 (from same IP)
  03:18 [web] POST /uploads/shell.php 200 ← webshell dropped
Compromised account: admin.
Recommended actions: disable 'admin' password auth (keys+MFA), ban the IP, remove shell.php, rotate creds, and review what the session touched.
(Ties to FIM L8-21 which would also flag shell.php.)
Non-negotiables: working triage detecting an embedded attack, and an incident summary with what/when/where/which-account, ideally multi-source.