Learning Goals
3 min
- Identify five common sources of bias in datasets.
- Spot misleading chart practices and fix them.
- Decide whether a finding should be published, with what caveats.
- Write a one-paragraph "methodology and limitations" section.
Warm-Up · Five Kinds of Bias
5 min
- Selection bias: the sample doesn't represent the population (only people with smartphones; only people who responded).
- Survivorship bias: you only see what made it through (successful companies, planes that returned from missions).
- Confirmation bias: you keep looking until the data supports your prior; you stop when it does.
- Measurement bias: instruments and definitions differ across groups (different thermometers, different test difficulty).
- Reporting bias: some events are reported more than others (crime reports, COVID tests).
Every dataset has a story about how it was collected. If you can't explain how, you can't honestly publish what.
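A small simulation can make selection bias concrete. The sketch below invents a population (every number is made up for illustration, and the link between income and smartphone ownership is an assumption, not a measured fact), then compares the average over everyone with the average over smartphone owners only.

```python
import random
from statistics import mean


def simulate_population(n=10_000, seed=0):
    """Build a hypothetical population of (income, owns_smartphone) pairs."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        income = rng.uniform(10_000, 100_000)
        # Assumption for illustration: ownership gets more likely as income rises.
        owns_smartphone = rng.random() < income / 100_000
        people.append((income, owns_smartphone))
    return people


population = simulate_population()

# Mean over everyone versus mean over the people a phone survey can reach.
true_mean = mean(income for income, _ in population)
surveyed_mean = mean(income for income, owns in population if owns)

print(f"Whole population:       {true_mean:,.0f}")
print(f"Smartphone-only sample: {surveyed_mean:,.0f}")
```

The phone-only average comes out noticeably higher than the population average, even though no single record is wrong. The bias comes entirely from who was reachable.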
Misleading Charts — Four Classics
14 min
1. Truncated y-axis
A 0.5% difference looks dramatic if the y-axis starts at 99%. Default to starting bar charts at 0; if you must truncate, mark it with a broken-axis symbol.
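To see how much a truncated axis exaggerates, compare the ratio of the bar heights as drawn with the ratio of the underlying values. This is a minimal sketch; `drawn_ratio` is a hypothetical helper and the two values are invented to match the 0.5% example.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of bar heights (b over a) as drawn when the y-axis starts at `baseline`."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must sit above the axis baseline")
    return (b - baseline) / (a - baseline)


low, high = 99.5, 100.0  # hypothetical values about 0.5% apart

honest = drawn_ratio(low, high)                    # axis starts at 0
truncated = drawn_ratio(low, high, baseline=99.0)  # axis starts at 99

print(f"Axis from 0:  second bar is {honest:.3f}x the first")
print(f"Axis from 99: second bar is {truncated:.3f}x the first")
```

With the axis starting at 99, the second bar is drawn twice as tall as the first, although the values differ by about half a percent.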
2. Cherry-picked time window
Showing only the months where a metric went down (or up) makes a non-trend look like one. Always include the full range you have, or state clearly why you cropped.
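The effect of cropping is easy to check numerically before drawing anything. In this sketch the monthly values are invented and `net_change` is a hypothetical helper; it compares the change over the full year with the change over a hand-picked window.

```python
def net_change(series, start=0, end=None):
    """Last value minus first value over series[start:end]."""
    window = series[start:end]
    if len(window) < 2:
        raise ValueError("need at least two points to measure a change")
    return window[-1] - window[0]


# Hypothetical metric, one value per month, January to December.
monthly = [100, 104, 101, 97, 94, 98, 103, 105, 102, 99, 101, 104]

full_year = net_change(monthly)              # January to December
cherry_picked = net_change(monthly, 1, 5)    # February to May only

print(f"Full year:       {full_year:+d}")
print(f"February to May: {cherry_picked:+d}")
```

The cropped window shows a steep drop while the full year ends slightly up, which is why the crop needs a stated reason.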
3. Dual-axis scales chosen to deceive
Two lines on different scales can be made to fit any narrative. Either annotate the scales clearly or split into two charts.
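The reason dual axes are so flexible is that each axis gets its own limits, and the limits alone decide how steep a line looks. A rough sketch with invented numbers follows; `axis_fraction` is a hypothetical helper that mimics how a linear axis positions a value.

```python
def axis_fraction(value, lo, hi):
    """Vertical position of `value` on a linear axis from lo to hi (0 = bottom, 1 = top)."""
    return (value - lo) / (hi - lo)


signups = [1000, 1010, 1020]  # hypothetical series plotted on the right-hand axis

# Same data, two choices of right-hand axis limits.
tight = [axis_fraction(v, 1000, 1020) for v in signups]
wide = [axis_fraction(v, 0, 2000) for v in signups]

print("Limits 1000-1020:", tight)  # climbs from the bottom of the chart to the top
print("Limits 0-2000:   ", wide)   # barely moves
```

Because the second axis can be tuned independently, the same series can be made to track or ignore the line on the first axis. Splitting into two charts removes that freedom.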
4. Pie charts with too many slices
A 12-slice pie is unreadable. Use a bar chart; reserve pies for ≤ 5 slices with one clear majority.
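The pie-or-bar rule can be written down as a small check. This is a sketch under one stated assumption: "one clear majority" is read here as the largest slice exceeding 50% of the total, and both thresholds are adjustable parameters of the hypothetical `suggest_chart` helper.

```python
def suggest_chart(shares, max_slices=5, majority=0.5):
    """Suggest "pie" only for a few slices with one dominant share, else "bar"."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    largest = max(shares.values()) / total
    # Assumption: a "clear majority" means the biggest slice is over half the total.
    if len(shares) <= max_slices and largest > majority:
        return "pie"
    return "bar"


print(suggest_chart({"A": 60, "B": 25, "C": 15}))          # few slices, one majority
print(suggest_chart({"A": 40, "B": 35, "C": 25}))          # few slices, no majority
print(suggest_chart({f"cat{i}": 1 for i in range(12)}))    # twelve slices
```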
The honest-chart checklist
✅ y-axis starts at 0 (or labelled if not)
✅ time range covers all available data
✅ categorical axes are sorted by value (not alphabetical)
✅ scales match across compared groups
✅ axes are labelled with units
✅ source + date appear in the caption
✅ a "this excludes X" note when relevant
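If you audit several charts, it can help to keep the checklist as data and score each chart the same way. The sketch below is one possible layout, not a standard tool; `CHECKLIST` paraphrases the items above and `audit_chart` is a hypothetical helper.

```python
CHECKLIST = (
    "y-axis starts at 0, or the truncation is labelled",
    "time range covers all available data",
    "categorical axes are sorted by value",
    "scales match across compared groups",
    "axes are labelled with units",
    "source and date appear in the caption",
    'a "this excludes X" note is present when relevant',
)


def audit_chart(passed):
    """Return (score, failures) given the checklist items a chart passes."""
    unknown = set(passed) - set(CHECKLIST)
    if unknown:
        raise ValueError(f"not on the checklist: {sorted(unknown)}")
    failures = [item for item in CHECKLIST if item not in passed]
    return len(CHECKLIST) - len(failures), failures


# Example: a chart that only gets the time range and the axis labels right.
score, failures = audit_chart({CHECKLIST[1], CHECKLIST[4]})
print(f"{score}/{len(CHECKLIST)} checks passed")
for item in failures:
    print("FAIL:", item)
```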
Worked Example · Fix the Chart
12 min
Imagine a colleague drew this chart:
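import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["Last year", "This year"], [99.5, 99.9])
ax.set_ylim(99, 100)  # truncated axis: only the top 1% of the scale is shown
ax.set_title("Server uptime: doubled this year!")  # misleading claim
plt.show()

The numbers are accurate; the chart is dishonest. set_ylim(99, 100) makes a 0.4-percentage-point change look enormous: with the axis starting at 99, the bars are drawn 0.5 and 0.9 units tall, so this year's bar looks almost twice as high. The title's "doubled" claim is wrong: 99.5 → 99.9 is a 0.4-point absolute improvement, not 2×.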
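The arithmetic behind that correction is worth spelling out, because percentage points, relative change, and ratios are easy to mix up. A minimal sketch follows; `describe_change` is a hypothetical helper, and the inputs are the two uptime figures from the chart.

```python
def describe_change(old, new):
    """Return (absolute change in points, relative change in %, new/old ratio)."""
    absolute_pp = new - old
    relative_pct = (new - old) / old * 100
    ratio = new / old
    return absolute_pp, relative_pct, ratio


pp, rel, ratio = describe_change(99.5, 99.9)
print(f"{pp:.1f} percentage points, {rel:.2f}% relative, {ratio:.3f}x")
```

A value that had doubled would give a ratio of 2; here the ratio is barely above 1, so "doubled" cannot describe the uptime figures.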
Honest version
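import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["Last year", "This year"], [99.5, 99.9], color=["#888", "#4a90e2"])
ax.set_ylim(0, 100)  # full scale, so bar heights are proportional to the values
ax.set_title("Server uptime: 0.4 pp improvement YoY")
ax.set_ylabel("% uptime")
# Print each value just inside the top of its bar so the label stays within the 0-100 axis.
for x, y in zip([0, 1], [99.5, 99.9]):
    ax.text(x, y - 2, f"{y}%", ha="center", va="top", color="white")
plt.show()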
Read the diff
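Two changes do the work: a 0-100 y-axis makes the bars look almost identical (they nearly are), and the title now says what changed in plain terms. The honest version also labels the y-axis with units and prints each value on the chart, so the small difference stays readable. It is less exciting and more truthful. That trade-off is the job.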
Audit Your Own Work
13 min
Open your Lesson 37 data story. Score each chart against the honest-chart checklist. Mark any failures.
Add a 4-sentence "methodology" section to the README: source, date range, what you cleaned, what you excluded.
List three biases that could be in your dataset. For each, write one sentence on how to mitigate or disclose it.
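One way to keep the methodology section to exactly four sentences is to fill a fixed template. The sketch below is only a suggestion; `methodology_paragraph` is a hypothetical helper and every argument value shown is a placeholder to replace with your own project's details.

```python
def methodology_paragraph(source, date_range, cleaned, excluded):
    """Assemble a four-sentence methodology note: source, dates, cleaning, exclusions."""
    def tidy(text):
        # Drop surrounding whitespace and any trailing period so sentences end once.
        return text.strip().rstrip(".")

    return " ".join([
        f"Data source: {tidy(source)}.",
        f"Date range: {tidy(date_range)}.",
        f"Cleaning: {tidy(cleaned)}.",
        f"Exclusions: {tidy(excluded)}.",
    ])


# Placeholder values for illustration only.
paragraph = methodology_paragraph(
    source="bike counter export from a city open-data portal",
    date_range="January 2023 to December 2023",
    cleaned="removed duplicate rows and converted timestamps to UTC",
    excluded="days with sensor outages",
)
print(paragraph)
```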
Mini-Challenge · The "No Story" Result
8 min
Sometimes the data shows nothing interesting. Practice writing a half-page report that says just that: the question, the analysis, and the absence of a finding. An honest "no signal" is more valuable than dressed-up noise.
Recap
3 min
Five biases. Four misleading chart tricks. One checklist. Every chart you ship is a small ethical decision: you can make the chart clear or make it dramatic, and only one of those is fair to the reader.
Homework
4 min
Audit one published chart from a newspaper, magazine, or company blog. List two things they did well and two things you would change. Submit your audit as a markdown file with the screenshot.