PY-L4-45 · Data Ethics & Storytelling

Learning Goals

3 min

Identify five common sources of bias in datasets.
Spot misleading chart practices and fix them.
Decide whether a finding should be published and, if so, with what caveats.
Write a one-paragraph "methodology and limitations" section.

Warm-Up · Five Kinds of Bias

5 min

Selection bias — the sample doesn't represent the population (only people with smartphones; only people who responded).
Survivorship bias — you only see what made it through (successful companies, planes that returned from missions).
Confirmation bias — you keep looking until the data supports your prior; you stop when it does.
Measurement bias — instruments and definitions differ across groups (different thermometers, different test difficulty).
Reporting bias — some events are reported more than others (crime reports, COVID tests).
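To make selection bias concrete, here is a minimal simulation sketch. The population, the screen-time numbers and the response model are all invented for illustration; the only point is that a sample made of "people who responded" can drift away from the population it came from.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: daily screen time in hours for 10,000 people.
population = [random.gauss(4.0, 1.5) for _ in range(10_000)]


def responds(hours):
    """Assumed response model: heavier phone users answer a phone survey more often."""
    probability = min(1.0, max(0.0, hours / 10))
    return random.random() < probability


# The "survey" only contains people who chose to respond.
respondents = [h for h in population if responds(h)]

population_mean = mean(population)
survey_mean = mean(respondents)

print(f"Population mean: {population_mean:.2f} h")
print(f"Survey mean:     {survey_mean:.2f} h ({len(respondents)} respondents)")
```

Under this assumed response model the survey mean comes out above the population mean, even though no individual answer is wrong.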

Today's big idea

Every dataset has a story about how it was collected. If you can't explain how, you can't honestly publish what.

Misleading Charts — Four Classics

14 min

1. Truncated y-axis

A 0.5% difference looks dramatic if the y-axis starts at 99%. Default to starting bar charts at 0; if you must truncate, mark it with a broken-axis symbol.
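The arithmetic behind the distortion can be sketched in a few lines. The helper name `bar_height_ratio` and the two values are assumptions made for this example; it compares how much taller one bar is drawn than the other when the axis starts at 0 versus at 99.

```python
def bar_height_ratio(smaller, larger, axis_start=0.0):
    """Ratio of drawn bar heights when the y-axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical values: 99.2% vs 99.7%, a 0.5-point difference.
honest = bar_height_ratio(99.2, 99.7, axis_start=0)
truncated = bar_height_ratio(99.2, 99.7, axis_start=99)

print(f"Axis from 0:  taller bar is {honest:.3f}x the shorter one")
print(f"Axis from 99: taller bar is {truncated:.1f}x the shorter one")
```

The data are identical in both calls; only the axis start changes how large the gap looks.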

2. Cherry-picked time window

Showing only the months where a metric went down (or up) makes a non-trend look like one. Always include the full range you have, or state clearly why you cropped.
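A quick way to see the effect is to compare the change over the full range with the change over a cropped window. The monthly values below are made up for this sketch, and `percent_change` is a hypothetical helper, not part of any library.

```python
# Hypothetical monthly values for one metric, January to December.
monthly = [100, 104, 98, 102, 97, 93, 90, 96, 101, 103, 99, 100]


def percent_change(values):
    """Percent change from the first to the last value in the window."""
    return (values[-1] - values[0]) / values[0] * 100


full_year = percent_change(monthly)
cropped = percent_change(monthly[3:7])  # April to July only

print(f"Full year:     {full_year:+.1f}%")
print(f"April to July: {cropped:+.1f}%")
```

The full year ends where it started, while the April-to-July crop shows a double-digit drop. Both numbers are computed correctly; only the second one misleads when shown alone.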

3. Dual-axis scales chosen to deceive

Two lines on different scales can be made to fit any narrative. Either annotate the scales clearly or split into two charts.
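One possible way to "split into two charts" in matplotlib is two stacked panels that share the x-axis, each with its own labelled y-axis starting at 0. This is a sketch, not the only layout; the function name `plot_split` and the sign-up and refund-rate series are assumptions made for the example.

```python
import matplotlib.pyplot as plt


def plot_split(months, signups, refund_rate):
    """Draw two stacked panels with a shared x-axis instead of one dual-axis chart."""
    fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True, figsize=(6, 5))

    ax_top.plot(months, signups, marker="o")
    ax_top.set_ylabel("Sign-ups (count)")
    ax_top.set_ylim(bottom=0)  # each panel keeps its own honest baseline

    ax_bottom.plot(months, refund_rate, marker="o", color="#888")
    ax_bottom.set_ylabel("Refund rate (%)")
    ax_bottom.set_ylim(bottom=0)
    ax_bottom.set_xlabel("Month")

    fig.tight_layout()
    return fig


# Hypothetical data for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
fig = plot_split(months, signups=[120, 135, 150, 160], refund_rate=[2.1, 2.0, 2.4, 2.2])
```

Call `plt.show()` afterwards to display the figure. Because each series has its own panel, neither scale can be stretched to make the two lines appear to move together.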

4. Pie charts with too many slices

A 12-slice pie is unreadable. Use a bar chart; reserve pies for ≤ 5 slices with one clear majority.
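The rule of thumb above can be written down as a small helper. This sketch reads "one clear majority" as "the largest slice is more than half of the total", which is an assumption; `suggest_chart` and `sorted_for_bar` are hypothetical names, and the survey counts are invented.

```python
def suggest_chart(counts, max_slices=5):
    """Suggest "pie" only for few slices with a clear majority, otherwise "bar"."""
    if not counts:
        raise ValueError("counts must not be empty")
    total = sum(counts.values())
    largest_share = max(counts.values()) / total
    if len(counts) <= max_slices and largest_share > 0.5:
        return "pie"
    return "bar"


def sorted_for_bar(counts):
    """Return (label, value) pairs sorted from largest to smallest."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical survey results.
few = {"Yes": 62, "No": 30, "Unsure": 8}
many = {f"Option {i}": n
        for i, n in enumerate([14, 12, 11, 10, 9, 9, 8, 8, 7, 5, 4, 3], start=1)}

print(suggest_chart(few))    # 3 slices, largest share 62%
print(suggest_chart(many))   # 12 slices, no majority
print(sorted_for_bar(few)[0])
```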

The honest-chart checklist

✅ y-axis starts at 0 (or labelled if not)
✅ time range covers all available data
✅ categorical axes are sorted by value (not alphabetical), unless the categories have a natural order such as time
✅ scales match across compared groups
✅ axes are labelled with units
✅ source + date appear in the caption
✅ a "this excludes X" note when relevant
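Two of these checks can be partly automated. The sketch below assumes it is handed a matplotlib Axes (it only calls `get_ylim()` and `get_ylabel()`); the function name `audit_axes` is hypothetical, and the remaining checklist items still need a human reader.

```python
def audit_axes(ax):
    """Return a list of checklist failures for a matplotlib-style Axes object."""
    failures = []

    # Check 1: does the y-axis start at 0?
    y_min, _ = ax.get_ylim()
    if y_min != 0:
        failures.append(f"y-axis starts at {y_min}, not 0: label the truncation")

    # Check 2: is the y-axis labelled (ideally with units)?
    if not ax.get_ylabel().strip():
        failures.append("y-axis has no label or units")

    return failures
```

Call `audit_axes(ax)` on a chart before `plt.show()`; an empty list means these two checks passed, not that the chart is honest overall.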

Worked Example · Fix the Chart

12 min

Imagine a colleague drew this chart:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["Last year", "This year"], [99.5, 99.9])
ax.set_ylim(99, 100)
ax.set_title("Server uptime — doubled this year!")
plt.show()

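The numbers are accurate; the chart is dishonest. set_ylim(99, 100) makes a 0.4-percentage-point change look enormous: the visible part of the "This year" bar is 0.9 units tall against 0.5 for "Last year", almost twice the height. The title's "doubled" is wrong: 99.5 → 99.9 is a 0.4-point absolute improvement (a ratio of about 1.004), not 2×.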
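The arithmetic is worth checking by hand or in a few lines of Python. The sketch below uses only the two uptime figures from the chart; the conversion to hours of downtime assumes a 365-day year and is added here for illustration.

```python
last_year, this_year = 99.5, 99.9  # % uptime, from the chart above

point_change = this_year - last_year   # absolute change in percentage points
uptime_ratio = this_year / last_year   # "doubled" would require this to be 2.0

# Same numbers viewed as downtime, assuming a 365-day year (8,760 hours).
hours_per_year = 365 * 24
downtime_last = (100 - last_year) / 100 * hours_per_year
downtime_this = (100 - this_year) / 100 * hours_per_year

print(f"Uptime change: {point_change:.1f} percentage points")
print(f"Uptime ratio:  {uptime_ratio:.3f}x")
print(f"Downtime:      {downtime_last:.1f} h -> {downtime_this:.1f} h per year")
```

Uptime did not double, but downtime did fall by about 80%. If a stronger headline is wanted, that framing is the one the numbers support, provided the chart says it is showing downtime.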

Honest version

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["Last year", "This year"], [99.5, 99.9],
       color=["#888", "#4a90e2"])
ax.set_ylim(0, 100)
ax.set_title("Server uptime — 0.4 pp improvement YoY")
ax.set_ylabel("% uptime")
for x, y in zip([0, 1], [99.5, 99.9]):
    # Label inside the bar: y + 0.5 would land above the top of the 0-100 axis
    ax.text(x, y - 2, f"{y}%", ha="center", va="top", color="white")
plt.show()

Read the diff

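Two changes matter most: the y-axis now runs from 0 to 100, so the bars look almost identical (they nearly are), and the title states what changed in plain terms. The y-axis label and the value labels supply the units and the exact figures. The honest version is less exciting and more truthful. That trade-off is the job.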

Audit Your Own Work

13 min

01 🟢 Re-read your data story

Open your Lesson 37 data story. Score each chart against the honest-chart checklist. Mark any failures.

02 🟡 Add a methodology box

Add a 4-sentence "methodology" section to the README: source, date range, what you cleaned, what you excluded.
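If a fixed structure helps, the four sentences can be generated from four facts. This is only a sketch: `methodology_box` is a hypothetical helper, the wording of each sentence is one option among many, and the values in angle brackets are placeholders to be replaced with the facts of your own dataset.

```python
def methodology_box(source, date_range, cleaned, excluded):
    """Build a four-sentence methodology paragraph from four facts about the data."""
    sentences = [
        f"Data come from {source}.",
        f"The analysis covers {date_range}.",
        f"Cleaning steps: {cleaned}.",
        f"Excluded: {excluded}.",
    ]
    return " ".join(sentences)


# Placeholder values; replace them with the facts of your own dataset.
print(methodology_box(
    source="<where the data came from>",
    date_range="<first date> to <last date>",
    cleaned="<what you fixed or standardised>",
    excluded="<what you left out and why>",
))
```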

03 🔴 Bias hunt

List three biases that could be in your dataset. For each, write one sentence on how to mitigate or disclose it.

Mini-Challenge · The "No Story" Result

8 min

Sometimes the data shows nothing interesting. Practice: write a half-page report saying just that — the question, the analysis, the absence of a finding. Honest "no signal" is more valuable than dressed-up noise.

Recap

3 min

Five biases. Four misleading chart tricks. One checklist. Every chart you ship is a small ethical decision; you can either make it clear or make it dramatic, and only one of those is fair to the reader.

Homework

4 min

Audit one published chart from a newspaper, magazine or company blog. List two things they did well and two things you would change. Submit your audit as a markdown file with the screenshot.