PY-L5-46 · AI Ethics — Bias, Fairness & Responsibility

Learning Goals

3 min

Explain where AI bias comes from and how to detect it.
Apply a fairness check to a model's predictions across groups.
Reason about privacy, consent, and data provenance.
Adopt the builder's checklist: transparency, human accountability, harm-minimisation.

Warm-Up · Garbage In, Bias Out

5 min

A hiring model trained on 10 years of mostly-male hires
learns to prefer male candidates — not because it's "evil",
but because it faithfully copied a biased history.

Today's big idea

AI learns patterns from data — including unfair ones baked into history. A model can be technically accurate AND deeply unfair. Catching this is YOUR job as the builder; the model won't flag it for you.

New Concept · The Five Questions

14 min

1. Bias — where does it come from?

historical bias   data reflects past unfairness (hiring example)
sampling bias     some groups under-represented in the data
label bias        the labels themselves were assigned unfairly
proxy bias        a "neutral" feature (postcode) stands in for a protected one
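
One rough way to test for a proxy is to ask how much better you can guess the protected attribute once you know the "neutral" feature. The sketch below uses made-up data, and the column names `postcode` and `group` are assumptions for illustration only.

```python
import pandas as pd

# Toy, made-up data: 'postcode' looks neutral, 'group' is the protected attribute.
df = pd.DataFrame({
    "postcode": ["N1", "N1", "N1", "N1", "S9", "S9", "S9", "S9"],
    "group":    ["A",  "A",  "A",  "B",  "B",  "B",  "B",  "A"],
})

# Share of each group within each postcode (each row sums to 1).
shares = pd.crosstab(df["postcode"], df["group"], normalize="index")
print(shares)

# Best guess of the group without the feature: always pick the most common group.
base_rate = df["group"].value_counts(normalize=True).max()

# Best guess with the feature: pick the most common group inside each postcode.
guess_rate = (
    df.groupby("postcode")["group"].agg(lambda s: s.value_counts().max()).sum()
    / len(df)
)

print(f"best guess without postcode: {base_rate:.0%}")
print(f"best guess with postcode:    {guess_rate:.0%}")
```

If the second number is much higher than the first, the feature carries information about the protected attribute and can act as a proxy even after the protected column is dropped.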

2. Fairness — measure it across groups

# check the approval rate per group, not just overall
# assumes df and X share the same row index and the model predicts 0/1
for group in df["gender"].unique():
    mask = df["gender"] == group
    rate = model.predict(X[mask]).mean()   # share of 1s = approval rate
    print(f"{group}: approval rate {rate:.1%}")
# big gaps between groups → investigate, don't ship

Overall accuracy can hide that a model works great for one group and badly for another. Always break metrics down by relevant groups.
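
To turn "big gap" into a number, you can compare the highest and lowest approval rates. This is a minimal sketch on made-up predictions; the function name `approval_rates` is an assumption, not part of any library. A common screening heuristic (the "four-fifths rule" from US employment guidance) treats a ratio below 0.8 as worth investigating. It is a rule of thumb, not a definition of fairness.

```python
from collections import defaultdict

def approval_rates(groups, preds):
    """Return {group: share of positive (1) predictions}."""
    totals, approved = defaultdict(int), defaultdict(int)
    for g, p in zip(groups, preds):
        totals[g] += 1
        approved[g] += int(p)
    return {g: approved[g] / totals[g] for g in totals}

# Made-up predictions (1 = approved) for two groups.
groups = ["A"] * 5 + ["B"] * 5
preds = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

rates = approval_rates(groups, preds)
highest, lowest = max(rates.values()), min(rates.values())
gap = highest - lowest                                   # difference in approval rate
ratio = lowest / highest if highest > 0 else float("nan")  # lowest rate / highest rate

print(rates)
print(f"gap: {gap:.1%}, ratio: {ratio:.2f}")
```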

3. Privacy & consent

Did the people in your data agree to this use?
Are you storing more personal data than you need?
Could outputs re-identify someone who should stay anonymous?
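
The re-identification question can be checked roughly by counting how many people share each combination of "quasi-identifiers" (fields that are harmless alone but revealing together). The sketch below uses made-up records; the column names and the threshold `k = 2` are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up records; 'age_band' and 'postcode' are hypothetical quasi-identifiers.
df = pd.DataFrame({
    "age_band": ["20-29", "20-29", "30-39", "30-39", "30-39", "60-69"],
    "postcode": ["N1",    "N1",    "N1",    "N1",    "N1",    "S9"],
    "outcome":  [1, 0, 1, 1, 0, 1],
})

quasi_identifiers = ["age_band", "postcode"]
k = 2  # assumed threshold: every combination should cover at least k people

# Number of rows sharing each combination of quasi-identifiers.
sizes = df.groupby(quasi_identifiers).size()

# Combinations that point to fewer than k people are re-identification risks.
risky = sizes[sizes < k]
print(risky)
```

Any combination listed in `risky` narrows down to very few people, so consider coarsening those fields (wider age bands, larger areas) or dropping them if the model does not need them.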

4. Transparency

Can you explain a decision to the person it affects? Prefer interpretable models (trees, logistic regression) for high-stakes decisions; document data sources and known limits.
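
One reason linear models such as logistic regression count as interpretable is that each feature's contribution to the score is just weight × value, so you can show the affected person what pushed the decision up or down. The weights, bias, and feature names below are invented for illustration; they are not taken from any trained model.

```python
import math

# Hypothetical weights of an already-trained logistic regression.
weights = {"income_k": 0.04, "years_employed": 0.30, "missed_payments": -0.90}
bias = -2.0

def explain(applicant):
    """Return the approval probability and each feature's contribution to the score."""
    contributions = {f: weights[f] * applicant[f] for f in weights}
    score = bias + sum(contributions.values())
    prob = 1 / (1 + math.exp(-score))  # logistic (sigmoid) function
    return prob, contributions

prob, parts = explain({"income_k": 40, "years_employed": 2, "missed_payments": 1})

# List contributions from largest to smallest effect, with sign.
for feature, value in sorted(parts.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{feature:<18}{value:+.2f}")
print(f"approval probability: {prob:.0%}")
```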

5. Human accountability

AI should ASSIST consequential decisions, not make them alone.
A human must be able to review, override, and be answerable.
"The algorithm decided" is never an acceptable excuse.

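In code, "assist, not decide" can look like this: the model only suggests, a named human makes the final call, and the record keeps both. This is a sketch under assumptions; the thresholds, the field names, and the `Decision` class are illustrative, not a standard.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    applicant_id: str
    model_score: float
    suggestion: str    # what the model proposed
    reviewer: str      # the named human who is answerable
    final: str         # what the human decided
    overridden: bool   # True when the human disagreed with a firm suggestion

def suggest(score, low=0.2, high=0.8):
    """Turn a score in [0, 1] into a suggestion; thresholds are illustrative."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "approve"
    if score <= low:
        return "decline"
    return "unsure"

def decide(applicant_id, score, reviewer, human_choice):
    """Record a decision; the human's choice is always the final one."""
    if human_choice not in ("approve", "decline"):
        raise ValueError("human_choice must be 'approve' or 'decline'")
    suggestion = suggest(score)
    overridden = suggestion != "unsure" and suggestion != human_choice
    return Decision(applicant_id, score, suggestion, reviewer, human_choice, overridden)

# The model is confident, but the reviewer disagrees and the override is logged.
record = decide("a-17", 0.91, "j.doe", "decline")
print(record)
```

Because every record names a reviewer, "the algorithm decided" is never what the log says.
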
Worked Example · A Fairness Audit

12 min

# fairness_audit.py: check a model's behaviour across groups
import pandas as pd
from sklearn.metrics import accuracy_score

# df has: features, true label 'y', a sensitive column 'group', and 0/1 model preds
def audit(df, group_col, y_col, pred_col):
    """Print overall accuracy, then n, accuracy and positive rate per group."""
    print(f"overall accuracy: {accuracy_score(df[y_col], df[pred_col]):.1%}\n")
    print(f"{'group':<12}{'n':>6}{'accuracy':>10}{'positive rate':>15}")
    for g, sub in df.groupby(group_col):
        acc = accuracy_score(sub[y_col], sub[pred_col])
        pos = sub[pred_col].mean()   # share of 1s = positive (approval) rate
        print(f"{str(g):<12}{len(sub):>6}{acc:>10.1%}{pos:>15.1%}")

# audit(df, "group", "y", "pred")

Sample output

overall accuracy: 86.6%

group            n  accuracy  positive rate
A              520     91.0%          62.0%
B              180     74.0%          31.0%   ← much worse + lower approval

Read the diff

Overall accuracy (86.6%) looked fine, but group B gets 74% accuracy and half the approval rate of group A. The overall figure is a weighted average, so the larger group A dominates it and masks B's weaker result. That gap is a red flag: the model may be unfair to group B, possibly due to under-representation in training. The audit makes the invisible visible. A responsible builder investigates and fixes this before shipping: more data for B, reweighting, or not deploying.
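
Reweighting, one of the fixes mentioned above, can be sketched as giving each row a weight so that every group carries the same total weight in training. The helper name `group_balanced_weights` is an assumption; the group sizes reuse the 520 / 180 split from the sample output.

```python
from collections import Counter

def group_balanced_weights(groups):
    """Weight each row so that every group carries the same total weight."""
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    # Rows in small groups get weights above 1, rows in large groups below 1.
    return [total / (n_groups * counts[g]) for g in groups]

groups = ["A"] * 520 + ["B"] * 180
weights = group_balanced_weights(groups)

print(f"weight per A row: {weights[0]:.3f}")
print(f"weight per B row: {weights[-1]:.3f}")

# Many scikit-learn estimators accept such weights at training time:
#   model.fit(X, y, sample_weight=weights)
```

Reweighting does not guarantee the gap closes. Re-run the audit afterwards and compare the per-group numbers.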

Try It Yourself

13 min

01 🟢 Audit a model

Run the fairness audit on a model you built earlier (e.g., Titanic by sex/class). Are metrics even across groups?
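
If you want to try the idea before wiring in your own model, a tiny made-up frame can stand in. The sketch below is a variant of the audit that returns a table instead of printing; the name `audit_table` and the eight rows of data are assumptions for illustration, not real Titanic results.

```python
import pandas as pd

def audit_table(df, group_col, y_col, pred_col):
    """Return n, accuracy and positive rate per group as a DataFrame."""
    correct = df[y_col] == df[pred_col]
    return pd.DataFrame({
        "n": df.groupby(group_col).size(),
        "accuracy": correct.groupby(df[group_col]).mean(),
        "positive_rate": df.groupby(group_col)[pred_col].mean(),
    })

# Made-up labels and predictions for eight passengers, grouped by sex.
toy = pd.DataFrame({
    "sex":  ["f", "f", "f", "f", "m", "m", "m", "m"],
    "y":    [1, 1, 0, 1, 0, 0, 1, 0],
    "pred": [1, 1, 0, 0, 0, 0, 0, 1],
})

table = audit_table(toy, "sex", "y", "pred")
print(table)
```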

02 🟡 Spot the proxy

List 3 "neutral" features that could secretly proxy for a protected attribute (e.g., postcode → race/income). Why is dropping the protected column NOT enough?

03 🔴 LLM bias probe

Prompt an LLM with parallel sentences differing only by a name/gender/origin. Do the responses differ in tone or assumptions? Document what you find.
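
To keep the probe fair, generate the prompts from one template so that only the name changes within each role. This sketch only builds the prompts and does not call any LLM; the template, names, and roles are illustrative assumptions you should replace with your own.

```python
from itertools import product

template = "Write a one-sentence reference for {name}, who is applying to be a {role}."
names = ["Emily", "Mohammed", "Wei", "Kwame"]   # illustrative names only
roles = ["nurse", "software engineer"]

# One prompt per (name, role) pair; everything except the name is identical.
prompts = [
    {"name": n, "role": r, "prompt": template.format(name=n, role=r)}
    for n, r in product(names, roles)
]

for p in prompts:
    print(p["prompt"])
```

Paste each prompt into the LLM in a fresh session, save the responses next to the `name` and `role` fields, and compare responses within the same role.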

Mini-Challenge · A Model Card

8 min

Write a one-page "model card" for a model you built: what it does, training data + provenance, intended use, out-of-scope uses, performance overall AND per group, known biases/limits, and who is accountable. This is industry best practice.

Show the template

# Model Card: <name>
- Purpose:          ...
- Training data:    source, size, date, consent status
- Intended use:     ...
- Out-of-scope:     things it must NOT be used for
- Performance:      overall + per-group metrics
- Known limits:     biases, failure modes, blind spots
- Human oversight:  who reviews, who can override, who's accountable

Recap

3 min

AI learns biases from data; accurate ≠ fair. Measure metrics per group, watch for proxy features, respect privacy and consent, prefer transparent models for high stakes, and keep a human accountable. A model card documents all of it. Building powerful AI responsibly is part of the job — not an afterthought. Next: your capstone.

Vocabulary Card

algorithmic bias: Systematic unfairness in a model's outputs, usually learned from data.
proxy feature: A "neutral" feature that stands in for a protected attribute.
fairness audit: Checking performance across groups, not just overall.
model card: A document describing a model's purpose, data, performance, and limits.

Homework

4 min

Write a model card for one model you built in Level 5. Include a real per-group fairness check (even a small one) and at least three honest limitations. You'll attach this to your capstone next lesson.
