Learning Goals
3 min
- Compute mean, median, mode, stdev, variance, percentiles.
- Spot when median tells a different story than mean (outliers!).
- Resample time series with df.resample() at daily ("D"), weekly ("W"), and monthly ("ME") frequency.
- Compute rolling and expanding statistics.
Warm-Up · Mean vs Median
5 min
    import statistics

    salaries = [3500, 3800, 3900, 4000, 4200, 50000]
    print(statistics.mean(salaries))    # ≈ 11_566.67 ← dragged up by one big number
    print(statistics.median(salaries))  # 3_950.0 ← reflects most people
Use median when outliers exist. Use mean when the data is symmetric and outliers are rare.
Each summary statistic answers a slightly different question. Pick the one that matches what you actually want to know — and tell your reader why.
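To see how differently the two statistics react to a single extreme value, the sketch below recomputes both with and without the 50000 entry from the warm-up list. The variable names are illustrative only.

```python
import statistics

salaries = [3500, 3800, 3900, 4000, 4200, 50000]
without_outlier = salaries[:-1]  # same list minus the 50000 entry

mean_all = statistics.mean(salaries)
mean_trimmed = statistics.mean(without_outlier)
median_all = statistics.median(salaries)
median_trimmed = statistics.median(without_outlier)

# The mean falls from about 11566.67 to 3880; the median only moves from 3950 to 3900.
print(f"mean:   {mean_all:.2f} -> {mean_trimmed:.2f}")
print(f"median: {median_all:.2f} -> {median_trimmed:.2f}")
```

One value out of six moves the mean by thousands, while the median shifts by 50. That gap is the signal that the mean is describing the outlier rather than the typical salary.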
New Concept · Stats Toolkit
14 min
statistics module (stdlib)
    import statistics as st

    st.mean(x)            # arithmetic average
    st.median(x)          # middle value
    st.mode(x)            # most-common value
    st.stdev(x)           # sample standard deviation
    st.variance(x)        # sample variance
    st.quantiles(x, n=4)  # [Q1, Q2, Q3]
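A small runnable sketch of the calls above, using a made-up list x chosen only for illustration. It also shows one detail that is easy to miss: st.quantiles defaults to method="exclusive", so its quartiles can differ from those of tools that interpolate inclusively, such as pandas' default quantile().

```python
import statistics as st

x = [2, 4, 4, 5, 7, 9]  # made-up sample data

print(st.mean(x))      # arithmetic average
print(st.median(x))    # 4.5 (average of the two middle values)
print(st.mode(x))      # 4 (appears twice)
print(st.stdev(x))     # sample standard deviation (divides by n - 1)
print(st.variance(x))  # sample variance, i.e. stdev squared

q_exclusive = st.quantiles(x, n=4)                      # [3.5, 4.5, 7.5]
q_inclusive = st.quantiles(x, n=4, method="inclusive")  # [4.0, 4.5, 6.5]
print(q_exclusive, q_inclusive)
```

If a report mixes stdlib and pandas numbers, it is worth stating which quantile method was used so the quartiles can be reproduced.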
pandas conveniences
    s = df["score"]

    s.mean(), s.median(), s.std(), s.var()
    s.quantile(0.5)                 # = median
    s.quantile([0.25, 0.5, 0.75])
    s.skew()                        # asymmetry
    s.kurtosis()                    # tail heaviness
    s.value_counts()                # frequency table
    s.value_counts(bins=5)          # auto-binned histogram counts
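The same calls on a tiny made-up score column, as a sketch rather than a reference. The DataFrame contents and the choice of three bins are assumptions for illustration. It also checks that Series.std() agrees with statistics.stdev, since both compute the sample standard deviation.

```python
import statistics
import pandas as pd

# Hypothetical data; any numeric column would work the same way.
df = pd.DataFrame({"score": [2, 4, 4, 5, 7, 9]})
s = df["score"]

quartiles = s.quantile([0.25, 0.5, 0.75])  # linear interpolation by default
counts = s.value_counts()                  # most frequent value first
binned = s.value_counts(bins=3)            # counts per equal-width interval

print(quartiles.tolist())  # [4.0, 4.5, 6.5]
print(counts.index[0])     # 4, the most frequent score
print(binned.sort_index())

# pandas std() and the stdlib stdev() both divide by n - 1.
same_std = abs(s.std() - statistics.stdev(s.tolist())) < 1e-9
print(same_std)
```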
Outlier detection — z-score
    mu, sigma = s.mean(), s.std()
    z = (s - mu) / sigma
    outliers = s[z.abs() > 3]
    print(outliers)
An absolute z-score above 3 (|z| > 3) is the classic "more than 3 standard deviations from the mean" rule. Use it as a starting point, not a verdict; domain knowledge matters.
Time-series resample
    df = df.set_index("date")

    df["sales"].resample("W").sum()    # weekly total
    df["sales"].resample("ME").mean()  # monthly mean (Month End)
    df["sales"].resample("QE").max()   # quarterly max
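A minimal sketch of weekly resampling on synthetic data: 14 consecutive days starting Monday 2024-01-01, with sales of 1 to 14. The dates and values are made up. Only the "W" alias is used here because the "ME" and "QE" spellings require pandas 2.2 or later (earlier versions use "M" and "Q").

```python
import pandas as pd

# Synthetic daily data: two full Monday-to-Sunday weeks.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=14, freq="D"),
    "sales": range(1, 15),
})
df = df.set_index("date")

# "W" means weeks ending on Sunday; each bin is labelled with its end date.
weekly = df["sales"].resample("W").sum()
print(weekly)
# 2024-01-07    28   (1 + 2 + ... + 7)
# 2024-01-14    77   (8 + 9 + ... + 14)
```

The index of the result holds the period end, not the start, which matters when joining weekly totals back to other tables.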
Rolling and expanding
    df["sales"].rolling(window=7).mean()  # 7-day moving average
    df["sales"].rolling(window=7).std()   # 7-day rolling std
    df["sales"].expanding().mean()        # running cumulative mean
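To make the difference between the two concrete, here is a sketch on six made-up daily values, with a window of 3 instead of 7 so the output stays short. Note that an integer window counts rows, so it equals "days" only when the series has exactly one row per day.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 20, 30, 40, 50, 60], index=idx, name="sales")

# Rolling: mean of the current row and the 2 rows before it.
# The first 2 results are NaN because the window is not yet full.
rolling3 = sales.rolling(window=3).mean()

# Expanding: mean of everything from the first row up to the current one.
expanding = sales.expanding().mean()

print(rolling3.tolist())   # [nan, nan, 20.0, 30.0, 40.0, 50.0]
print(expanding.tolist())  # [10.0, 15.0, 20.0, 25.0, 30.0, 35.0]
```

The rolling mean tracks the recent level and forgets old data; the expanding mean keeps every past value and therefore reacts more slowly as the series grows.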
Worked Example · A Stats Report
12 min
    import pandas as pd

    df = pd.read_csv("clean.csv", parse_dates=["date"])
    df["total"] = df["quantity"] * df["price"]
    s = df["total"]

    print(f"n       : {s.count()}")
    print(f"mean    : {s.mean():.2f}")
    print(f"median  : {s.median():.2f}")
    print(f"stdev   : {s.std():.2f}")
    print(f"p25/p75 : {s.quantile(0.25):.2f}, {s.quantile(0.75):.2f}")
    print(f"min/max : {s.min():.2f}, {s.max():.2f}")
    print(f"skew    : {s.skew():.2f}")

    # Outliers
    z = (s - s.mean()) / s.std()
    print(f"\noutliers (|z| > 2): {(z.abs() > 2).sum()} rows")
    print(df.loc[z.abs() > 2, ["date", "customer", "product", "total"]])

    # Weekly resample
    weekly = df.set_index("date")["total"].resample("W-MON").sum()
    print("\nweekly revenue:")
    print(weekly)
Read the diff
Seven stats answer most questions about distribution. The outlier list shows specific rows, not just a count. The weekly resample turns daily noise into a story. PCED tasks fit this exact shape — given a series, report.
Try It Yourself
13 min
Print n, mean, median, stdev, min, max for any numeric column.
Print rows where the value is more than 2 stdev from the mean.
For a long enough series, plot weekly sum vs monthly sum. Comment on what each emphasises.
Mini-Challenge · Z-Score by Group
8 min
For each product, compute the per-row z-score of total within that product. Show the top 3 highest z-scores overall.
One possible solution:
    def z(s):
        return (s - s.mean()) / s.std()

    df["z"] = df.groupby("product")["total"].transform(z)
    print(df.nlargest(3, "z")[["date", "product", "total", "z"]])
transform applies a function within each group and aligns back to the original index — that's how you write per-group z-scores cleanly.
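A self-contained sketch of the same pattern on made-up data, contrasting the per-group z-score with a single global one. The two products and their totals are hypothetical and chosen so the scales differ a lot.

```python
import pandas as pd

# Hypothetical data: product B sells at a much higher scale than product A.
df = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B", "B"],
    "total":   [10, 20, 30, 100, 100, 400],
})

def z(s):
    # Standardise a Series against its own mean and sample std.
    return (s - s.mean()) / s.std()

df["z_group"] = df.groupby("product")["total"].transform(z)  # within each product
df["z_global"] = z(df["total"])                              # across all rows

print(df)
top = df.nlargest(1, "z_group")
print(top[["product", "total", "z_group"]])
```

With the global score every A row comes out below average simply because A is the cheaper product. The per-group score compares each row only with rows of the same product, so A's totals standardise to -1, 0 and 1.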
Recap
3 min
- Mean for symmetric, median for skewed; std + quantiles describe spread; z-score finds extremes.
- resample aggregates a daily series into weekly / monthly values.
- rolling + expanding show local and cumulative trends.
- The PCED exam is mostly "given a Series, here are eight questions", so practice fluency.
Homework
4 min
Take your real CSV. Produce a one-page Markdown stats report: n, mean, median, stdev, percentiles, three outliers (if any), a weekly resample plot. This is the kind of artefact PCED tasks ask for.