Learning Goals
3 min- Sort rows by one or many columns; ascending/descending per-column.
- Use
nlargest/nsmallestfor fast top-N. - Rank rows with
.rank()and themethod=option. - Combine sort + group for "top N per group".
Warm-Up · The Simplest Sort
5 mindf.sort_values("score", ascending=False)
That's 95% of what you'll ever need. The other 5%: multiple keys, NA placement, sort by index.
sort_values returns a new DataFrame; the original is unchanged. Always assign the result back if you want the order to stick.
New Concept · Sort, Top-N, Rank
14 minMultiple sort keys
df.sort_values(["class", "score"], ascending=[True, False]) # class asc, then within class score desc
nlargest / nsmallest — faster than sort + head
df.nlargest(3, "score") # top 3 by score df.nsmallest(3, "score") # bottom 3 df.nlargest(5, ["score", "age"]) # primary score, tiebreaker age
Internally these don't fully sort the frame — they use partial sorting, which is faster for large data.
Sort by index
df.sort_index() # by row index df.sort_index(axis=1) # by column name
Rank
df["rank"] = df["score"].rank(ascending=False, method="dense")
method= options: average (default), min, max, first, dense. For leaderboards dense avoids gaps after ties.
Top N per group
# Top 2 students per class (df.sort_values(["class", "score"], ascending=[True, False]) .groupby("class").head(2))
Bonus — sort by a derived value
df.iloc[df["name"].str.len().argsort()] # by length of name
Worked Example · Top-N Dashboard
12 minimport pandas as pd df = pd.read_csv("clean.csv", parse_dates=["date"]) df["total"] = df["quantity"] * df["price"] # 1. Top 5 customers by spend top_customers = (df.groupby("customer", as_index=False)["total"].sum() .nlargest(5, "total") .reset_index(drop=True)) print(top_customers) # 2. Worst day by revenue worst_day = (df.groupby(df["date"].dt.date)["total"].sum() .nsmallest(1)) print(f"\n📉 worst day: {worst_day.index[0]} RM {worst_day.values[0]:.2f}") # 3. Top 2 products per day top_per_day = (df.groupby(["date", "product"], as_index=False)["total"].sum() .sort_values(["date", "total"], ascending=[True, False]) .groupby("date").head(2)) print("\n🏆 top 2 per day:") print(top_per_day) # 4. Add a rank df["rank_today"] = (df.groupby(df["date"].dt.date)["total"] .rank(ascending=False, method="dense"))
Read the diff
Same DataFrame, four different reports. The combo groupby + sort + head is the Pythonic "top N per group" — easier to read than window functions in SQL once you've used it twice.
Try It Yourself
13 minPrint the three rows with the largest total.
Add a class_rank column to students.csv (1 = top of each class).
Hint
df["class_rank"] = (df.groupby("class")["score"] .rank(ascending=False, method="dense"))
For a daily-revenue series, compute the day-on-day change. Find the day with the biggest negative change.
Hint
daily = df.groupby(df["date"].dt.date)["total"].sum() diff = daily.diff() worst = diff.nsmallest(1) print(f"biggest drop: {worst.index[0]} RM {worst.values[0]:.2f}")
Mini-Challenge · Climbers & Fallers
8 minImagine two daily snapshots (yesterday, today). Each has columns product, revenue. Find the top 3 climbers (biggest gainers) and top 3 fallers between the two days.
Show one possible solution
merged = (yest.merge(today, on="product", suffixes=("_y", "_t"), how="outer") .fillna(0)) merged["delta"] = merged["revenue_t"] - merged["revenue_y"] print("🚀 climbers:") print(merged.nlargest(3, "delta")[["product", "delta"]]) print("\n📉 fallers:") print(merged.nsmallest(3, "delta")[["product", "delta"]])
Non-negotiables: outer merge so products on one day only still count; fillna(0); nlargest/nsmallest by delta.
Recap
3 minsort_values + slicing covers nearly every ordering need. nlargest/nsmallest are faster when you only want N. rank with method="dense" is the leaderboard helper. Combine sort with groupby to compute top-N per group. Tomorrow we level up with the big challenge.
Homework
4 minFrom your real CSV, produce four ranked outputs:
- Top 5 by your most important numeric column.
- Bottom 5 by the same.
- Top 3 per group, for any sensible group column.
- A "biggest mover" report comparing two slices of your data (months, weeks, categories).
The answer is your four reports; structure them like the worked example.