Learning Goals
3 min
- Recognise the CSV format and its three rules (header, comma, newline).
- Read a CSV with csv.reader: each row becomes a list.
- Read a CSV with csv.DictReader: each row becomes a dict.
- Handle the gotchas: the newline="" open argument and converting strings to numbers.
Warm-Up · What Is a CSV?
5 min
CSV stands for Comma-Separated Values. Here is a complete CSV file:
name,age,score
Aisyah,13,88
Wei Jie,14,75
Suresh,13,92
Mei,14,80
Three rules:
- The first row is usually the header — the column names.
- Each row is one record, fields separated by commas.
- Newline ends a row.
Save the five lines above (the header plus four records) as students.csv. We'll use it for the rest of the lesson.
CSV is the lingua franca of tabular data. Nearly every spreadsheet program can open it, most databases can import it, and Python's csv module turns it into rows you can loop over.
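To see the three rules at work before the csv module comes in, here is a minimal sketch that splits the text by hand. It assumes the simplest possible data (no quoted fields, no commas inside a value), which is exactly where hand-splitting stops working and the csv module takes over. The variable names are only for illustration.

```python
# The same text as students.csv, held in a string for the demonstration.
text = "name,age,score\nAisyah,13,88\nWei Jie,14,75\nSuresh,13,92\nMei,14,80\n"

lines = text.splitlines()        # rule 3: a newline ends a row
header = lines[0].split(",")     # rule 1: the first row is the header
records = [line.split(",") for line in lines[1:]]  # rule 2: commas separate fields

print(header)      # ['name', 'age', 'score']
print(records[0])  # ['Aisyah', '13', '88']
```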
New Concept · csv.reader and csv.DictReader
14 min
csv.reader → each row is a list
import csv

with open("students.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # ['name', 'age', 'score']
    print("Header:", header)
    for row in reader:
        print(row)
Header: ['name', 'age', 'score']
['Aisyah', '13', '88']
['Wei Jie', '14', '75']
['Suresh', '13', '92']
['Mei', '14', '80']
Every value is a string, including the numbers. Converting comes next.
csv.DictReader → each row is a dict
import csv

with open("students.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
{'name': 'Aisyah', 'age': '13', 'score': '88'}
{'name': 'Wei Jie', 'age': '14', 'score': '75'}
{'name': 'Suresh', 'age': '13', 'score': '92'}
{'name': 'Mei', 'age': '14', 'score': '80'}
DictReader uses the header as keys. You access columns by name (row["score"]) instead of by position. Most of the time that's what you want.
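If you want to check which keys DictReader picked up, its fieldnames attribute holds the header row. The sketch below uses io.StringIO as a stand-in for an open file so it runs without students.csv; the two data rows are a shortened copy of the lesson file.

```python
import csv
import io

# io.StringIO stands in for an open file so the sketch runs on its own.
f = io.StringIO("name,age,score\nAisyah,13,88\nWei Jie,14,75\n")
reader = csv.DictReader(f)

print(reader.fieldnames)  # ['name', 'age', 'score']

first = next(reader)      # the first record after the header
print(first["name"], first["score"])  # Aisyah 88
```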
Why newline=""?
The csv module handles end-of-line characters itself. If you open the file in Python's default text mode (which translates newlines), line breaks inside quoted fields may not be read correctly, and CSVs written on Windows can end up with an extra blank line between rows. Always pass newline="" when working with CSVs; make it a habit.
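One case where newline="" matters when reading is a quoted field that contains a line break of its own. The sketch below is an illustration under assumptions: the file name notes.csv and its contents are made up, and the file is written to a temporary folder so nothing is left behind.

```python
import csv
import os
import tempfile

# Made-up file: the quoted "note" field contains a line break.
raw = b'name,note\r\n"Aisyah","line one\r\nline two"\r\n'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.csv")
    with open(path, "wb") as f:
        f.write(raw)

    # newline="" passes the original line endings through to the csv module.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

# Three physical lines in the file, but only a header and one record.
print(len(rows))
print(rows[1])
```

The csv module treats the break inside the quotes as part of the field, so the record stays in one piece.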
Converting strings to numbers
Every CSV value is text. To do maths, you convert:
import csv

with open("students.csv", newline="") as f:
    reader = csv.DictReader(f)
    total = 0
    count = 0
    for row in reader:
        total += int(row["score"])  # text → number before adding
        count += 1

print(f"Average score: {total / count:.1f}")
Average score: 83.8
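Real files are not always as tidy as students.csv: int() raises ValueError on a blank or non-numeric cell. One possible way to cope is sketched below. The helper to_int is hypothetical (it is not part of the csv module), and the data with a blank and an "absent" score is made up for the demonstration.

```python
import csv
import io

def to_int(text):
    """Hypothetical helper: return an int, or None if the text is not a whole number."""
    try:
        return int(text.strip())
    except ValueError:
        return None

# Made-up data with one blank score and one non-numeric score.
f = io.StringIO("name,score\nAisyah,88\nWei Jie,\nSuresh,absent\nMei,80\n")

scores = []
for row in csv.DictReader(f):
    value = to_int(row["score"])
    if value is not None:  # skip cells that could not be converted
        scores.append(value)

print(scores)  # [88, 80]
print(f"Average of valid scores: {sum(scores) / len(scores):.1f}")  # 84.0
```

Skipping bad cells is only one choice; depending on the data you might prefer to stop with an error or count the skipped rows.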
Worked Example · Class Stats from a CSV
12 min
# class_stats.py: read a CSV and report
import csv

scores = []
with open("students.csv", newline="") as f:
    for row in csv.DictReader(f):
        scores.append({
            "name": row["name"],
            "age": int(row["age"]),
            "score": int(row["score"]),
        })

print(f"📊 {len(scores)} students loaded\n")

# Top scorer
top = max(scores, key=lambda s: s["score"])
print(f"🏆 Top: {top['name']:<10} → {top['score']}")

# Lowest
low = min(scores, key=lambda s: s["score"])
print(f"📉 Lowest: {low['name']:<10} → {low['score']}")

# Average
avg = sum(s["score"] for s in scores) / len(scores)
print(f"📈 Average: {avg:.1f}")

# Group by age
print("\nBy age:")
ages = sorted({s["age"] for s in scores})
for a in ages:
    these = [s for s in scores if s["age"] == a]
    avg_a = sum(s["score"] for s in these) / len(these)
    print(f"  age {a}: n={len(these)}, avg={avg_a:.1f}")
Sample output
📊 4 students loaded

🏆 Top: Suresh     → 92
📉 Lowest: Wei Jie    → 75
📈 Average: 83.8

By age:
  age 13: n=2, avg=90.0
  age 14: n=2, avg=77.5
Read the diff
Four passes. The load pass reads the rows and converts the number columns from strings to ints. The min, max and sum passes then work on the cleaned list. The group-by at the bottom is a classic pattern: collect the distinct keys, then for each key build a sub-list. You'll see this 100 times before Level 4 is done.
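The group-by can also be done in a single pass by filling a dict of lists as you go. This is an alternative sketch, not the version used in class_stats.py; the scores list is written inline (same shape as the cleaned list above) so the example runs on its own.

```python
# Same shape as the cleaned list in class_stats.py, written inline.
scores = [
    {"name": "Aisyah", "age": 13, "score": 88},
    {"name": "Wei Jie", "age": 14, "score": 75},
    {"name": "Suresh", "age": 13, "score": 92},
    {"name": "Mei", "age": 14, "score": 80},
]

# One pass: each age maps to the list of scores seen for that age.
by_age = {}
for s in scores:
    by_age.setdefault(s["age"], []).append(s["score"])

for age in sorted(by_age):
    group = by_age[age]
    print(f"age {age}: n={len(group)}, avg={sum(group) / len(group):.1f}")
# age 13: n=2, avg=90.0
# age 14: n=2, avg=77.5
```

The collect-keys-then-filter version rescans the list once per key; the dict version touches each row once, which starts to matter as files grow.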
Try It Yourself
13 min
Read students.csv with DictReader and print just the names, one per line.
Hint
import csv

with open("students.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["name"])
Print only students with score > 80. Use DictReader, convert the score to int, filter inside the loop.
Hint
import csv

with open("students.csv", newline="") as f:
    for row in csv.DictReader(f):
        if int(row["score"]) > 80:
            print(row["name"], row["score"])
Read all rows, sort by score descending, print the leaderboard. Don't mutate the file yet — that's tomorrow.
Hint
import csv

with open("students.csv", newline="") as f:
    rows = list(csv.DictReader(f))

rows.sort(key=lambda r: int(r["score"]), reverse=True)
for i, r in enumerate(rows, 1):
    print(f"{i}. {r['name']:<10} {r['score']}")
Note that list(reader) consumes the whole file into memory — fine for small files, careful with millions of rows.
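When you only need a summary, you can skip list() and keep just the running values while looping. The sketch below tracks the row count and the best score one row at a time; io.StringIO stands in for the open file, and the names best and count are only for illustration.

```python
import csv
import io

# io.StringIO stands in for open("students.csv", newline="").
f = io.StringIO("name,age,score\nAisyah,13,88\nWei Jie,14,75\nSuresh,13,92\nMei,14,80\n")

best = None   # (name, score) of the highest score seen so far
count = 0
for row in csv.DictReader(f):   # one row in memory at a time
    count += 1
    score = int(row["score"])
    if best is None or score > best[1]:
        best = (row["name"], score)

print(count, best)  # 4 ('Suresh', 92)
```

Sorting a full leaderboard still needs every row at once, so this approach fits totals, counts, minimums and maximums rather than sorts.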
Mini-Challenge · Sales Report
8 min
Save this CSV as sales.csv:
date,product,quantity,price
2026-05-01,roti,3,1.50
2026-05-01,milo,2,3.00
2026-05-02,roti,5,1.50
2026-05-02,nasi,1,8.00
2026-05-03,milo,4,3.00
2026-05-03,nasi,2,8.00
Build report.py that prints:
- Total revenue (sum(qty × price)).
- Total quantity per product.
- Best-selling product by revenue.
Show one possible solution
# report.py: sales report from CSV
import csv

rows = []
with open("sales.csv", newline="") as f:
    for r in csv.DictReader(f):
        rows.append({
            "date": r["date"],
            "product": r["product"],
            "quantity": int(r["quantity"]),
            "price": float(r["price"]),
        })

# 1. Total revenue
total = sum(r["quantity"] * r["price"] for r in rows)
print(f"Total revenue: RM {total:.2f}")

# 2. Quantity per product
qty = {}
for r in rows:
    qty[r["product"]] = qty.get(r["product"], 0) + r["quantity"]
print("\nQty by product:")
for p, n in qty.items():
    print(f"  {p:<8} {n}")

# 3. Revenue per product → best
rev = {}
for r in rows:
    rev[r["product"]] = rev.get(r["product"], 0) + r["quantity"] * r["price"]
best = max(rev, key=rev.get)
print(f"\n🏆 Best seller: {best} (RM {rev[best]:.2f})")
Non-negotiables: int + float conversions, accumulate per-product totals in a dict, find the max by value.
Recap
3 min
CSV is a text file where each row is one record and fields are comma-separated. Python's csv module reads it two ways: csv.reader gives lists, csv.DictReader gives dicts keyed by the header. Always open with newline="". Every value is a string, so convert to int/float before doing maths. Tomorrow we write CSVs ourselves.
Vocabulary Card
- CSV: Comma-Separated Values. A text file with one record per line.
- csv.reader: iterates each row as a list of strings.
- csv.DictReader: iterates each row as a dict keyed by the header row.
- newline="": the standard keyword argument when opening a CSV. It leaves line endings to the csv module and avoids blank-row glitches.
Homework
4 min
Find or create a CSV with at least 8 rows about something you care about (Spotify favourites, your class's heights, your top games). Build explore.py that:
- Reads it with DictReader.
- Prints the row count and the column names.
- Computes one number (min, max, average, total) from a numeric column.
- Filters and prints rows that match a condition you pick.
Sample · explore.py (using a games CSV)
# explore.py: explore any CSV
import csv

PATH = "games.csv"  # columns: title,genre,year,hours_played

with open(PATH, newline="") as f:
    rows = list(csv.DictReader(f))

print(f"{PATH}: {len(rows)} rows")
print("Columns:", list(rows[0].keys()))

hours = [int(r["hours_played"]) for r in rows]
print(f"\nTotal hours: {sum(hours)}, avg: {sum(hours)/len(hours):.1f}, max: {max(hours)}")

print("\nRPG games:")
for r in rows:
    if r["genre"].lower() == "rpg":
        print(f"  {r['title']} ({r['year']})")
Non-negotiables: DictReader, row count, column names, one numeric summary, one filter.