PY-L4-03 · CSV Files — The Spreadsheet of Code

Learning Goals

3 min

Recognise the CSV format and its three rules (header, comma, newline).
Read a CSV with csv.reader — each row becomes a list.
Read a CSV with csv.DictReader — each row becomes a dict.
Handle the gotchas: the newline="" open-arg and converting strings to numbers.

Warm-Up · What Is a CSV?

5 min

CSV stands for Comma-Separated Values. Here is a complete CSV file:

name,age,score
Aisyah,13,88
Wei Jie,14,75
Suresh,13,92
Mei,14,80

Three rules:

The first row is usually the header — the column names.
Each row is one record, fields separated by commas.
Newline ends a row.

Save the five lines above (the header plus four students) as students.csv. We'll use it for the rest of the lesson.
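If you would rather create the file from Python than from a text editor, here is a minimal sketch. It assumes you run it in the folder where you want students.csv to live; the name CSV_TEXT is just for illustration.

```python
from pathlib import Path

# The same five lines shown above: one header row plus four students.
CSV_TEXT = """name,age,score
Aisyah,13,88
Wei Jie,14,75
Suresh,13,92
Mei,14,80
"""

# Write the text to students.csv in the current folder.
Path("students.csv").write_text(CSV_TEXT, encoding="utf-8")

# Read it back to check the three rules: header first, commas, newlines.
lines = Path("students.csv").read_text(encoding="utf-8").splitlines()
print(lines[0])          # the header row
print(len(lines) - 1)    # number of student records
```

The second print should match the number of students in the table, which is a quick way to confirm nothing was lost when saving.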

Today's big idea

CSV is the lingua franca of tabular data. Nearly every spreadsheet program reads it, most databases import it, and Python's csv module turns it into rows you can loop over.

New Concept · csv.reader and csv.DictReader

14 min

csv.reader → each row is a list

import csv

with open("students.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)        # ['name', 'age', 'score']
    print("Header:", header)
    for row in reader:
        print(row)

Header: ['name', 'age', 'score']
['Aisyah', '13', '88']
['Wei Jie', '14', '75']
['Suresh', '13', '92']
['Mei', '14', '80']

Every value is a string, including the numbers. Converting comes next.

csv.DictReader → each row is a dict

import csv

with open("students.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)

{'name': 'Aisyah', 'age': '13', 'score': '88'}
{'name': 'Wei Jie', 'age': '14', 'score': '75'}
{'name': 'Suresh', 'age': '13', 'score': '92'}
{'name': 'Mei', 'age': '14', 'score': '80'}

DictReader uses the header as keys. You access columns by name (row["score"]) instead of by position. Most of the time that's what you want.
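To see the difference side by side, the sketch below reads the same record both ways. It uses io.StringIO with a two-line sample (an assumption, so the example runs without students.csv on disk); with a real file you would pass the open file object instead.

```python
import csv
import io

# A two-line stand-in for students.csv so the sketch runs on its own.
SAMPLE = "name,age,score\nAisyah,13,88\n"

# csv.reader: skip the header, then pick the column by position.
list_rows = csv.reader(io.StringIO(SAMPLE))
next(list_rows)
first_list = next(list_rows)
print(first_list[2])         # score is at index 2

# csv.DictReader: pick the column by name.
dict_rows = csv.DictReader(io.StringIO(SAMPLE))
first_dict = next(dict_rows)
print(first_dict["score"])
```

Both lines print the same value. The position-based version breaks if someone reorders the columns; the name-based version keeps working.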

Why `newline=""`?

The csv module handles end-of-line characters itself. Python's default text mode translates newlines before csv sees them, which can corrupt line breaks inside quoted fields when reading and, on Windows, can leave a blank line after every record when writing. Always pass newline="" when working with CSVs; make it a habit.
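Here is a small sketch of what newline="" changes when reading. The file notes.csv and its contents are made up for this demonstration: one record has a Windows-style line break inside a quoted field.

```python
import csv
import os
import tempfile

# Hypothetical file: one record whose "note" field contains a Windows
# line break (\r\n) inside quotes. Written as bytes so nothing is translated.
raw = b'name,note\r\nMei,"line one\r\nline two"\r\n'
path = os.path.join(tempfile.mkdtemp(), "notes.csv")
with open(path, "wb") as f:
    f.write(raw)

# With newline="" the csv module sees the original characters.
with open(path, newline="") as f:
    kept = list(csv.DictReader(f))[0]["note"]

# Without it, Python rewrites \r\n to \n before csv gets the text.
with open(path) as f:
    changed = list(csv.DictReader(f))[0]["note"]

print(repr(kept))
print(repr(changed))
```

The first value keeps the \r\n exactly as it was stored; the second has been quietly changed to \n. For students.csv the two readings look identical, which is why the habit matters: the problem only shows up on files you did not write yourself.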

Converting strings to numbers

Every CSV value is text. To do maths, you convert:

import csv

with open("students.csv", newline="") as f:
    reader = csv.DictReader(f)
    total = 0
    count = 0
    for row in reader:
        total += int(row["score"])
        count += 1
    print(f"Average score: {total / count:.1f}")

Average score: 83.8
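Real CSVs often contain blanks or typos, and int() raises ValueError on those. The sketch below shows one way to cope. The helper to_int and the sample data are hypothetical, not part of the csv module.

```python
import csv
import io


def to_int(text, default=None):
    """Hypothetical helper: return int(text), or default if text is not a whole number."""
    try:
        return int(text.strip())
    except ValueError:
        return default


# Made-up data with one blank score and one typo (letter O instead of zero).
SAMPLE = "name,score\nAisyah,88\nWei Jie,\nSuresh,9O\nMei,80\n"

scores = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    value = to_int(row["score"])
    if value is None:
        print("Skipping", row["name"], "- bad score:", repr(row["score"]))
    else:
        scores.append(value)

print(f"Average of valid scores: {sum(scores) / len(scores):.1f}")
```

Skipping bad rows is only one choice. Depending on the data, stopping with an error message may be safer than silently averaging fewer rows.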

Worked Example · Class Stats from a CSV

12 min

# class_stats.py — read a CSV and report
import csv

scores = []
with open("students.csv", newline="") as f:
    for row in csv.DictReader(f):
        scores.append({
            "name":  row["name"],
            "age":   int(row["age"]),
            "score": int(row["score"]),
        })

print(f"📊 {len(scores)} students loaded\n")

# Top scorer
top = max(scores, key=lambda s: s["score"])
print(f"🏆 Top:     {top['name']:<10} → {top['score']}")

# Lowest
low = min(scores, key=lambda s: s["score"])
print(f"📉 Lowest:  {low['name']:<10} → {low['score']}")

# Average
avg = sum(s["score"] for s in scores) / len(scores)
print(f"📈 Average: {avg:.1f}")

# Group by age
print("\nBy age:")
ages = sorted({s["age"] for s in scores})
for a in ages:
    these = [s for s in scores if s["age"] == a]
    avg_a = sum(s["score"] for s in these) / len(these)
    print(f"  age {a}: n={len(these)}, avg={avg_a:.1f}")

Sample output

📊 4 students loaded

🏆 Top:     Suresh     → 92
📉 Lowest:  Wei Jie    → 75
📈 Average: 83.8

By age:
  age 13: n=2, avg=90.0
  age 14: n=2, avg=77.5

Read the diff

The script works in stages. The load pass reads each row and converts the numeric strings to ints. max, min and sum then work on the cleaned list. The group-by at the bottom is a classic pattern: collect the distinct keys, then for each key build a sub-list. You'll see this 100 times before Level 4 is done.
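The group-by can also be written as a single pass that fills a dict as it goes. This is an alternative sketch, not the version used in class_stats.py; the scores list is typed in by hand here so the example runs on its own.

```python
# Same four students as the worked example, already converted to ints.
scores = [
    {"name": "Aisyah", "age": 13, "score": 88},
    {"name": "Wei Jie", "age": 14, "score": 75},
    {"name": "Suresh", "age": 13, "score": 92},
    {"name": "Mei", "age": 14, "score": 80},
]

# One pass: each age maps to the list of scores seen for that age.
by_age = {}
for s in scores:
    by_age.setdefault(s["age"], []).append(s["score"])

for age in sorted(by_age):
    group = by_age[age]
    print(f"  age {age}: n={len(group)}, avg={sum(group) / len(group):.1f}")
```

It prints the same two "By age" lines as the worked example. The dict version reads the list once instead of once per age, which starts to matter when there are many distinct keys.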

Try It Yourself

13 min

01 🟢 Print names only

Read students.csv with DictReader and print just the names — one per line.

Hint

import csv

with open("students.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["name"])

02 🟡 Filter by score

Print only students with score > 80. Use DictReader, convert the score to int, filter inside the loop.

Hint

import csv

with open("students.csv", newline="") as f:
    for row in csv.DictReader(f):
        if int(row["score"]) > 80:
            print(row["name"], row["score"])

03 🔴 Sort the file

Read all rows, sort by score descending, print the leaderboard. Don't mutate the file yet — that's tomorrow.

Hint

import csv

with open("students.csv", newline="") as f:
    rows = list(csv.DictReader(f))

rows.sort(key=lambda r: int(r["score"]), reverse=True)
for i, r in enumerate(rows, 1):
    print(f"{i}. {r['name']:<10} {r['score']}")

Note that list(reader) consumes the whole file into memory — fine for small files, careful with millions of rows.
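When you only need the top few rows of a very large file, one option is heapq.nlargest from the standard library, which reads rows one at a time. A minimal sketch, using an in-memory sample as a stand-in for the big file:

```python
import csv
import heapq
import io

# Stand-in for a large file; with a real one you would pass the open file
# object to csv.DictReader instead of io.StringIO.
SAMPLE = "name,age,score\nAisyah,13,88\nWei Jie,14,75\nSuresh,13,92\nMei,14,80\n"

reader = csv.DictReader(io.StringIO(SAMPLE))

# nlargest pulls rows from the reader one at a time and only keeps the
# current top 3 in memory.
top3 = heapq.nlargest(3, reader, key=lambda r: int(r["score"]))

for i, r in enumerate(top3, 1):
    print(f"{i}. {r['name']:<10} {r['score']}")
```

For a four-row file this is overkill and rows.sort is clearer. The trade-off only pays off when the file is too big to hold comfortably in a list.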

Mini-Challenge · Sales Report

8 min

Save this CSV as sales.csv:

date,product,quantity,price
2026-05-01,roti,3,1.50
2026-05-01,milo,2,3.00
2026-05-02,roti,5,1.50
2026-05-02,nasi,1,8.00
2026-05-03,milo,4,3.00
2026-05-03,nasi,2,8.00

Build report.py that prints:

Total revenue (sum(qty × price)).
Total quantity per product.
Best-selling product by revenue.

Show one possible solution

# report.py — sales report from CSV
import csv

rows = []
with open("sales.csv", newline="") as f:
    for r in csv.DictReader(f):
        rows.append({
            "date":     r["date"],
            "product":  r["product"],
            "quantity": int(r["quantity"]),
            "price":    float(r["price"]),
        })

# 1. Total revenue
total = sum(r["quantity"] * r["price"] for r in rows)
print(f"Total revenue: RM {total:.2f}")

# 2. Quantity per product
qty = {}
for r in rows:
    qty[r["product"]] = qty.get(r["product"], 0) + r["quantity"]
print("\nQty by product:")
for p, n in qty.items():
    print(f"  {p:<8} {n}")

# 3. Revenue per product → best
rev = {}
for r in rows:
    rev[r["product"]] = rev.get(r["product"], 0) + r["quantity"] * r["price"]
best = max(rev, key=rev.get)
print(f"\n🏆 Best seller: {best}  (RM {rev[best]:.2f})")

Non-negotiables: int + float conversions, accumulate per-product totals in a dict, find the max by value.

Recap

3 min

CSV is a text file where each row is one record and fields are comma-separated. Python's csv module reads it two ways: csv.reader gives lists, csv.DictReader gives dicts keyed by the header. Always open with newline="". Every value is a string — convert to int/float before doing maths. Tomorrow we write CSVs ourselves.

Vocabulary Card

CSV: Comma-Separated Values — text file, one record per line.
csv.reader: Iterates each row as a list of strings.
csv.DictReader: Iterates each row as a dict keyed by the header row.
newline="": The standard kwarg when opening a CSV. It lets the csv module handle line endings itself and avoids blank-row glitches.

Homework

4 min

Find or create a CSV with at least 8 rows about something you care about (Spotify favourites, your class's heights, your top games). Build explore.py that:

Reads it with DictReader.
Prints the row count and the column names.
Computes one number (min, max, average, total) from a numeric column.
Filters and prints rows that match a condition you pick.

Sample · explore.py (using a games CSV)

# explore.py — explore any CSV
import csv

PATH = "games.csv"   # columns: title,genre,year,hours_played

with open(PATH, newline="") as f:
    rows = list(csv.DictReader(f))

print(f"{PATH}: {len(rows)} rows")
print("Columns:", list(rows[0].keys()))

hours = [int(r["hours_played"]) for r in rows]
print(f"\nTotal hours: {sum(hours)}, avg: {sum(hours)/len(hours):.1f}, max: {max(hours)}")

print("\nRPG games:")
for r in rows:
    if r["genre"].lower() == "rpg":
        print(f"  {r['title']} ({r['year']})")

Non-negotiables: DictReader, row count, column names, one numeric summary, one filter.
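One edge case worth knowing: rows[0] fails if the CSV has a header but no records. DictReader also exposes the header through its fieldnames attribute, which does not depend on any data rows. A small sketch with a made-up, header-only file:

```python
import csv
import io

# Hypothetical games.csv contents: a header row and no data rows.
SAMPLE = "title,genre,year,hours_played\n"

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)

# fieldnames comes from the header row, so it works even with zero records.
columns = reader.fieldnames
print("Rows:", len(rows))
print("Columns:", columns)
```

With a real file, keep the reader in its own variable inside the with block so you can read fieldnames after building the list.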
