Project Goals
3 min
- Build convert.py, a CLI that reads sys.argv for the input and output paths.
- Detect the conversion direction from file extensions (.csv or .json).
- Write the result to disk, preserving column order on the way back to CSV.
- Round-trip: prove that file → other format → original format gives back the same data.
Warm-Up · sys.argv 30-Second Tour
5 min
sys.argv is a list of strings, one for every word you typed after python:

    # echo.py
    import sys
    print(sys.argv)

    $ python echo.py hello world
    ['echo.py', 'hello', 'world']
Index 0 is the script name. The real arguments start at index 1. We'll use sys.argv[1] for the input path and sys.argv[2] for the output.
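Because index 0 is always the script name, "two arguments" means a list of length 3. Here is a minimal sketch of a guard that checks the count before indexing. The helper name read_paths is made up for this illustration and is not part of convert.py.

```python
def read_paths(argv):
    """Return (input_path, output_path) from an argv-style list, or None if the count is wrong."""
    # argv[0] is the script name, so two real arguments means len(argv) == 3.
    if len(argv) != 3:
        return None
    return argv[1], argv[2]


# Plain lists stand in for sys.argv so the sketch runs without a terminal.
print(read_paths(["echo.py", "in.csv", "out.json"]))  # ('in.csv', 'out.json')
print(read_paths(["echo.py"]))                        # None
```

In a real script you would pass sys.argv and exit with a usage message when the result is None.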
A CLI tool reads inputs, transforms them, and writes outputs. Today: input path + output path → conversion. A lot of everyday automation work has the same shape.
Plan · Pseudocode First
14 min
Before any code, write the recipe:

    1. Read inp_path and out_path from sys.argv.
    2. Decide direction from extensions.
       - inp .csv → load CSV into a list of dicts
       - inp .json → load JSON into a list of dicts
    3. Pick the output format.
       - out .json → json.dump
       - out .csv → csv.DictWriter
    4. Print a one-line success message.
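Step 2 of the recipe can be tried on its own before any file is opened. The sketch below is one possible way to do it: the function name detect_direction and the SUPPORTED set are assumptions for illustration, and the .lower() call (which accepts .CSV as well as .csv) is an addition the lesson's own code does not make.

```python
from pathlib import Path

SUPPORTED = {".csv", ".json"}  # hypothetical constant for this sketch


def detect_direction(inp_path, out_path):
    """Return a label such as 'csv->json', or raise ValueError for an unknown extension."""
    src = Path(inp_path).suffix.lower()
    dst = Path(out_path).suffix.lower()
    for label, ext in (("input", src), ("output", dst)):
        if ext not in SUPPORTED:
            raise ValueError(f"unknown {label} format: {ext!r}")
    # Strip the leading dot from each suffix to build the label.
    return f"{src[1:]}->{dst[1:]}"


print(detect_direction("students.csv", "students.json"))  # csv->json
print(detect_direction("students.json", "roundtrip.csv"))  # json->csv
```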
Step 2 — Load CSV as list of dicts
    import csv

    def load_csv(path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
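It helps to see what csv.DictReader actually hands back. The sketch below reads from an in-memory string instead of a file (the sample data is made up) and shows that every value arrives as text, including numbers.

```python
import csv
import io

# Hypothetical sample data; io.StringIO lets DictReader treat a string like a file.
sample = "name,age\nAina,21\nBen,19\n"
rows = list(csv.DictReader(io.StringIO(sample)))

print(len(rows))                       # 2
print(rows[0]["name"])                 # Aina
print(type(rows[0]["age"]).__name__)   # str
```

The header row supplies the keys, and the age column is the string "21" rather than the number 21.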
Step 3 — Save list of dicts as JSON
    import json

    def save_json(rows, path):
        with open(path, "w") as f:
            json.dump(rows, f, indent=2, ensure_ascii=False)
Step 4 — JSON → CSV
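The tricky bit is column order. The JSON spec treats object keys as unordered, but CSV columns have a fixed order. Python's json module keeps keys in the order they appear in the file, so start with the keys of the first row, then append any key that first shows up in a later row (so we don't drop columns that only appear in later rows):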
    def save_csv(rows, path):
        fields = list(rows[0].keys())
        for r in rows[1:]:
            for k in r:
                if k not in fields:
                    fields.append(k)
        with open(path, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=fields)
            w.writeheader()
            w.writerows(rows)
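To see the key union at work, here is a small sketch that runs the same field-collecting loop on two made-up rows with different keys and writes to an in-memory buffer instead of a file. csv.DictWriter fills a missing key with an empty string by default.

```python
import csv
import io

# Hypothetical rows: the second one lacks "age" and adds "email".
rows = [
    {"name": "Aina", "age": 21},
    {"name": "Ben", "email": "ben@example.com"},
]

# Same logic as save_csv: first row's keys, then any new keys in first-seen order.
fields = list(rows[0].keys())
for r in rows[1:]:
    for k in r:
        if k not in fields:
            fields.append(k)

buf = io.StringIO(newline="")
w = csv.DictWriter(buf, fieldnames=fields)
w.writeheader()
w.writerows(rows)

print(fields)  # ['name', 'age', 'email']
print(buf.getvalue())
```

The output has three columns, with an empty cell wherever a row did not have that key.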
Step 5 — The router
Pick which loader and which saver based on extensions:
    import sys
    from pathlib import Path

    inp = Path(sys.argv[1])
    out = Path(sys.argv[2])

    # Load
    if inp.suffix == ".csv":
        rows = load_csv(inp)
    elif inp.suffix == ".json":
        with inp.open() as f:
            rows = json.load(f)
    else:
        sys.exit(f"❌ unknown input format: {inp.suffix}")

    # Save
    if out.suffix == ".json":
        save_json(rows, out)
    elif out.suffix == ".csv":
        save_csv(rows, out)
    else:
        sys.exit(f"❌ unknown output format: {out.suffix}")

    print(f"✅ {inp.name} → {out.name} ({len(rows)} rows)")
Build · convert.py
12 min
Assemble everything from the previous section into one file.

    # convert.py: CSV ↔ JSON converter
    import csv
    import json
    import sys
    from pathlib import Path

    def load_csv(path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def load_json(path):
        with open(path) as f:
            return json.load(f)

    def save_json(rows, path):
        with open(path, "w") as f:
            json.dump(rows, f, indent=2, ensure_ascii=False)

    def save_csv(rows, path):
        fields = list(rows[0].keys())
        for r in rows[1:]:
            for k in r:
                if k not in fields:
                    fields.append(k)
        with open(path, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=fields)
            w.writeheader()
            w.writerows(rows)

    def main(argv):
        if len(argv) != 3:
            sys.exit("usage: python convert.py <in.csv|json> <out.csv|json>")
        inp, out = Path(argv[1]), Path(argv[2])

        if inp.suffix == ".csv":
            rows = load_csv(inp)
        elif inp.suffix == ".json":
            rows = load_json(inp)
        else:
            sys.exit(f"❌ unknown input format: {inp.suffix}")

        if out.suffix == ".json":
            save_json(rows, out)
        elif out.suffix == ".csv":
            save_csv(rows, out)
        else:
            sys.exit(f"❌ unknown output format: {out.suffix}")

        print(f"✅ {inp.name} → {out.name} ({len(rows)} rows)")

    if __name__ == "__main__":
        main(sys.argv)
Try it
    $ python convert.py students.csv students.json
    ✅ students.csv → students.json (4 rows)
    $ python convert.py students.json roundtrip.csv
    ✅ students.json → roundtrip.csv (4 rows)
Open students.csv and roundtrip.csv side-by-side. Same rows, same column order. CSV → JSON → CSV gave you back the original. That's the proof your converter works.
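The round trip above starts from CSV, where every value is already text. Starting from JSON is less forgiving: numbers go into the CSV as text and come back as strings. A minimal in-memory sketch with made-up data shows the difference.

```python
import csv
import io

# Hypothetical JSON-style data: age is a real number here.
original = [{"name": "Aina", "age": 21}]

# Data → CSV text
buf = io.StringIO(newline="")
w = csv.DictWriter(buf, fieldnames=list(original[0].keys()))
w.writeheader()
w.writerows(original)

# CSV text → data again
buf.seek(0)
back = list(csv.DictReader(buf))

print(back[0]["age"] == "21")  # True: age came back as text
print(back == original)        # False: "21" is not equal to 21
```

So a JSON → CSV → JSON round trip can keep the same rows and columns while changing value types. Keep that in mind when deciding what "the same data" means for your files.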
Extensions · Make It Yours
13 min
If argv[2] is the literal string -, print the JSON to the screen instead of saving.
Hint
    if str(out) == "-":
        print(json.dumps(rows, indent=2, ensure_ascii=False))
        return
If only one argument is given, derive the output name automatically by swapping the extension.
Hint
    if len(argv) == 2:
        inp = Path(argv[1])
        other = ".json" if inp.suffix == ".csv" else ".csv"
        out = inp.with_suffix(other)
    elif len(argv) == 3:
        inp, out = Path(argv[1]), Path(argv[2])
    else:
        sys.exit("usage: ...")
Add a --verify flag: after the conversion, do the reverse conversion to a temp file and check the strings match line-for-line.
Hint
    # After saving 'out':
    if "--verify" in argv:
        tmp = inp.with_name("verify_" + inp.name)
        # Re-convert out → tmp using the same logic
        # Read both files as text, compare line-by-line
        ok = inp.read_text() == tmp.read_text()
        print("verify:", "ok ✅" if ok else "MISMATCH ❌")
Note: a strict text compare only works on simple CSVs (no funny escaping). For real round-trips, compare the data (list-of-dicts) instead.
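Here is one way the data-level comparison could look. It is a sketch under assumptions: the helper names load_rows and same_data are invented for this example, and the two temporary files hold the same made-up record written with and without quotes.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_rows(path):
    """Load a .csv or .json file into a list of plain dicts."""
    path = Path(path)
    if path.suffix == ".csv":
        with path.open(newline="") as f:
            return [dict(r) for r in csv.DictReader(f)]
    with path.open() as f:
        return json.load(f)


def same_data(path_a, path_b):
    """Compare two files by their parsed rows rather than by their raw text."""
    return load_rows(path_a) == load_rows(path_b)


with tempfile.TemporaryDirectory() as d:
    a = Path(d) / "a.csv"
    b = Path(d) / "b.csv"
    a.write_text('name,city\nAina,KL\n')
    b.write_text('"name","city"\n"Aina","KL"\n')  # same data, every field quoted
    text_equal = a.read_text() == b.read_text()
    data_equal = same_data(a, b)

print(text_equal, data_equal)  # False True
```

The text compare fails because of the quotes, while the data compare sees identical rows.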
Mini-Challenge · Nested JSON Flattener
8 min
If the JSON has nested fields like {"address": {"city": "KL"}}, the simple converter handles them badly: csv.DictWriter writes the whole inner dict into a single cell as text ({'city': 'KL'}). Build flatten.py that walks a list of dicts and flattens one level of nesting using dot keys: address.city.
Show one possible solution
    # flatten.py: one-level flatten for CSV-ready JSON
    import json, sys, csv
    from pathlib import Path

    def flatten(row):
        out = {}
        for k, v in row.items():
            if isinstance(v, dict):
                for k2, v2 in v.items():
                    out[f"{k}.{k2}"] = v2
            else:
                out[k] = v
        return out

    inp = Path(sys.argv[1])
    rows = json.loads(inp.read_text())
    flat = [flatten(r) for r in rows]

    # union of all keys, in first-seen order
    fields = []
    for r in flat:
        for k in r:
            if k not in fields:
                fields.append(k)

    out = inp.with_suffix(".csv")
    with out.open("w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=fields)
        w.writeheader()
        w.writerows(flat)

    print(f"✅ wrote {len(flat)} rows to {out.name}")
Non-negotiables: dotted keys for nested values, union of fields so rows with missing keys don't crash.
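The challenge only asks for one level of nesting. If your data nests deeper, the same dotted-key idea can be applied recursively. The sketch below is an optional variation, not part of the required solution; the name flatten_deep and the sample record are made up, and lists are left untouched.

```python
def flatten_deep(row, prefix=""):
    """Flatten nested dicts of any depth into a single dict with dotted keys."""
    out = {}
    for k, v in row.items():
        key = f"{prefix}.{k}" if prefix else k
        if isinstance(v, dict):
            # Recurse, carrying the dotted path built so far.
            out.update(flatten_deep(v, key))
        else:
            out[key] = v
    return out


record = {"name": "Aina", "address": {"city": "KL", "geo": {"lat": 3.1}}}
print(flatten_deep(record))
# {'name': 'Aina', 'address.city': 'KL', 'address.geo.lat': 3.1}
```

Note that an empty inner dict disappears entirely with this approach, since it contributes no keys.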
Recap
3 min
You shipped a real CLI tool. The shape (read args, choose a loader, choose a saver, write output) is the shape of nearly every script in this level. Round-tripping is the test that proves a converter is honest: A → B → A' should equal A. Tomorrow we handle the next universal data problem: dates and times.
Vocabulary Card
- sys.argv
- List of command-line arguments. Index 0 is the script name.
- Path(...).suffix
- The file extension, including the dot (.csv, .json).
- round-trip
- Converting A → B then B → A and checking you got A back.
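A few quick checks of Path(...).suffix make its behaviour concrete. The file names below are made up; none of them need to exist on disk, because suffix only looks at the name.

```python
from pathlib import Path

# suffix returns the last extension only, with the dot, and keeps the original case.
examples = {
    name: Path(name).suffix
    for name in ["students.csv", "archive.tar.gz", "DATA.CSV", "README"]
}
for name, suffix in examples.items():
    print(f"{name!r} -> {suffix!r}")

# with_suffix swaps the extension and returns a new Path.
swapped = Path("students.csv").with_suffix(".json")
print(swapped)  # students.json
```

Two things to notice: 'DATA.CSV' gives '.CSV', which does not equal '.csv' in a plain == comparison, and a name without an extension gives an empty string.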
Homework
4 min
Polish convert.py with three real-world touches:
- Refuse to overwrite an existing output file unless --force is passed.
- Print row count + column count after the conversion (e.g., 4 rows, 3 columns).
- If the input doesn't exist, fail with a clear message, not a Python traceback.
Sample · convert.py with polish
    # convert.py: with --force, count, error guard
    import csv, json, sys
    from pathlib import Path

    def load(path):
        if path.suffix == ".csv":
            with path.open(newline="") as f:
                return list(csv.DictReader(f))
        if path.suffix == ".json":
            return json.loads(path.read_text())
        sys.exit(f"❌ unknown input format: {path.suffix}")

    def save(rows, path):
        if path.suffix == ".json":
            with path.open("w") as f:
                json.dump(rows, f, indent=2, ensure_ascii=False)
            return
        if path.suffix == ".csv":
            fields = []
            for r in rows:
                for k in r:
                    if k not in fields:
                        fields.append(k)
            with path.open("w", newline="") as f:
                w = csv.DictWriter(f, fieldnames=fields)
                w.writeheader()
                w.writerows(rows)
            return
        sys.exit(f"❌ unknown output format: {path.suffix}")

    def main(argv):
        force = "--force" in argv
        args = [a for a in argv[1:] if not a.startswith("--")]
        if len(args) != 2:
            sys.exit("usage: python convert.py <in> <out> [--force]")
        inp, out = Path(args[0]), Path(args[1])
        if not inp.exists():
            sys.exit(f"❌ input not found: {inp}")
        if out.exists() and not force:
            sys.exit(f"❌ {out} exists. use --force to overwrite.")
        rows = load(inp)
        save(rows, out)
        cols = len(rows[0].keys()) if rows else 0
        print(f"✅ {inp.name} → {out.name} ({len(rows)} rows, {cols} columns)")

    if __name__ == "__main__":
        main(sys.argv)
Non-negotiables: friendly errors instead of tracebacks, refuse to clobber output unless forced, print the size of what you wrote.