Learning Goals
3 minBy the end of this lesson you can:
- Explain why you can't just copy a live database file.
- Back up SQLite safely with its online backup API.
- Dump Postgres/MySQL via
subprocesscallingpg_dump/mysqldump. - Compress, rotate, and verify dumps so backups are small and trustworthy.
Warm-Up · Why Not Just Copy the File?
5 minYou might think "a SQLite database is one file — just shutil.copy it." But if the app is writing while you copy, you can capture a half-written transaction and get a corrupt backup that won't open. Server databases (Postgres, MySQL) keep data across many files plus memory — copying the directory is even worse.
Databases need consistent backups — a snapshot that represents a valid moment, with no in-flight writes. Each database provides a proper tool for this: SQLite has an online backup API; Postgres has pg_dump; MySQL has mysqldump. Use the right tool, never a raw file copy. Then the file skills from Lesson 38 (compress, rotate, verify) apply to the resulting dump.
New Concept · Safe Dumps
14 minSQLite: the online backup API
import sqlite3 from pathlib import Path def backup_sqlite(db_path: str, dest: str) -> None: src = sqlite3.connect(db_path) dst = sqlite3.connect(dest) with dst: # transaction src.backup(dst) # consistent copy, even while in use src.close(); dst.close() backup_sqlite("app.db", "backups/app-backup.db")
connection.backup() copies a live SQLite database safely — it handles concurrent writes and gives you a consistent snapshot. Never shutil.copy a SQLite file that an app might be writing.
SQLite: a plain-text SQL dump
import sqlite3 def dump_sqlite_sql(db_path: str, out: str) -> None: con = sqlite3.connect(db_path) with open(out, "w", encoding="utf-8") as f: for line in con.iterdump(): # yields CREATE/INSERT statements f.write(line + "\n") con.close()
iterdump() produces the SQL to recreate the whole database — human-readable, diff-able, and restorable on any SQLite. Great for version control or migrating.
Postgres / MySQL via subprocess
import subprocess, os from pathlib import Path def dump_postgres(db: str, out: str) -> None: env = {**os.environ, "PGPASSWORD": os.environ["PG_PASSWORD"]} with open(out, "wb") as f: subprocess.run( ["pg_dump", "-h", os.environ["PG_HOST"], "-U", os.environ["PG_USER"], "-d", db], stdout=f, env=env, check=True, timeout=600) # MySQL is the same shape: # subprocess.run(["mysqldump", "-h", host, "-u", user, # f"-p{password}", db], stdout=f, check=True)
- These ship with the database — call them with
subprocess(Lesson 9), capturing stdout into the dump file. - Pass the password via the environment (
PGPASSWORD), not on the command line where it'd show in process lists. check=Trueturns a failed dump into an exception you can catch — a silent failed backup is the worst kind.
Compress the dump
import gzip, shutil def gzip_file(path: str) -> str: out = path + ".gz" with open(path, "rb") as f_in, gzip.open(out, "wb") as f_out: shutil.copyfileobj(f_in, f_out) Path(path).unlink() # remove the uncompressed original return out
SQL dumps are text and compress dramatically (often 10×). gzip is built in; for huge dumps, stream straight into a .gz to avoid a giant intermediate file.
The only proof a backup works is restoring it. Regularly restore your dump into a throwaway/test database and check the row counts. Plenty of teams discovered their backups were broken only when they needed them. Automate a periodic restore-test, not just the dump.
Worked Example · A Rotating DB Backup Job
12 minGoal: a scheduled-ready job that dumps a SQLite database, compresses it, names it by timestamp, rotates old backups, and verifies the dump opens — the full safe-backup lifecycle.
import sqlite3, gzip, shutil, logging from pathlib import Path from datetime import datetime logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s") log = logging.getLogger("dbbackup") def backup_db(db_path: str, backup_dir: str, keep: int = 7) -> Path: db = Path(db_path) out_dir = Path(backup_dir); out_dir.mkdir(parents=True, exist_ok=True) stamp = datetime.now().strftime("%Y%m%d-%H%M%S") raw = out_dir / f"{db.stem}-{stamp}.db" # 1) consistent copy via the backup API src = sqlite3.connect(db_path) dst = sqlite3.connect(str(raw)) with dst: src.backup(dst) src.close(); dst.close() log.info("dumped → %s", raw.name) # 2) verify the dump actually opens and has tables check = sqlite3.connect(str(raw)) tables = check.execute( "SELECT count(*) FROM sqlite_master WHERE type='table'").fetchone()[0] check.close() if tables == 0: raw.unlink() raise RuntimeError("backup has no tables — aborting") log.info("verified: %d tables", tables) # 3) compress gz = raw.with_suffix(".db.gz") with open(raw, "rb") as fi, gzip.open(gz, "wb") as fo: shutil.copyfileobj(fi, fo) raw.unlink() log.info("compressed → %s (%.1f KB)", gz.name, gz.stat().st_size / 1000) # 4) rotate: keep newest <keep> backups = sorted(out_dir.glob(f"{db.stem}-*.db.gz"), reverse=True) for old in backups[keep:]: old.unlink(); log.info("pruned %s", old.name) return gz backup_db("app.db", "backups/db", keep=7)
INFO dumped → app-20260528-020000.db INFO verified: 12 tables INFO compressed → app-20260528-020000.db.gz (148.2 KB) INFO pruned app-20260521-020000.db.gz
Read the code
The four stages are the full lifecycle of a trustworthy backup: a consistent dump via the backup API (not a file copy), an immediate verify that the dump opens and contains tables (catching corruption before you trust it), compression to save space, and rotation to bound disk — the pattern from Lesson 38, now for a database. Schedule this with cron (Lesson 36) and wrap it with the Slack notifier (Lesson 31) so you get pinged if a nightly backup ever fails.
Try It Yourself
13 minSQLite needs no server — create a test DB with a couple of tables and rows to practise on.
Create a SQLite DB, insert some rows, back it up with the backup API, then open the backup and confirm the rows are there.
Hint
import sqlite3 con = sqlite3.connect("test.db") con.execute("CREATE TABLE IF NOT EXISTS t(id INTEGER, name TEXT)") con.execute("INSERT INTO t VALUES (1, 'Aisha')") con.commit(); con.close() backup_sqlite("test.db", "test-backup.db") b = sqlite3.connect("test-backup.db") print(b.execute("SELECT * FROM t").fetchall())
Dump a DB to a .sql file with iterdump, then restore it into a brand-new database with executescript and confirm the data matches.
Hint
# restore: sql = open("dump.sql", encoding="utf-8").read() new = sqlite3.connect("restored.db") new.executescript(sql) print(new.execute("SELECT count(*) FROM t").fetchone())
Run the worked-example backup several times (insert a row between runs) and confirm: each backup verifies, compresses, and that only the newest N .db.gz files remain after rotation.
Mini-Challenge · The Restore Tester
8 minWrite test_restore(backup_gz) that takes a compressed SQLite backup, decompresses it to a temp file, opens it, and reports the table count and total row count across all tables — proving the backup is restorable, not just present. Clean up the temp file afterward.
Show a sample solution
import gzip, shutil, sqlite3, tempfile from pathlib import Path def test_restore(backup_gz: str) -> dict: tmp = Path(tempfile.mktemp(suffix=".db")) with gzip.open(backup_gz, "rb") as fi, open(tmp, "wb") as fo: shutil.copyfileobj(fi, fo) try: con = sqlite3.connect(str(tmp)) tables = [r[0] for r in con.execute( "SELECT name FROM sqlite_master WHERE type='table'")] rows = sum(con.execute(f"SELECT count(*) FROM '{t}'").fetchone()[0] for t in tables) con.close() return {"ok": True, "tables": len(tables), "rows": rows} except sqlite3.DatabaseError as e: return {"ok": False, "error": str(e)} finally: tmp.unlink(missing_ok=True) print(test_restore("backups/db/app-20260528-020000.db.gz")) # {'ok': True, 'tables': 12, 'rows': 8431}
Non-negotiables: decompress to temp, open it, count tables+rows, clean up, report ok/error.
Recap
3 minNever raw-copy a live database — you risk corruption. Use the right tool for a consistent dump: SQLite's connection.backup() or iterdump(); pg_dump/mysqldump via subprocess for servers (password through the environment, check=True). Then treat the dump like any file: compress with gzip, name by timestamp, and rotate to keep the newest N. Critically, an untested backup isn't a backup — automate a periodic restore-test that opens the dump and checks table/row counts. Schedule the whole job (Lesson 36) and alert on failure (Lesson 31).
Vocabulary Card
- consistent backup
- A dump representing a valid moment, with no half-written transactions.
- backup API / dump
- A database's proper export tool (vs. copying its files).
- pg_dump / mysqldump
- Command-line tools that export Postgres/MySQL databases.
- restore test
- Restoring a backup to confirm it actually works.
Homework
4 minBuild dbbackup.py for a SQLite database: timestamped dump via the backup API, verify, compress, rotate (keep N), plus a restore-test subcommand that proves a chosen backup restores. Run it a few times against a test DB you mutate between runs. Document the exact cron line you'd use to run it nightly and how you'd alert on failure.
Sample · dbbackup.py operations
$ python dbbackup.py run app.db backups/db --keep 7
INFO dumped → app-20260528-020000.db
INFO verified: 12 tables
INFO compressed → app-20260528-020000.db.gz (148 KB)
$ python dbbackup.py restore-test backups/db/app-20260528-020000.db.gz
INFO restore OK: 12 tables, 8431 rows
Nightly cron (2am) + alert on failure:
0 2 * * * /usr/bin/python3 /path/dbbackup.py run app.db backups/db || \
/usr/bin/python3 /path/notify.py "DB BACKUP FAILED" critical
(The '||' fires the alert only when the backup exits non-zero —
backup_db raises/exits 1 on verify failure, so a broken dump pages you.)Non-negotiables: consistent dump, verify, compress, rotate, a working restore-test, cron line + failure alert.