PY-L7-39 · Database Backup Automation

Learning Goals

3 min

By the end of this lesson you can:

Explain why you can't just copy a live database file.
Back up SQLite safely with its online backup API.
Dump Postgres/MySQL via subprocess calling pg_dump/mysqldump.
Compress, rotate, and verify dumps so backups are small and trustworthy.

Warm-Up · Why Not Just Copy the File?

5 min

You might think "a SQLite database is one file — just shutil.copy it." But if the app is writing while you copy, you can capture a half-written transaction and get a corrupt backup that won't open. Server databases (Postgres, MySQL) keep data across many files plus memory — copying the directory is even worse.

Today's big idea

Databases need consistent backups — a snapshot that represents a valid moment, with no in-flight writes. Each database provides a proper tool for this: SQLite has an online backup API; Postgres has pg_dump; MySQL has mysqldump. Use the right tool, never a raw file copy. Then the file skills from Lesson 38 (compress, rotate, verify) apply to the resulting dump.

New Concept · Safe Dumps

14 min

SQLite: the online backup API

import sqlite3
from pathlib import Path

def backup_sqlite(db_path: str, dest: str) -> None:
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(dest)
    with dst:                          # transaction
        src.backup(dst)               # consistent copy, even while in use
    src.close(); dst.close()

backup_sqlite("app.db", "backups/app-backup.db")

connection.backup() copies a live SQLite database safely — it handles concurrent writes and gives you a consistent snapshot. Never shutil.copy a SQLite file that an app might be writing.

SQLite: a plain-text SQL dump

import sqlite3

def dump_sqlite_sql(db_path: str, out: str) -> None:
    con = sqlite3.connect(db_path)
    with open(out, "w", encoding="utf-8") as f:
        for line in con.iterdump():        # yields CREATE/INSERT statements
            f.write(line + "\n")
    con.close()

iterdump() produces the SQL to recreate the whole database — human-readable, diff-able, and restorable on any SQLite. Great for version control or migrating.

Postgres / MySQL via subprocess

import subprocess, os
from pathlib import Path

def dump_postgres(db: str, out: str) -> None:
    env = {**os.environ, "PGPASSWORD": os.environ["PG_PASSWORD"]}
    with open(out, "wb") as f:
        subprocess.run(
            ["pg_dump", "-h", os.environ["PG_HOST"],
             "-U", os.environ["PG_USER"], "-d", db],
            stdout=f, env=env, check=True, timeout=600)

# MySQL is the same shape:
#   subprocess.run(["mysqldump", "-h", host, "-u", user,
#                   f"-p{password}", db], stdout=f, check=True)

These ship with the database — call them with subprocess (Lesson 9), capturing stdout into the dump file.
Pass the password via the environment (PGPASSWORD), not on the command line where it'd show in process lists.
check=True turns a failed dump into an exception you can catch — a silent failed backup is the worst kind.

Compress the dump

import gzip, shutil

def gzip_file(path: str) -> str:
    out = path + ".gz"
    with open(path, "rb") as f_in, gzip.open(out, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    Path(path).unlink()                # remove the uncompressed original
    return out

SQL dumps are text and compress dramatically (often 10×). gzip is built in; for huge dumps, stream straight into a .gz to avoid a giant intermediate file.

⚠️ An untested backup is not a backup

The only proof a backup works is restoring it. Regularly restore your dump into a throwaway/test database and check the row counts. Plenty of teams discovered their backups were broken only when they needed them. Automate a periodic restore-test, not just the dump.

Worked Example · A Rotating DB Backup Job

12 min

Goal: a scheduled-ready job that dumps a SQLite database, compresses it, names it by timestamp, rotates old backups, and verifies the dump opens — the full safe-backup lifecycle.

import sqlite3, gzip, shutil, logging
from pathlib import Path
from datetime import datetime

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("dbbackup")

def backup_db(db_path: str, backup_dir: str, keep: int = 7) -> Path:
    db = Path(db_path)
    out_dir = Path(backup_dir); out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    raw = out_dir / f"{db.stem}-{stamp}.db"

    # 1) consistent copy via the backup API
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(str(raw))
    with dst:
        src.backup(dst)
    src.close(); dst.close()
    log.info("dumped → %s", raw.name)

    # 2) verify the dump actually opens and has tables
    check = sqlite3.connect(str(raw))
    tables = check.execute(
        "SELECT count(*) FROM sqlite_master WHERE type='table'").fetchone()[0]
    check.close()
    if tables == 0:
        raw.unlink()
        raise RuntimeError("backup has no tables — aborting")
    log.info("verified: %d tables", tables)

    # 3) compress
    gz = raw.with_suffix(".db.gz")
    with open(raw, "rb") as fi, gzip.open(gz, "wb") as fo:
        shutil.copyfileobj(fi, fo)
    raw.unlink()
    log.info("compressed → %s (%.1f KB)", gz.name, gz.stat().st_size / 1000)

    # 4) rotate: keep newest <keep>
    backups = sorted(out_dir.glob(f"{db.stem}-*.db.gz"), reverse=True)
    for old in backups[keep:]:
        old.unlink(); log.info("pruned %s", old.name)

    return gz

backup_db("app.db", "backups/db", keep=7)

INFO dumped → app-20260528-020000.db
INFO verified: 12 tables
INFO compressed → app-20260528-020000.db.gz (148.2 KB)
INFO pruned app-20260521-020000.db.gz

Read the code

The four stages are the full lifecycle of a trustworthy backup: a consistent dump via the backup API (not a file copy), an immediate verify that the dump opens and contains tables (catching corruption before you trust it), compression to save space, and rotation to bound disk — the pattern from Lesson 38, now for a database. Schedule this with cron (Lesson 36) and wrap it with the Slack notifier (Lesson 31) so you get pinged if a nightly backup ever fails.

Try It Yourself

13 min

SQLite needs no server — create a test DB with a couple of tables and rows to practise on.

01 🟢 Backup & reopen

Create a SQLite DB, insert some rows, back it up with the backup API, then open the backup and confirm the rows are there.

Hint

import sqlite3
con = sqlite3.connect("test.db")
con.execute("CREATE TABLE IF NOT EXISTS t(id INTEGER, name TEXT)")
con.execute("INSERT INTO t VALUES (1, 'Aisha')")
con.commit(); con.close()
backup_sqlite("test.db", "test-backup.db")
b = sqlite3.connect("test-backup.db")
print(b.execute("SELECT * FROM t").fetchall())

02 🟡 SQL dump + restore

Dump a DB to a .sql file with iterdump, then restore it into a brand-new database with executescript and confirm the data matches.

Hint

# restore:
sql = open("dump.sql", encoding="utf-8").read()
new = sqlite3.connect("restored.db")
new.executescript(sql)
print(new.execute("SELECT count(*) FROM t").fetchone())

03 🔴 Verify + rotate

Run the worked-example backup several times (insert a row between runs) and confirm: each backup verifies, compresses, and that only the newest N .db.gz files remain after rotation.

Mini-Challenge · The Restore Tester

8 min

Write test_restore(backup_gz) that takes a compressed SQLite backup, decompresses it to a temp file, opens it, and reports the table count and total row count across all tables — proving the backup is restorable, not just present. Clean up the temp file afterward.

Show a sample solution

import gzip, shutil, sqlite3, tempfile
from pathlib import Path

def test_restore(backup_gz: str) -> dict:
    tmp = Path(tempfile.mktemp(suffix=".db"))
    with gzip.open(backup_gz, "rb") as fi, open(tmp, "wb") as fo:
        shutil.copyfileobj(fi, fo)
    try:
        con = sqlite3.connect(str(tmp))
        tables = [r[0] for r in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        rows = sum(con.execute(f"SELECT count(*) FROM '{t}'").fetchone()[0]
                   for t in tables)
        con.close()
        return {"ok": True, "tables": len(tables), "rows": rows}
    except sqlite3.DatabaseError as e:
        return {"ok": False, "error": str(e)}
    finally:
        tmp.unlink(missing_ok=True)

print(test_restore("backups/db/app-20260528-020000.db.gz"))
# {'ok': True, 'tables': 12, 'rows': 8431}

Non-negotiables: decompress to temp, open it, count tables+rows, clean up, report ok/error.

Recap

3 min

Never raw-copy a live database — you risk corruption. Use the right tool for a consistent dump: SQLite's connection.backup() or iterdump(); pg_dump/mysqldump via subprocess for servers (password through the environment, check=True). Then treat the dump like any file: compress with gzip, name by timestamp, and rotate to keep the newest N. Critically, an untested backup isn't a backup — automate a periodic restore-test that opens the dump and checks table/row counts. Schedule the whole job (Lesson 36) and alert on failure (Lesson 31).

Vocabulary Card

consistent backup: A dump representing a valid moment, with no half-written transactions.
backup API / dump: A database's proper export tool (vs. copying its files).
pg_dump / mysqldump: Command-line tools that export Postgres/MySQL databases.
restore test: Restoring a backup to confirm it actually works.

Homework

4 min

Build dbbackup.py for a SQLite database: timestamped dump via the backup API, verify, compress, rotate (keep N), plus a restore-test subcommand that proves a chosen backup restores. Run it a few times against a test DB you mutate between runs. Document the exact cron line you'd use to run it nightly and how you'd alert on failure.

Sample · dbbackup.py operations

$ python dbbackup.py run app.db backups/db --keep 7
INFO dumped → app-20260528-020000.db
INFO verified: 12 tables
INFO compressed → app-20260528-020000.db.gz (148 KB)

$ python dbbackup.py restore-test backups/db/app-20260528-020000.db.gz
INFO restore OK: 12 tables, 8431 rows

Nightly cron (2am) + alert on failure:
  0 2 * * * /usr/bin/python3 /path/dbbackup.py run app.db backups/db || \
            /usr/bin/python3 /path/notify.py "DB BACKUP FAILED" critical

(The '||' fires the alert only when the backup exits non-zero —
backup_db raises/exits 1 on verify failure, so a broken dump pages you.)

Non-negotiables: consistent dump, verify, compress, rotate, a working restore-test, cron line + failure alert.

import sqlite3 from pathlib import Path def backup_sqlite(db_path: str, dest: str) -> None: src = sqlite3.connect(db_path) dst = sqlite3.connect(dest) with dst: # transaction src.backup(dst) # consistent copy, even while in use src.close(); dst.close() backup_sqlite("app.db", "backups/app-backup.db")

import sqlite3 def dump_sqlite_sql(db_path: str, out: str) -> None: con = sqlite3.connect(db_path) with open(out, "w", encoding="utf-8") as f: for line in con.iterdump(): # yields CREATE/INSERT statements f.write(line + "\n") con.close()

import subprocess, os from pathlib import Path def dump_postgres(db: str, out: str) -> None: env = {**os.environ, "PGPASSWORD": os.environ["PG_PASSWORD"]} with open(out, "wb") as f: subprocess.run( ["pg_dump", "-h", os.environ["PG_HOST"], "-U", os.environ["PG_USER"], "-d", db], stdout=f, env=env, check=True, timeout=600) # MySQL is the same shape: # subprocess.run(["mysqldump", "-h", host, "-u", user, # f"-p{password}", db], stdout=f, check=True)

import gzip, shutil def gzip_file(path: str) -> str: out = path + ".gz" with open(path, "rb") as f_in, gzip.open(out, "wb") as f_out: shutil.copyfileobj(f_in, f_out) Path(path).unlink() # remove the uncompressed original return out

import sqlite3, gzip, shutil, logging from pathlib import Path from datetime import datetime logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s") log = logging.getLogger("dbbackup") def backup_db(db_path: str, backup_dir: str, keep: int = 7) -> Path: db = Path(db_path) out_dir = Path(backup_dir); out_dir.mkdir(parents=True, exist_ok=True) stamp = datetime.now().strftime("%Y%m%d-%H%M%S") raw = out_dir / f"{db.stem}-{stamp}.db" # 1) consistent copy via the backup API src = sqlite3.connect(db_path) dst = sqlite3.connect(str(raw)) with dst: src.backup(dst) src.close(); dst.close() log.info("dumped → %s", raw.name) # 2) verify the dump actually opens and has tables check = sqlite3.connect(str(raw)) tables = check.execute( "SELECT count(*) FROM sqlite_master WHERE type='table'").fetchone()[0] check.close() if tables == 0: raw.unlink() raise RuntimeError("backup has no tables — aborting") log.info("verified: %d tables", tables) # 3) compress gz = raw.with_suffix(".db.gz") with open(raw, "rb") as fi, gzip.open(gz, "wb") as fo: shutil.copyfileobj(fi, fo) raw.unlink() log.info("compressed → %s (%.1f KB)", gz.name, gz.stat().st_size / 1000) # 4) rotate: keep newest <keep> backups = sorted(out_dir.glob(f"{db.stem}-*.db.gz"), reverse=True) for old in backups[keep:]: old.unlink(); log.info("pruned %s", old.name) return gz backup_db("app.db", "backups/db", keep=7)

import sqlite3 con = sqlite3.connect("test.db") con.execute("CREATE TABLE IF NOT EXISTS t(id INTEGER, name TEXT)") con.execute("INSERT INTO t VALUES (1, 'Aisha')") con.commit(); con.close() backup_sqlite("test.db", "test-backup.db") b = sqlite3.connect("test-backup.db") print(b.execute("SELECT * FROM t").fetchall())

import gzip, shutil, sqlite3, tempfile from pathlib import Path def test_restore(backup_gz: str) -> dict: tmp = Path(tempfile.mktemp(suffix=".db")) with gzip.open(backup_gz, "rb") as fi, open(tmp, "wb") as fo: shutil.copyfileobj(fi, fo) try: con = sqlite3.connect(str(tmp)) tables = [r[0] for r in con.execute( "SELECT name FROM sqlite_master WHERE type='table'")] rows = sum(con.execute(f"SELECT count(*) FROM '{t}'").fetchone()[0] for t in tables) con.close() return {"ok": True, "tables": len(tables), "rows": rows} except sqlite3.DatabaseError as e: return {"ok": False, "error": str(e)} finally: tmp.unlink(missing_ok=True) print(test_restore("backups/db/app-20260528-020000.db.gz")) # {'ok': True, 'tables': 12, 'rows': 8431}