Learning Goals
3 min- Compare mock vs real-DB testing on speed and realism.
- Keep real-DB tests fast: in-memory, fixtures, transaction rollback.
- Choose the right approach per situation.
- Catch SQL-level bugs a mock would miss.
Warm-Up · The Trade-Off Table
5 minMOCK connection REAL test DB speed fastest fast (in-memory) → slow (server) proves SQL works? NO YES catches bad SQL no yes (typos, constraints, joins) catches "did we yes harder commit?" setup effort low medium
For SQLite, the real in-memory DB is both fast AND realistic — use it as the default. Mock the connection only to assert behaviour (was commit called?) or when the real DB is genuinely heavy (a remote Postgres). Realism usually wins.
New Concept · Fast, Realistic DB Tests
14 minDefault: in-memory SQLite, fresh per test
import sqlite3, pytest @pytest.fixture def db(): con = sqlite3.connect(":memory:") con.executescript(open("schema.sql").read()) # real schema yield con con.close()
Speed trick: build the schema once, roll back per test
If schema setup is expensive, create it once (session scope) and wrap each test in a transaction that's rolled back — so every test sees a clean DB without rebuilding it.
@pytest.fixture(scope="session") def base_db(): con = sqlite3.connect(":memory:") con.executescript(SCHEMA) return con @pytest.fixture def db(base_db): base_db.execute("BEGIN") yield base_db base_db.rollback() # undo everything this test did → clean for next
Rollback is the professional pattern for real databases (especially Postgres): the schema persists, but each test's data changes vanish, keeping tests both fast and isolated.
When to mock instead
Mock the connection when: - the real DB is a remote server too slow for unit tests - you're testing "did the code COMMIT / RETRY / handle an error?" - you can't run the real DB in CI Otherwise prefer the real (in-memory) DB — it catches SQL bugs.
Bugs only a real DB catches
# a UNIQUE constraint, a typo'd column, a bad JOIN, a NOT NULL — # a mock happily accepts all of these; the real DB rejects them: def test_duplicate_email_rejected(db): add_user(db, "a@x.com") with pytest.raises(sqlite3.IntegrityError): add_user(db, "a@x.com") # the real UNIQUE constraint fires
Worked Example · Both Approaches, Side by Side
12 min# repo.py class OrderRepo: def __init__(self, conn): self.conn = conn def place(self, item, qty): if qty <= 0: raise ValueError("qty must be positive") self.conn.execute("INSERT INTO orders (item, qty) VALUES (?,?)", (item, qty)) self.conn.commit() def total_qty(self): return self.conn.execute("SELECT COALESCE(SUM(qty),0) FROM orders").fetchone()[0]
# test_repo.py import sqlite3, pytest from unittest.mock import Mock from repo import OrderRepo @pytest.fixture def db(): con = sqlite3.connect(":memory:") con.execute("CREATE TABLE orders (item TEXT, qty INTEGER NOT NULL)") yield con; con.close() # REAL DB — proves the SQL + aggregation work def test_place_and_total(db): repo = OrderRepo(db) repo.place("roti", 3); repo.place("milo", 2) assert repo.total_qty() == 5 def test_empty_total(db): assert OrderRepo(db).total_qty() == 0 # COALESCE handles empty ✓ def test_bad_qty_rejected(db): with pytest.raises(ValueError): OrderRepo(db).place("x", 0) # MOCK — proves behaviour (we commit) without a DB def test_place_commits(): conn = Mock() OrderRepo(conn).place("roti", 1) conn.commit.assert_called_once()
$ pytest test_repo.py -v test_place_and_total PASSED # real SQL + SUM aggregation verified test_empty_total PASSED # COALESCE path verified test_bad_qty_rejected PASSED test_place_commits PASSED # commit behaviour verified via mock 4 passed in 0.02s
Read the diff
Most tests use the real in-memory DB — they prove the INSERT, the SUM, and the COALESCE-empty path actually work in SQL. One mock test verifies commit() is called (pure behaviour). The real-DB tests would catch a typo'd column or a broken aggregation; the mock never would. Default to real; reach for mock only for behaviour or heavy DBs.
Try It Yourself
13 minTest a repo's create/read with an in-memory DB. Confirm a SUM/COUNT aggregation gives the right answer.
Add a UNIQUE or NOT NULL constraint to your schema and write a test proving the real DB rejects a violation. Confirm a mock would not catch it.
Implement the session-scoped-schema + per-test-rollback fixture pattern. Confirm tests are isolated AND the schema is built only once.
Mini-Challenge · The SQL-Bug Demo
8 minWrite a query with a deliberate bug (wrong column name, or SUM on the wrong field). Show the real-DB test fails (catching it), then show a mock-connection test of the same method PASSES (missing it). This is the definitive argument for real-DB testing where it's feasible.
Recap
3 minReal DB vs mock is speed vs realism. For SQLite, in-memory gives you both — make it the default; it catches SQL typos, constraints, and aggregation bugs a mock never would. Use session-schema + per-test rollback to keep real-DB tests fast and isolated. Mock the connection only for behaviour checks or genuinely heavy databases. Next: testing AI/ML code, where "correct" is fuzzy.
Vocabulary Card
- test database
- A real DB used only for tests, reset between runs.
- transaction rollback
- Undoing a test's DB changes so the next test starts clean.
- integrity constraint
- A DB rule (UNIQUE, NOT NULL) that only a real DB enforces.
- realism vs speed
- The core DB-testing trade-off; in-memory SQLite gives much of both.
Homework
4 minFor a repo with a constraint and an aggregation, write a real-DB test suite (in-memory, with rollback isolation) plus one mock test for a commit. Then do the SQL-bug demo: prove the real test catches a query bug the mock misses. Submit both and a one-line conclusion on your default choice.
Model on test_repo.py. The conclusion most students reach: "default to a real in-memory DB; mock only for behaviour or remote DBs."