Learning Goals
3 min
By the end of this lesson you can:
- Tell transient errors (retry) from permanent ones (don't).
- Implement exponential backoff with jitter.
- Honour HTTP 429 and the Retry-After header.
- Build a reusable resilient requests.Session for all your API calls.
Warm-Up · The Internet Is Unreliable
5 min
Over a long run, things will go wrong:

    ConnectionError   wifi blipped for a second
    500/502/503       server briefly overloaded
    429               you're calling too fast; slow down
    timeout           the response is taking too long
    404 / 401         permanent: wrong URL / bad auth; retrying won't help
Most failures are transient — try again in a moment and it works. The professional move is to retry automatically, but politely: wait longer after each failure (backoff), add randomness so many clients don't retry in lockstep (jitter), cap the attempts, and obey the server when it says "wait N seconds." Distinguish transient (retry) from permanent (give up) so you don't hammer a 404.
New Concept · Retrying Well
14 min
A basic GET with JSON
    import requests

    resp = requests.get("https://api.example.com/v1/users", timeout=10)
    resp.raise_for_status()
    data = resp.json()  # parse the JSON body into Python
Exponential backoff with jitter
    import time, random, requests, logging

    log = logging.getLogger("api")
    RETRYABLE = {429, 500, 502, 503, 504}

    def get_with_retry(url, *, max_attempts=5, timeout=10, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                resp = requests.get(url, timeout=timeout, **kwargs)
                if resp.status_code in RETRYABLE:
                    raise requests.HTTPError(f"status {resp.status_code}", response=resp)
                resp.raise_for_status()  # raise on other 4xx (permanent)
                return resp
            except (requests.ConnectionError, requests.Timeout, requests.HTTPError) as e:
                permanent = (isinstance(e, requests.HTTPError)
                             and e.response is not None
                             and e.response.status_code not in RETRYABLE)
                if permanent or attempt == max_attempts:
                    raise  # give up
                wait = min(2 ** (attempt - 1), 30) + random.uniform(0, 1)
                log.warning("attempt %d failed (%s); retrying in %.1fs", attempt, e, wait)
                time.sleep(wait)
- Exponential: wait 1s, 2s, 4s, 8s… (2 ** (attempt - 1)), capped at 30s.
- Jitter: + random.uniform(0, 1) spreads retries so clients don't all hit at once.
- Permanent vs transient: a 404/401 raises immediately; a 503/timeout retries.
Honour 429 and Retry-After
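    def respect_rate_limit(resp) -> float | None:
        """Return how many seconds to wait after a 429, or None otherwise."""
        if resp.status_code != 429:
            return None
        retry_after = resp.headers.get("Retry-After")
        try:
            return float(retry_after)  # server told us exactly how long
        except (TypeError, ValueError):
            # header missing, empty, or given as an HTTP date rather than seconds
            return 60.0  # sensible default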
A 429 means "too many requests." The server often includes a Retry-After header saying how long to wait, either as a number of seconds or as an HTTP date. Always prefer that over guessing. Honouring it helps keep your API access from being throttled or banned.
A requests.Session reuses the underlying connection (faster) and lets you set headers (like auth) once. You can also mount a battle-tested retry adapter on a session: requests' HTTPAdapter accepts urllib3's Retry class, which is great for production. Writing your own loop, as above, teaches the mechanics.
The library shortcut
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    def resilient_session() -> requests.Session:
        retry = Retry(total=5, backoff_factor=1,
                      status_forcelist=[429, 500, 502, 503, 504],
                      respect_retry_after_header=True)
        s = requests.Session()
        s.mount("https://", HTTPAdapter(max_retries=retry))
        return s

    session = resilient_session()
    data = session.get("https://api.example.com/v1/data", timeout=10).json()
This gives you backoff + 429 handling in a few lines, applied to every request through the session — the production default once you understand what it's doing.
Worked Example · Paginated API → CSV, Resiliently
12 min
Goal: pull every page of a paginated API into a CSV, surviving transient errors and respecting rate limits. This is the canonical "sync the data" automation.
    import csv, time, logging
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    log = logging.getLogger("sync")

    def session() -> requests.Session:
        retry = Retry(total=5, backoff_factor=1,
                      status_forcelist=[429, 500, 502, 503, 504],
                      respect_retry_after_header=True)
        s = requests.Session()
        s.mount("https://", HTTPAdapter(max_retries=retry))
        s.headers.update({"User-Agent": "DataSync/1.0"})
        return s

    def fetch_all(base_url: str) -> list[dict]:
        s = session()
        results, page = [], 1
        while True:
            resp = s.get(base_url, params={"page": page, "per_page": 100}, timeout=10)
            resp.raise_for_status()
            body = resp.json()
            items = body.get("data", [])
            if not items:
                break  # no more pages
            results.extend(items)
            log.info("page %d: +%d (total %d)", page, len(items), len(results))
            if not body.get("has_more"):  # API tells us when done
                break
            page += 1
            time.sleep(0.5)  # gentle pacing on top of retries
        return results

    def run(base_url: str, out: str) -> None:
        records = fetch_all(base_url)
        if not records:
            log.warning("no records returned")
            return
        fields = list(records[0].keys())
        with open(out, "w", newline="", encoding="utf-8") as f:
            w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
            w.writeheader()
            w.writerows(records)
        log.info("synced %d records → %s", len(records), out)

    run("https://api.example.com/v1/orders", "orders.csv")
    INFO page 1: +100 (total 100)
    WARNING Retrying (Retry(total=4, ...)) after 1s: /v1/orders   ← transient blip, auto-recovered
    INFO page 2: +100 (total 200)
    INFO page 3: +37 (total 237)
    INFO synced 237 records → orders.csv
Read the code
The resilient Session handles retries and 429s transparently — notice the warning line where a blip was auto-recovered mid-run without losing data. On top of that we add gentle sleep(0.5) pacing and use the API's own has_more flag to stop cleanly. The output is a CSV, so everything from Lesson 15 onward applies. This pattern — paginate, retry, rate-limit, write — is how you reliably sync data from any API, and it's a building block for the daily report bot in Lesson 40.
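The stopping logic (an empty page, or has_more turning false) can be exercised without a network by separating "how to fetch one page" from "when to stop". The following is a sketch under assumptions: paginate and fake_api are hypothetical names, the response shape {"data": [...], "has_more": bool} is the one used in the worked example, and max_pages is an added safety cap.

```python
from typing import Callable, Iterator

Page = dict  # assumed shape: {"data": [...], "has_more": bool}

def paginate(fetch_page: Callable[[int], Page], *, max_pages: int = 1000) -> Iterator[dict]:
    """Yield items page by page until the API reports no more data."""
    for page in range(1, max_pages + 1):
        body = fetch_page(page)
        items = body.get("data", [])
        if not items:
            return                      # empty page: nothing left
        yield from items
        if not body.get("has_more"):
            return                      # API says this was the last page

def fake_api(page: int) -> Page:
    """Stand-in for s.get(...).json(): 237 records served 100 at a time."""
    records = [{"id": i} for i in range(1, 238)]
    start = (page - 1) * 100
    chunk = records[start:start + 100]
    return {"data": chunk, "has_more": start + 100 < len(records)}

if __name__ == "__main__":
    rows = list(paginate(fake_api))
    print(len(rows), rows[0], rows[-1])
```

In real use, fetch_page would wrap the resilient session call, for example a small function that calls s.get(...), runs raise_for_status(), and returns resp.json(). Keeping the loop free of HTTP details makes the cap and the stop conditions easy to test.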
Try It Yourself
13 min
Use a free, no-auth practice API such as https://jsonplaceholder.typicode.com or https://httpbin.org (which can even simulate errors, e.g. /status/503).
Fetch jsonplaceholder.typicode.com/users, parse the JSON, and print each user's name and email. Use a timeout and raise_for_status.
Point your retry loop at httpbin.org/status/503 and watch it back off and eventually give up after max_attempts. Confirm the wait times grow each attempt.
Hint
    # httpbin always returns 503 here, so retries exhaust and raise.
    try:
        get_with_retry("https://httpbin.org/status/503", max_attempts=4)
    except requests.HTTPError as e:
        print("gave up:", e)
Confirm your loop retries httpbin.org/status/503 but immediately gives up on httpbin.org/status/404 — proving you distinguish transient from permanent errors.
Hint
    # 404 should raise on the FIRST attempt (no retries),
    # 503 should retry up to max_attempts.
    for code in (404, 503):
        try:
            get_with_retry(f"https://httpbin.org/status/{code}", max_attempts=3)
        except requests.HTTPError as e:
            print(code, "→", e)
Mini-Challenge · The Circuit Breaker
8 min
Add a simple circuit breaker: if an endpoint fails N times in a row, stop calling it for a cool-down period and return a cached/last-known value instead. This protects both your script and a struggling server from a retry storm.
Sample solution
    import time
    import requests

    class CircuitBreaker:
        def __init__(self, threshold=3, cooldown=30):
            self.threshold, self.cooldown = threshold, cooldown
            self.failures = 0
            self.open_until = 0.0

        def call(self, fn, *args, **kwargs):
            # time.monotonic() is unaffected by system clock adjustments
            if time.monotonic() < self.open_until:
                raise RuntimeError("circuit open: skipping call")
            try:
                result = fn(*args, **kwargs)
                self.failures = 0  # success resets
                return result
            except requests.RequestException:
                self.failures += 1
                if self.failures >= self.threshold:
                    self.open_until = time.monotonic() + self.cooldown
                    print(f"circuit OPEN for {self.cooldown}s")
                raise

    cb = CircuitBreaker(threshold=3, cooldown=30)
    # cb.call(requests.get, "https://api.example.com/v1/data", timeout=5)
Non-negotiables: trips open after N failures, refuses calls during cool-down, resets on success.
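The challenge also asks for a cached/last-known value while the circuit is open. One possible way to add that is sketched below. The class name CachingBreaker, the injectable clock parameter, and the choice to catch any Exception are all assumptions made for illustration; in real code you would narrow the except clause to your HTTP client's errors.

```python
import time
from typing import Any, Callable

class CachingBreaker:
    """Hypothetical variant: serve the last good result while the circuit is open."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.open_until = 0.0
        self.last_good: Any = None
        self.has_value = False

    def call(self, fn: Callable[..., Any], *args, **kwargs) -> Any:
        if self.clock() < self.open_until:
            if self.has_value:
                return self.last_good           # stale but usable
            raise RuntimeError("circuit open and nothing cached yet")
        try:
            result = fn(*args, **kwargs)
        except Exception:                       # assumption: narrow this in real code
            self.failures += 1
            if self.failures >= self.threshold:
                self.open_until = self.clock() + self.cooldown
            raise                               # failures stay visible until the trip
        self.failures = 0                       # success resets the count
        self.last_good, self.has_value = result, True
        return result
```

Passing the clock in as a parameter means the cool-down can be tested by advancing a fake clock rather than sleeping for 30 seconds.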
Recap
3 min
Real API automation must survive transient failures. Retry transient errors (connection drops, timeouts, 5xx, 429) but give up immediately on permanent ones (404, 401). Use exponential backoff (1s, 2s, 4s…) with jitter and a cap, and honour 429's Retry-After header. A requests.Session with the urllib3 Retry adapter gives you all of this in a few lines for every call. Add gentle pacing, paginate to the end, and write results to CSV/JSON. These habits turn a fragile demo into a pipeline you can leave running unattended.
Vocabulary Card
- transient error: A temporary failure that may succeed on retry (timeout, 503).
- exponential backoff: Waiting longer after each successive failure.
- jitter: Added randomness to retry delays so clients don't sync up.
- Retry-After: A header telling you exactly how long to wait after a 429.
Homework
4 min
Build apifetch.py that fetches all items from a paginated practice API (or simulates pagination over jsonplaceholder) into a CSV, using a resilient session with retries and rate-limit handling, logging each page and any recovered failure. Add a --max-pages safety cap and a --out filename. Test it against httpbin's error endpoints to prove the retries work.
Sample · apifetch.py (core)
    import argparse, csv, logging, requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    log = logging.getLogger("fetch")

    def session():
        retry = Retry(total=5, backoff_factor=1,
                      status_forcelist=[429, 500, 502, 503, 504],
                      respect_retry_after_header=True)
        s = requests.Session()
        s.mount("https://", HTTPAdapter(max_retries=retry))
        return s

    p = argparse.ArgumentParser(description="Resilient API → CSV")
    p.add_argument("url")
    p.add_argument("--max-pages", type=int, default=20)
    p.add_argument("--out", default="api.csv")
    a = p.parse_args()

    s = session()
    rows = []
    for page in range(1, a.max_pages + 1):
        r = s.get(a.url, params={"_page": page, "_limit": 100}, timeout=10)
        r.raise_for_status()
        items = r.json()
        if not items:
            break
        rows += items
        log.info("page %d: total %d", page, len(rows))

    if rows:
        fields = list(rows[0].keys())
        with open(a.out, "w", newline="", encoding="utf-8") as f:
            w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
            w.writeheader()
            w.writerows(rows)
        log.info("wrote %d rows → %s", len(rows), a.out)
Non-negotiables: resilient session, pagination with cap, logging, CSV output, retries demonstrably working.