PY-L7-24 · API Automation: Retries & Rate Limits

Learning Goals

3 min

By the end of this lesson you can:

Tell transient errors (retry) from permanent ones (don't).
Implement exponential backoff with jitter.
Honour HTTP 429 and the Retry-After header.
Build a reusable resilient requests.Session for all your API calls.

Warm-Up · The Internet Is Unreliable

5 min

Over a long run, things will go wrong:

ConnectionError   wifi blipped for a second
500/502/503/504   server or gateway briefly overloaded
429               you're calling too fast — slow down
timeout           the response is taking too long
404 / 401         permanent: wrong URL / bad auth — retrying won't help

Today's big idea

Most failures are transient — try again in a moment and it works. The professional move is to retry automatically, but politely: wait longer after each failure (backoff), add randomness so many clients don't retry in lockstep (jitter), cap the attempts, and obey the server when it says "wait N seconds." Distinguish transient (retry) from permanent (give up) so you don't hammer a 404.
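The transient/permanent split described above can be written down as a small lookup. This is only a sketch: the name classify_status and the exact set of codes are assumptions chosen for illustration, and a particular API may use status codes differently.

```python
# Hypothetical helper: the function name and the chosen status set are
# illustrative assumptions, not part of requests or any standard.
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}

def classify_status(status_code: int) -> str:
    """Return 'ok', 'transient' (worth retrying) or 'permanent' (give up)."""
    if status_code < 400:
        return "ok"
    if status_code in TRANSIENT_STATUSES:
        return "transient"
    return "permanent"              # e.g. 400, 401, 403, 404

for code in (200, 404, 429, 503):
    print(code, classify_status(code))
```

Keeping the decision in one place makes it easy to adjust when an API documents its own retryable codes.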

New Concept · Retrying Well

14 min

A basic GET with JSON

import requests

resp = requests.get("https://api.example.com/v1/users", timeout=10)
resp.raise_for_status()
data = resp.json()      # parse the JSON body into Python

Exponential backoff with jitter

import time, random, requests, logging

log = logging.getLogger("api")
RETRYABLE = {429, 500, 502, 503, 504}

def get_with_retry(url, *, max_attempts=5, timeout=10, **kwargs):
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.get(url, timeout=timeout, **kwargs)
            if resp.status_code in RETRYABLE:
                raise requests.HTTPError(f"status {resp.status_code}",
                                         response=resp)
            resp.raise_for_status()        # raise on other 4xx (permanent)
            return resp
        except (requests.ConnectionError, requests.Timeout,
                requests.HTTPError) as e:
            permanent = (isinstance(e, requests.HTTPError)
                         and e.response is not None
                         and e.response.status_code not in RETRYABLE)
            if permanent or attempt == max_attempts:
                raise                       # give up
            wait = min(2 ** (attempt - 1), 30) + random.uniform(0, 1)
            log.warning("attempt %d failed (%s); retrying in %.1fs",
                        attempt, e, wait)
            time.sleep(wait)

Exponential: wait 1s, 2s, 4s, 8s… (2 ** (attempt-1)), capped at 30s.
Jitter: + random.uniform(0, 1) spreads retries so clients don't all hit at once.
Permanent vs transient: a 404/401 raises immediately; a 503/timeout retries.
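To see the schedule on its own, the delay calculation can be pulled out of the loop. The sketch below assumes a hypothetical helper named backoff_delay with base, cap and jitter parameters; it mirrors the formula used in get_with_retry but is not part of it.

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0, jitter=1.0, rng=random):
    """Delay before the retry that follows failed attempt `attempt` (1-based)."""
    exponential = min(base * 2 ** (attempt - 1), cap)   # 1, 2, 4, 8, ... capped
    return exponential + rng.uniform(0, jitter)         # add up to `jitter` seconds

# With jitter switched off, the deterministic part of the schedule is visible.
schedule = [backoff_delay(n, jitter=0.0) for n in range(1, 8)]
print(schedule)   # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Note that with max_attempts=5 only four waits ever happen (1s, 2s, 4s, 8s plus jitter), so the 30s cap matters only for larger attempt counts.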

Honour 429 and Retry-After

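def respect_rate_limit(resp) -> float | None:
    """Return seconds to wait after a 429, or None if not rate-limited."""
    if resp.status_code == 429:
        retry_after = resp.headers.get("Retry-After")
        if retry_after:
            try:
                return float(retry_after)  # server told us how many seconds
            except ValueError:
                pass                       # header was an HTTP date, not a number
        return 60.0                        # fallback when there is no usable header
    return None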

A 429 means "too many requests." The server often includes a Retry-After header, given either as a number of seconds or as an HTTP date; always prefer that over guessing. Honouring it keeps your API access from being throttled further or banned.
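Because Retry-After may carry an HTTP date instead of a number, a plain float() conversion does not cover every server. The sketch below handles both forms using only the standard library; the function name retry_after_seconds, the 60-second default and the now parameter (there to make it testable) are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, default=60.0, now=None):
    """Convert a Retry-After header value to a number of seconds to wait.

    `value` may be delta-seconds ("120") or an HTTP date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Falls back to `default`.
    """
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):          # unparseable header
        return default
    if when.tzinfo is None:                  # treat a naive date as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max((when - now).total_seconds(), 0.0)   # never a negative wait

fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120"))                                          # 120.0
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now)) # 60.0
```

A caller could pass resp.headers.get("Retry-After") straight in and sleep for the returned value.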

Use a Session for everything

A requests.Session reuses the underlying connection (faster) and lets you set headers (like auth) once. requests also lets you mount an HTTPAdapter configured with urllib3's battle-tested Retry class on a session, which is great for production. Writing your own loop, as above, teaches the mechanics.

The library shortcut

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def resilient_session() -> requests.Session:
    retry = Retry(total=5, backoff_factor=1,
                  status_forcelist=[429, 500, 502, 503, 504],
                  respect_retry_after_header=True)
    s = requests.Session()
    adapter = HTTPAdapter(max_retries=retry)
    s.mount("https://", adapter)
    s.mount("http://", adapter)            # cover plain-HTTP URLs too
    return s

session = resilient_session()
data = session.get("https://api.example.com/v1/data", timeout=10).json()

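This gives you backoff + 429 handling in a few lines, applied to every request the session sends to a mounted URL prefix. By default urllib3's Retry only retries idempotent methods such as GET, so a POST is not retried unless you opt in. When the retries are exhausted on one of the listed status codes, requests raises requests.exceptions.RetryError rather than returning the response. It is the production default once you understand what it's doing.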

Worked Example · Paginated API → CSV, Resiliently

12 min

Goal: pull every page of a paginated API into a CSV, surviving transient errors and respecting rate limits — the canonical "sync the data" automation.

import csv, time, logging
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("sync")

def session() -> requests.Session:
    retry = Retry(total=5, backoff_factor=1,
                  status_forcelist=[429, 500, 502, 503, 504],
                  respect_retry_after_header=True)
    s = requests.Session()
    s.mount("https://", HTTPAdapter(max_retries=retry))
    s.headers.update({"User-Agent": "DataSync/1.0"})
    return s

def fetch_all(base_url: str) -> list[dict]:
    s = session()
    results, page = [], 1
    while True:
        resp = s.get(base_url, params={"page": page, "per_page": 100},
                     timeout=10)
        resp.raise_for_status()
        body = resp.json()
        items = body.get("data", [])
        if not items:
            break                          # no more pages
        results.extend(items)
        log.info("page %d: +%d (total %d)", page, len(items), len(results))
        if not body.get("has_more"):       # API tells us when done
            break
        page += 1
        time.sleep(0.5)                    # gentle pacing on top of retries
    return results

def run(base_url: str, out: str) -> None:
    records = fetch_all(base_url)
    if not records:
        log.warning("no records returned")
        return
    fields = list(records[0].keys())       # columns come from the first record
    with open(out, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        w.writeheader()
        w.writerows(records)
    log.info("synced %d records → %s", len(records), out)

run("https://api.example.com/v1/orders", "orders.csv")

INFO page 1: +100 (total 100)
WARNING Retrying (Retry(total=4, ...)) after connection broken by '...': /v1/orders   ← transient blip, auto-recovered
INFO page 2: +100 (total 200)
INFO page 3: +37 (total 237)
INFO synced 237 records → orders.csv

Read the code

The resilient Session handles retries and 429s transparently — notice the warning line where a blip was auto-recovered mid-run without losing data. On top of that we add gentle sleep(0.5) pacing and use the API's own has_more flag to stop cleanly. The output is a CSV, so everything from Lesson 15 onward applies. This pattern — paginate, retry, rate-limit, write — is how you reliably sync data from any API, and it's a building block for the daily report bot in Lesson 40.
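The stop conditions in fetch_all (empty page, has_more false) can be exercised without a network by separating the paging logic from the HTTP call. The sketch below assumes the same response shape as the worked example ("data" and "has_more" keys); paginate and fake_fetch are hypothetical names, and the max_pages cap is an added safety assumption.

```python
from typing import Callable, Iterator

def paginate(fetch_page: Callable[[int], dict],
             max_pages: int = 1000) -> Iterator[dict]:
    """Yield items page by page until the API reports no more data."""
    for page in range(1, max_pages + 1):
        body = fetch_page(page)
        items = body.get("data", [])
        if not items:
            return                      # empty page: nothing left
        yield from items
        if not body.get("has_more"):
            return                      # API says this was the last page

# Stand-in for a real API call: three pages totalling 237 items.
def fake_fetch(page: int) -> dict:
    sizes = {1: 100, 2: 100, 3: 37}
    n = sizes.get(page, 0)
    return {"data": [{"id": (page - 1) * 100 + i} for i in range(n)],
            "has_more": page < 3}

orders = list(paginate(fake_fetch))
print(len(orders))   # 237
```

In real use, fetch_page would wrap the session call, for example a small function that does s.get(...), raise_for_status() and returns resp.json().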

Try It Yourself

13 min

Use a free, no-auth practice API such as https://jsonplaceholder.typicode.com or https://httpbin.org (which can even simulate errors, e.g. /status/503).

01 🟢 Simple GET

Fetch jsonplaceholder.typicode.com/users, parse the JSON, and print each user's name and email. Use a timeout and raise_for_status.

02 🟡 Make it retry

Point your retry loop at httpbin.org/status/503 and watch it back off and eventually give up after max_attempts. Confirm the wait times grow each attempt.

Hint

# httpbin always returns 503 here, so retries exhaust and raise.
try:
    get_with_retry("https://httpbin.org/status/503", max_attempts=4)
except requests.HTTPError as e:
    print("gave up:", e)
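To confirm that the waits grow without a network or any real sleeping, one option is to pass the sleep function in as a parameter and record what it is asked to do. The loop below is a simplified stand-in for get_with_retry, not the lesson's function; retry_call and always_fails are hypothetical names.

```python
import random
import time

def retry_call(fn, max_attempts=4, sleep=time.sleep, rng=random):
    """Call fn() until it succeeds, backing off exponentially between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise                                   # out of attempts
            sleep(min(2 ** (attempt - 1), 30) + rng.uniform(0, 1))

waits = []                                              # recorded instead of slept

def always_fails():
    raise ConnectionError("simulated outage")

try:
    retry_call(always_fails, max_attempts=4, sleep=waits.append)
except ConnectionError:
    print("gave up after waits:", [round(w, 1) for w in waits])
```

Four attempts produce three waits, roughly 1s, 2s and 4s plus up to a second of jitter each.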

03 🔴 Permanent vs transient

Confirm your loop retries httpbin.org/status/503 but immediately gives up on httpbin.org/status/404 — proving you distinguish transient from permanent errors.

Hint

# 404 should raise on the FIRST attempt (no retries),
# 503 should retry up to max_attempts.
for code in (404, 503):
    try:
        get_with_retry(f"https://httpbin.org/status/{code}",
                       max_attempts=3)
    except requests.HTTPError as e:
        print(code, "→", e)

Mini-Challenge · The Circuit Breaker

8 min

Add a simple circuit breaker: if an endpoint fails N times in a row, stop calling it for a cool-down period and return a cached/last-known value instead. This protects both your script and a struggling server from a retry storm.

Show a sample solution

import time, requests

class CircuitBreaker:
    def __init__(self, threshold=3, cooldown=30):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.open_until = 0.0

    def call(self, fn, *args, **kwargs):
        # time.monotonic() is unaffected by system clock adjustments
        if time.monotonic() < self.open_until:
            raise RuntimeError("circuit open, skipping call")
        try:
            result = fn(*args, **kwargs)
            self.failures = 0               # success resets
            return result
        except requests.RequestException:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open_until = time.monotonic() + self.cooldown
                print(f"circuit OPEN for {self.cooldown}s")
            raise

cb = CircuitBreaker(threshold=3, cooldown=30)
# Note: requests.get does not raise on a 5xx response by itself, so only
# connection errors and timeouts count as failures in this call.
# cb.call(requests.get, "https://api.example.com/v1/data", timeout=5)

Non-negotiables: trips open after N failures, refuses calls during cool-down, resets on success.
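The challenge also asks for a cached/last-known value during the cool-down, whereas the sample solution raises instead. One possible way to add that fallback is sketched below. CachingBreaker, the injectable clock parameter and the fetch stand-in are all assumptions for illustration, and the broad except Exception is only there to keep the demo free of network code; real code would likely narrow it to requests.RequestException.

```python
import time

class CachingBreaker:
    """Hypothetical variant: serves the last good result while the circuit is open."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.open_until = 0.0
        self.last_good = None
        self.has_value = False

    def call(self, fn, *args, **kwargs):
        if self.clock() < self.open_until:
            if self.has_value:
                return self.last_good           # cool-down: skip the call entirely
            raise RuntimeError("circuit open and nothing cached yet")
        try:
            result = fn(*args, **kwargs)
        except Exception:                       # demo only; narrow this in real code
            self.failures += 1
            if self.failures >= self.threshold:
                self.open_until = self.clock() + self.cooldown
            raise
        self.failures = 0                       # success resets the counter
        self.last_good, self.has_value = result, True
        return result

now = [0.0]                                     # fake clock so the demo runs instantly
cb = CachingBreaker(threshold=2, cooldown=30.0, clock=lambda: now[0])

def fetch(value):
    if value is None:
        raise ConnectionError("simulated outage")
    return value

first = cb.call(fetch, "v1")                    # success, cached
for _ in range(2):                              # two failures trip the breaker
    try:
        cb.call(fetch, None)
    except ConnectionError:
        pass
during = cb.call(fetch, "v2")                   # circuit open: fetch is skipped
now[0] = 31.0                                   # cool-down is over
after = cb.call(fetch, "v3")
print(first, during, after)                     # v1 v1 v3
```

Serving stale data is a trade-off: it keeps the script running, but callers should know the value may be out of date.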

Recap

3 min

Real API automation must survive transient failures. Retry transient errors (connection drops, timeouts, 5xx, 429) but give up immediately on permanent ones (404, 401). Use exponential backoff (1s, 2s, 4s…) with jitter and a cap, and honour 429's Retry-After header. A requests.Session with the urllib3 Retry adapter gives you all of this in a few lines for every call. Add gentle pacing, paginate to the end, and write results to CSV/JSON. These habits turn a fragile demo into a pipeline you can leave running unattended.

Vocabulary Card

transient error: A temporary failure that may succeed on retry (timeout, 503).
exponential backoff: Waiting longer after each successive failure.
jitter: Added randomness to retry delays so clients don't sync up.
Retry-After: A response header telling you how long to wait (seconds or an HTTP date), typically sent with a 429 or 503.

Homework

4 min

Build apifetch.py that fetches all items from a paginated practice API (or simulates pagination over jsonplaceholder) into a CSV, using a resilient session with retries and rate-limit handling, logging each page and any recovered failure. Add a --max-pages safety cap and a --out filename. Test it against httpbin's error endpoints to prove the retries work.

Sample · apifetch.py (core)

import argparse, csv, logging, requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("fetch")

def session():
    retry = Retry(total=5, backoff_factor=1,
                  status_forcelist=[429, 500, 502, 503, 504],
                  respect_retry_after_header=True)
    s = requests.Session()
    s.mount("https://", HTTPAdapter(max_retries=retry))
    return s

p = argparse.ArgumentParser(description="Resilient API → CSV")
p.add_argument("url")
p.add_argument("--max-pages", type=int, default=20)
p.add_argument("--out", default="api.csv")
a = p.parse_args()

s = session()
rows = []
for page in range(1, a.max_pages + 1):
    r = s.get(a.url, params={"_page": page, "_limit": 100}, timeout=10)
    r.raise_for_status()
    items = r.json()
    if not items:
        break
    rows += items
    log.info("page %d: total %d", page, len(rows))

if rows:
    fields = list(rows[0].keys())
    with open(a.out, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        w.writeheader()
        w.writerows(rows)
    log.info("wrote %d rows → %s", len(rows), a.out)

Non-negotiables: resilient session, pagination with cap, logging, CSV output, retries demonstrably working.
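Both CSV samples take their column names from the first record and ignore extras, so a key that first appears in a later record is dropped silently. If that matters for your data, one alternative is to collect every key seen before writing. The sketch below uses only the standard library; write_csv is a hypothetical helper and the two records are made-up sample data.

```python
import csv
import io

def write_csv(records, fileobj):
    """Write dict records to CSV using every key seen, in first-seen order."""
    fields = list(dict.fromkeys(key for rec in records for key in rec))
    w = csv.DictWriter(fileobj, fieldnames=fields, restval="")  # blank if missing
    w.writeheader()
    w.writerows(records)
    return fields

buf = io.StringIO()
write_csv([{"id": 1, "name": "Ada"}, {"id": 2, "email": "g@example.com"}], buf)
print(buf.getvalue())
```

The cost is one extra pass over the records in memory, which is fine at the sizes used in this lesson.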
