Learning Goals
3 min
By the end of this lesson you can:
- Build a new PDF page-by-page with PdfWriter.
- Merge multiple PDFs and split one into ranges.
- Rotate pages and delete unwanted ones.
- Stamp a watermark/footer onto every page by overlaying one PDF on another.
Warm-Up · Pages Are Lego Bricks
5 min
The mental model for editing PDFs: a reader hands you page objects; a writer collects pages and saves them as a new file. You build the output by picking pages from one or more sources, in any order.
You don't edit a PDF in place — you assemble a new one. PdfWriter.add_page(page) is the brick; merging, splitting, and reordering are all just choosing which bricks go in and in what order. Watermarking is one extra trick: overlaying a stamp page on top of each content page with merge_page.
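You can see the brick model without any real files in the toy sketch below. It is for illustration only. Each "PDF" is a plain Python list of page labels that stands in for page objects, and an output document is just a list of (source, 0-based page) picks. The names `sources` and `assemble_picks` are hypothetical. They are not part of pypdf.

```python
# Toy model (illustration only): each "PDF" is a list of page labels
# standing in for pypdf page objects.
sources: dict[str, list[str]] = {
    "a.pdf": ["a1", "a2", "a3"],
    "b.pdf": ["b1", "b2"],
}

def assemble_picks(picks: list[tuple[str, int]]) -> list[str]:
    """Build an output 'document' from (source name, 0-based page index) picks."""
    return [sources[name][i] for name, i in picks]

# Merge: every page of a.pdf, then every page of b.pdf.
merged = assemble_picks([("a.pdf", i) for i in range(3)] + [("b.pdf", i) for i in range(2)])
# Split: only the first two pages of a.pdf.
first_two = assemble_picks([("a.pdf", i) for i in range(0, 2)])
# Reorder: any order, any mix of sources.
reordered = assemble_picks([("b.pdf", 1), ("a.pdf", 0)])

print(merged)     # ['a1', 'a2', 'a3', 'b1', 'b2']
print(first_two)  # ['a1', 'a2']
print(reordered)  # ['b2', 'a1']
```

With real files, the only change is that each label becomes `reader.pages[i]` and each pick becomes a `writer.add_page(...)` call.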
New Concept · Writing & Combining Pages
14 min
Merging PDFs
```python
from pypdf import PdfReader, PdfWriter

def merge(paths: list[str], out: str) -> None:
    writer = PdfWriter()
    for path in paths:
        reader = PdfReader(path)
        for page in reader.pages:
            writer.add_page(page)  # collect every page in order
    with open(out, "wb") as f:  # note "wb": PDFs are binary
        writer.write(f)

merge(["cover.pdf", "body.pdf", "appendix.pdf"], "combined.pdf")
```
Add pages in the order you want them; write the writer to a binary file ("wb"). That's the whole merge.
Splitting by page range
```python
from pypdf import PdfReader, PdfWriter

def split(path: str, start: int, end: int, out: str) -> None:
    reader = PdfReader(path)
    writer = PdfWriter()
    for i in range(start, end):  # 0-based, end exclusive
        writer.add_page(reader.pages[i])
    with open(out, "wb") as f:
        writer.write(f)

split("report.pdf", 0, 5, "chapter1.pdf")  # pages 1-5 (human numbering)
```
Pages are 0-based here, so "pages 1-5" for a human is range(0, 5). Select whatever subset you like.
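Converting human page numbers into a Python range is where most off-by-one bugs creep in, so it helps to do the conversion in one place. Here is a minimal sketch that assumes a hypothetical `parse_range` helper. It accepts specs like "1-5" or "7" and rejects pages outside the document.

```python
def parse_range(spec: str, n_pages: int) -> range:
    """Turn a human page spec ("1-5" or "7") into a 0-based, end-exclusive range."""
    if "-" in spec:
        first, last = (int(x) for x in spec.split("-", 1))
    else:
        first = last = int(spec)
    if not 1 <= first <= last <= n_pages:
        raise ValueError(f"pages {spec!r} out of bounds for a {n_pages}-page PDF")
    # Human pages first..last (inclusive) are indices first-1 .. last-1.
    return range(first - 1, last)

print(list(parse_range("1-5", 10)))  # [0, 1, 2, 3, 4]
print(list(parse_range("7", 10)))    # [6]
```

The result drops straight into the loop above: `for i in parse_range("1-5", len(reader.pages)): writer.add_page(reader.pages[i])`.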
Rotating and deleting
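```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("report.pdf")
writer = PdfWriter()
for i, page in enumerate(reader.pages):
    if i == 2:
        continue  # delete page 3 by skipping it
    if i == 0:
        page.rotate(90)  # rotate the cover 90° clockwise
    writer.add_page(page)
with open("edited.pdf", "wb") as f:
    writer.write(f)
```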
Deleting a page means simply not adding it. Rotating means calling page.rotate(90) before adding the page. pypdf rotates clockwise, and the angle must be a multiple of 90.
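One way to keep delete and rotate logic readable is to compute a plan first and then apply it in the loop. The sketch below uses a hypothetical `page_plan` helper. It takes human 1-based page numbers and checks the multiple-of-90 rule before any PDF is touched.

```python
def page_plan(n_pages: int, delete: set[int], rotate: dict[int, int]) -> list[tuple[int, int]]:
    """Return (0-based index, clockwise angle) for every page to keep.

    `delete` and `rotate` use human 1-based page numbers.
    """
    for page, angle in rotate.items():
        if angle % 90 != 0:
            raise ValueError(f"page {page}: rotation {angle} is not a multiple of 90")
    return [
        (i, rotate.get(i + 1, 0))  # 0 means "leave orientation alone"
        for i in range(n_pages)
        if i + 1 not in delete
    ]

print(page_plan(4, delete={3}, rotate={1: 90}))
# [(0, 90), (1, 0), (3, 0)]
```

To apply the plan, do `page = reader.pages[i]` for each entry, call `page.rotate(angle)` when the angle is non-zero, and then `writer.add_page(page)`.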
Watermarking by overlay
```python
from pypdf import PdfReader, PdfWriter

# stamp.pdf is a single page, e.g. a faint "DRAFT" or a footer
stamp = PdfReader("stamp.pdf").pages[0]
writer = PdfWriter()
for page in PdfReader("report.pdf").pages:
    page.merge_page(stamp)  # overlay the stamp ON TOP of the page
    writer.add_page(page)
with open("stamped.pdf", "wb") as f:
    writer.write(f)
```
merge_page(stamp) draws the stamp page over the content page — transparent areas show the original through. That single line watermarks an entire document.
pypdf doesn't draw text — it moves pages around. To create a watermark/footer PDF from scratch, use reportlab (pip install reportlab): draw your text once onto a one-page PDF, then overlay it with merge_page. Generate the stamp once, reuse it forever.
Worked Example · Report Assembler with Footer
12 min
Goal: merge a cover, body, and appendix into one report, then stamp a "Confidential" footer on every page using a reportlab-generated stamp. The result is a complete document-assembly pipeline.
```python
from pypdf import PdfReader, PdfWriter
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from pathlib import Path
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pdf")

def make_stamp(text: str, path: str) -> str:
    c = canvas.Canvas(path, pagesize=A4)
    c.setFont("Helvetica", 8)
    c.setFillGray(0.5)
    c.drawString(40, 20, text)  # footer near bottom-left
    c.save()
    return path

def assemble(parts: list[str], stamp_text: str, out: str) -> None:
    stamp = PdfReader(make_stamp(stamp_text, "_stamp.pdf")).pages[0]
    writer = PdfWriter()
    total = 0
    for part in parts:
        reader = PdfReader(part)
        for page in reader.pages:
            page.merge_page(stamp)  # footer on every page
            writer.add_page(page)
            total += 1
        log.info("added %s (%d pages)", Path(part).name, len(reader.pages))
    with open(out, "wb") as f:
        writer.write(f)
    Path("_stamp.pdf").unlink(missing_ok=True)  # clean up temp stamp
    log.info("wrote %s — %d pages", out, total)

assemble(["cover.pdf", "body.pdf", "appendix.pdf"],
         "Confidential — Acme Corp 2026", "final_report.pdf")
```
```text
INFO added cover.pdf (1 pages)
INFO added body.pdf (18 pages)
INFO added appendix.pdf (4 pages)
INFO wrote final_report.pdf — 23 pages
```
Read the code
Two tools combine cleanly: reportlab draws the footer once onto a throwaway one-page PDF, and pypdf overlays it on every content page while merging the parts. The temp stamp is cleaned up with unlink(missing_ok=True) (a tidy pathlib touch). The result is a single, branded, page-stamped report assembled from raw parts — the kind of task people pay for, done in 30 lines. Lesson 22 wraps this into a full multi-format report generator.
Try It Yourself
13 min
Merge two PDFs into one and confirm the page count of the output equals the sum of the inputs.
Write burst(path) that splits a PDF into one file per page (doc-p01.pdf, doc-p02.pdf, …). Zero-pad the numbers so they sort correctly.
Hint
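```python
from pathlib import Path
from pypdf import PdfReader, PdfWriter

def burst(path):
    stem = Path(path).stem  # "doc.pdf" -> "doc"
    reader = PdfReader(path)
    for i, page in enumerate(reader.pages, start=1):
        w = PdfWriter()
        w.add_page(page)
        with open(f"{stem}-p{i:02}.pdf", "wb") as f:  # :02 zero-pads to 2 digits
            w.write(f)
```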
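A fixed `:02` pad only sorts correctly up to 99 pages. The sketch below picks the pad width from the page count instead. `burst_names` is a hypothetical helper that only builds the filenames. You would use its output in place of the f-string in `burst`.

```python
def burst_names(stem: str, n_pages: int) -> list[str]:
    """Zero-padded per-page filenames, wide enough for the largest page number."""
    width = max(2, len(str(n_pages)))  # at least 2 digits, more for 100+ pages
    return [f"{stem}-p{i:0{width}d}.pdf" for i in range(1, n_pages + 1)]

print(burst_names("doc", 3))        # ['doc-p01.pdf', 'doc-p02.pdf', 'doc-p03.pdf']
print(burst_names("doc", 120)[0])   # doc-p001.pdf
```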
Write a script that produces a copy of a PDF with its pages in reverse order. Useful for documents scanned back-to-front.
Hint
```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("scan.pdf")
writer = PdfWriter()
for page in reversed(reader.pages):  # last page first
    writer.add_page(page)
with open("reversed.pdf", "wb") as f:
    writer.write(f)
```
Mini-Challenge · The Chapter Splitter
8 min
Write split_chapters(path, breaks, names) that cuts a PDF into named chapters given a list of starting page numbers. E.g. breaks=[1, 6, 20] with names=["intro", "body", "appendix"] produces three PDFs covering pages 1-5, 6-19, and 20-end.
Sample solution
```python
from pypdf import PdfReader, PdfWriter

def split_chapters(path, breaks, names):
    reader = PdfReader(path)
    n = len(reader.pages)
    bounds = [b - 1 for b in breaks] + [n]  # to 0-based, add end
    for i, name in enumerate(names):
        start, end = bounds[i], bounds[i + 1]
        writer = PdfWriter()
        for p in range(start, end):
            writer.add_page(reader.pages[p])
        with open(f"{name}.pdf", "wb") as f:
            writer.write(f)
        print(f"{name}.pdf: pages {start+1}-{end}")

split_chapters("book.pdf", [1, 6, 20], ["intro", "body", "appendix"])
```
Non-negotiables: page numbers → ranges, last chapter runs to the end, one file per name.
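The sample solution trusts its inputs. Unsorted breaks, or a break past the last page, silently produce empty or wrong chapters. The sketch below shows a hypothetical `chapter_ranges` helper that validates the breaks first. Requiring the first chapter to start on page 1 is an assumption made here so that no leading pages are silently dropped. A matching `len(names) == len(breaks)` check in `split_chapters` is a similar one-line guard.

```python
def chapter_ranges(breaks: list[int], n_pages: int) -> list[range]:
    """Convert 1-based chapter start pages into 0-based, end-exclusive ranges."""
    if not breaks or breaks[0] != 1:
        raise ValueError("the first chapter must start on page 1")
    if any(b >= nxt for b, nxt in zip(breaks, breaks[1:])):
        raise ValueError("chapter start pages must be strictly increasing")
    if breaks[-1] > n_pages:
        raise ValueError(f"a chapter starts on page {breaks[-1]}, but the PDF has {n_pages} pages")
    bounds = [b - 1 for b in breaks] + [n_pages]  # same trick as the sample solution
    return [range(bounds[i], bounds[i + 1]) for i in range(len(breaks))]

for r in chapter_ranges([1, 6, 20], 25):
    print(f"pages {r.start + 1}-{r.stop}")
# pages 1-5
# pages 6-19
# pages 20-25
```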
Recap
3 min
Editing PDFs means assembling a new one: a PdfReader gives pages, a PdfWriter collects them, and you write to a binary ("wb") file. Merge by adding pages from several files in order; split by adding a 0-based range; delete by skipping; rotate with page.rotate(90). Watermark by overlaying a stamp page with page.merge_page(stamp), and generate that stamp with reportlab, since pypdf moves pages but doesn't draw text. These page-level operations, plus Lesson 20's text extraction, cover most everyday PDF automation.
Vocabulary Card
- PdfWriter: collects pages and writes them out as a new PDF.
- add_page: appends a page to the writer; the building block of merge/split.
- merge_page: overlays one page's content on another (used for watermarks).
- reportlab: a library that draws PDFs from scratch, for making stamps/footers.
Homework
4 min
Build pdftool.py with argparse subcommands: merge <out> <files...>, split <file> <start> <end> <out>, and stamp <file> <text> <out> (which generates a footer with reportlab and overlays it on every page with page numbers like "Page 3 of 23"). Log each operation.
Sample · pdftool.py (core)
```python
import argparse
from pypdf import PdfReader, PdfWriter

def cmd_merge(a):
    w = PdfWriter()
    for path in a.files:
        for page in PdfReader(path).pages:
            w.add_page(page)
    with open(a.out, "wb") as f:
        w.write(f)
    print("merged", len(a.files), "files →", a.out)

def cmd_split(a):
    r = PdfReader(a.file)
    w = PdfWriter()
    for i in range(a.start - 1, a.end):  # human start..end, inclusive
        w.add_page(r.pages[i])
    with open(a.out, "wb") as f:
        w.write(f)
    print(f"pages {a.start}-{a.end} → {a.out}")

# stamp uses reportlab to draw "Page X of N" per page,
# then merge_page onto each content page (see worked example).

p = argparse.ArgumentParser()
sub = p.add_subparsers(dest="cmd", required=True)

m = sub.add_parser("merge")
m.add_argument("out")
m.add_argument("files", nargs="+")
m.set_defaults(func=cmd_merge)

s = sub.add_parser("split")
s.add_argument("file")
s.add_argument("start", type=int)
s.add_argument("end", type=int)
s.add_argument("out")
s.set_defaults(func=cmd_split)

args = p.parse_args()
args.func(args)
```
Non-negotiables: three subcommands, binary writes, reportlab page-number footer, logging.