Learning Goals
3 min
- Explain RAG: retrieve relevant text, then generate an answer grounded in it.
- Split documents into chunks.
- Retrieve the best chunks (TF-IDF now; embeddings as a stretch).
- Build a prompt that says "answer ONLY from this context".
Warm-Up · Open-Book Exam
5 min
Closed-book (plain LLM): answers from memory, so it can hallucinate and doesn't know your private notes.
Open-book (RAG):
1. Find the relevant pages in YOUR notes.
2. Give them to the model.
3. Tell it: "answer using only these pages".
RAG turns a closed-book exam into an open-book one. You retrieve the relevant text, paste it into the prompt, and instruct the model to answer only from it. This grounds answers in your real documents and reduces hallucination.
New Concept · Chunk, Retrieve, Generate
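14 min
1. Chunk your documents
def chunk(text, size=500, overlap=50):
    # overlap must be smaller than size, or the loop would never advance
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    chunks, i = [], 0
    while i < len(text):
        chunks.append(text[i:i + size])
        i += size - overlap  # step forward less than `size` so neighbouring chunks share text
    return chunks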
Chunks must be small enough to fit several in a prompt, big enough to be meaningful. Overlap avoids cutting an answer in half at a boundary.
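To make the size and overlap arithmetic concrete, here is a small sketch that runs the chunker on a made-up 1,200-character string. The sample text and variable names are illustrative assumptions, not part of the lesson's files.

```python
def chunk(text, size=500, overlap=50):
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    chunks, i = [], 0
    while i < len(text):
        chunks.append(text[i:i + size])
        i += size - overlap
    return chunks

# Hypothetical sample text: 1,200 characters of repeating letters.
sample = "abcdefghij" * 120
pieces = chunk(sample, size=500, overlap=50)

print(len(pieces))                         # 3
print([len(p) for p in pieces])            # [500, 500, 300]
print(pieces[0][-50:] == pieces[1][:50])   # True: neighbours share 50 characters
```

Each chunk starts 450 characters after the previous one (size minus overlap), so the last 50 characters of one chunk reappear at the start of the next.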
2. Retrieve with TF-IDF (no API needed)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_index(chunks):
    # learn the vocabulary and weight every chunk once
    vec = TfidfVectorizer(stop_words="english")
    matrix = vec.fit_transform(chunks)
    return vec, matrix

def retrieve(query, chunks, vec, matrix, k=3):
    q = vec.transform([query])
    sims = cosine_similarity(q, matrix)[0]
    top = sims.argsort()[-k:][::-1]  # indices of the k highest scores, best first
    return [chunks[i] for i in top]
TF-IDF + cosine similarity finds the chunks whose words best match the question — a cheap, no-API retriever. (Real systems use neural embeddings; see the stretch.)
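To show roughly what the retriever computes under the hood, here is a pure-Python sketch of TF-IDF weighting and cosine similarity. It is a simplification for illustration: the tokenizer, the idf formula (log(N/df) + 1, without the smoothing scikit-learn applies by default), the missing stop-word list, and the three sample notes are all assumptions, so its scores will not match scikit-learn's exactly.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase, split on whitespace, strip simple punctuation.
    words = (w.strip(".,?!:;\"'()").lower() for w in text.split())
    return [w for w in words if w]

def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for toks in tokenized for word in set(toks))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vectors = [{w: c * idf[w] for w, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf

def cosine(a, b):
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)

def retrieve(query, docs, k=1):
    vectors, idf = tfidf_vectors(docs)
    # Words the index has never seen carry no weight in the query vector.
    q = {w: c * idf[w] for w, c in Counter(tokenize(query)).items() if w in idf}
    scores = [cosine(q, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(docs[i], scores[i]) for i in order[:k]]

# Made-up notes for illustration.
notes = [
    "Mitosis divides one cell into two genetically identical cells.",
    "Photosynthesis converts light energy into chemical energy.",
    "Enzymes are proteins that speed up chemical reactions.",
]

best, score = retrieve("What is mitosis?", notes, k=1)[0]
print(best)  # the mitosis note

off_topic = retrieve("Who won the 2018 World Cup?", notes, k=3)
print([s for _, s in off_topic])  # [0.0, 0.0, 0.0]: no shared words
```

The off-topic query scores zero against every note because it shares no words with them. That is the keyword-matching limitation that embeddings address.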
3. Generate grounded in the context
import anthropic

client = anthropic.Anthropic()  # reads the API key from the ANTHROPIC_API_KEY environment variable

def answer(query, context_chunks):
    context = "\n\n---\n\n".join(context_chunks)
    msg = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=400,
        system="Answer ONLY using the provided context. "
               "If the answer isn't there, say 'I don't know from the notes.'",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    )
    return msg.content[0].text
The honesty rule
The system prompt's instruction "if it's not in the context, say you don't know" is what discourages the model from inventing answers. Without it, a RAG pipeline can still hallucinate whenever the retrieved chunks don't contain the answer.
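It can help to inspect the exact prompt before spending an API call. The sketch below assembles the system prompt and user message without contacting any service. `build_request` and `SYSTEM_PROMPT` are hypothetical names introduced here, and the two context strings are made up.

```python
SYSTEM_PROMPT = (
    "Answer ONLY using the provided context. "
    "If the answer isn't there, say 'I don't know from the notes.'"
)

def build_request(query, context_chunks):
    """Assemble the system prompt and user message for a grounded question."""
    context = "\n\n---\n\n".join(context_chunks)
    return {
        "system": SYSTEM_PROMPT,
        "messages": [
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
        ],
    }

# Made-up chunks standing in for retrieved text.
request = build_request(
    "What is mitosis?",
    ["Mitosis divides one cell into two.", "Cells have membranes."],
)
print(request["system"])
print(request["messages"][0]["content"])
```

Printing the request shows whether the honesty rule and the retrieved chunks are really in the prompt. The same dictionary could then be unpacked into the API call alongside the model name and token limit.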
Worked Example · Q&A Over Your Notes
12 min
# rag_lite.py: ask questions about a text file
import anthropic
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

client = anthropic.Anthropic()

def chunk(text, size=500, overlap=50):
    out, i = [], 0
    while i < len(text):
        out.append(text[i:i + size])
        i += size - overlap
    return out

def answer(query, chunks, vec, matrix):
    q = vec.transform([query])
    top = cosine_similarity(q, matrix)[0].argsort()[-3:][::-1]
    context = "\n\n---\n\n".join(chunks[i] for i in top)
    msg = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=400,
        system="Answer ONLY from the context. If absent, say "
               "'Not found in the notes.' Cite the relevant phrase.",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQ: {query}"}],
    )
    return msg.content[0].text

# build the index once
text = Path("biology_notes.txt").read_text()
chunks = chunk(text)
vec = TfidfVectorizer(stop_words="english")
matrix = vec.fit_transform(chunks)

# ask away
for q in ["What is mitosis?", "Who won the 2018 World Cup?"]:
    print(f"Q: {q}\nA: {answer(q, chunks, vec, matrix)}\n")
Sample output
Q: What is mitosis?
A: Mitosis is cell division producing two identical daughter cells, as the notes state: "mitosis divides one cell into two genetically identical cells."

Q: Who won the 2018 World Cup?
A: Not found in the notes.
Read the diff
The model answered the biology question from the notes (and cited the phrase) but correctly refused the World Cup question — because it's not in the context and the system prompt forbids guessing. That refusal is the whole point: a grounded, honest assistant over your documents.
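One detail worth noticing: the retriever always hands over its top three chunks, even when nothing in the notes matches the question. The sketch below pairs chunk indices with their similarity scores so a weak retrieval is visible. The helper name and the score lists are made up for illustration. In rag_lite.py the scores would come from the `cosine_similarity(q, matrix)[0]` expression.

```python
def top_k_with_scores(scores, k=3):
    """Return (chunk_index, score) pairs for the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]

# Hypothetical similarity scores for five chunks.
on_topic = [0.02, 0.61, 0.00, 0.34, 0.10]
off_topic = [0.0, 0.0, 0.0, 0.0, 0.0]

for label, scores in [("on-topic", on_topic), ("off-topic", off_topic)]:
    picked = top_k_with_scores(scores)
    verdict = "weak retrieval" if picked[0][1] == 0.0 else "ok"
    print(label, picked, verdict)
```

When every score is zero, the three chunks sent to the model are effectively arbitrary. In that case the refusal depends entirely on the system prompt's instruction.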
Try It Yourself
13 min
- Point RAG-Lite at one of your own study-notes text files. Ask 3 questions; confirm it answers from the text and refuses off-topic ones.
- Print which chunks were retrieved for each question, so you can verify the answer is grounded.
- Stretch: swap TF-IDF for neural embeddings (sentence-transformers, model "all-MiniLM-L6-v2"). Compare retrieval quality on questions phrased differently from the notes.
Hint
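# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

embed = SentenceTransformer("all-MiniLM-L6-v2")
# normalize_embeddings=True returns unit-length vectors, so a dot product equals cosine similarity
chunk_vecs = embed.encode(chunks, normalize_embeddings=True)
q_vec = embed.encode([query], normalize_embeddings=True)
sims = (chunk_vecs @ q_vec.T).ravel()
top = np.argsort(sims)[-3:][::-1]  # indices of the 3 best chunks, best first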
Embeddings match by meaning, so "cell splitting" can retrieve a chunk about "mitosis" even with no shared words.
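The hint's dot product equals cosine similarity only when the vectors have unit length. The sketch below checks that with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions). The labels in the comments are illustrative, not output from any model.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalise(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up vectors standing in for embeddings.
query_vec = [1.0, 2.0, 2.0]   # e.g. "cell splitting"
chunk_a = [2.0, 4.0, 4.0]     # same direction, twice as long
chunk_b = [2.0, -1.0, 0.0]    # perpendicular to the query

# The raw dot product depends on vector length, not just direction.
print(dot(query_vec, chunk_a))  # 18.0

# After normalising, the dot product and the cosine agree.
unit_dot = dot(normalise(query_vec), normalise(chunk_a))
print(round(unit_dot, 6), round(cosine(query_vec, chunk_a), 6))

# Perpendicular vectors score zero either way.
print(dot(query_vec, chunk_b))  # 0.0
```

Because cosine similarity ignores length and compares direction only, two texts of different sizes can still score as close matches.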
Mini-Challenge · Study-Notes Web App
8 min
Combine RAG-Lite with the Lesson 44 Flask app: upload or select a notes file, then chat with it. The backend retrieves the relevant chunks and grounds every answer in them. The result is a personal "chat with my notes" tool.
Recap
3 min
RAG = chunk your docs → retrieve the relevant chunks (TF-IDF or embeddings) → generate an answer grounded in them, with a system prompt that forbids guessing. It gives the model access to text it was never trained on and reduces hallucination. Embeddings retrieve by meaning, not just keywords. That's the last LLM technique; next come the ethics that govern all of this.
Vocabulary Card
- RAG: Retrieval-Augmented Generation. Retrieve relevant text, then generate an answer grounded in it.
- chunk: A small slice of a document, sized to fit several in a prompt.
- embedding: A vector capturing a text's meaning; similar meanings sit close together.
- grounding: Forcing the model to answer from provided sources, not its memory.
Homework
4 min
Build a RAG-Lite tool over a document you care about (your notes, a rulebook, a manual). Show 3 grounded answers and 1 honest "not found". Print the retrieved chunks for transparency. Bonus: try real embeddings and compare.
Use rag_lite.py as your starting point. The honest "not found" case is the most important one to demonstrate, because it shows that your grounding instruction works.