Learning Goals
3 min
- Distinguish zero-shot, one-shot, and few-shot prompting.
- Embed examples in the messages to teach a pattern.
- Use few-shot for classification, extraction and consistent formatting.
- Know when few-shot beats a long instruction (and vice versa).
Warm-Up · Show, Don't Tell
5 min
- zero-shot: "Classify the sentiment." (no examples)
- one-shot: give 1 example, then the input
- few-shot: give 2-5 examples, then the input
Examples teach format, edge cases, and your exact label set better than paragraphs of instructions.
For anything with a specific format or label set, a few worked examples steer the model more reliably than a long description. You're "programming" the model with demonstrations, and it generalises from them on the spot, with no training step.
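To make the three terms concrete, here is a minimal sketch that builds the same sentiment prompt with 0, 1, and 3 examples. The example texts, the POSITIVE/NEGATIVE label set, and the `build_prompt` helper are all made up for illustration; only the prompt strings are built, no API call is made.

```python
# Sketch: the same task as a zero-, one-, and few-shot prompt.
# Example texts and labels are made up for illustration.

INSTRUCTION = "Classify the sentiment as POSITIVE or NEGATIVE."

EXAMPLES = [
    ("I love this phone!", "POSITIVE"),
    ("The battery died after an hour.", "NEGATIVE"),
    ("Best purchase I've made all year.", "POSITIVE"),
]


def build_prompt(query, n_shots):
    """Return the instruction, the first n_shots worked examples, then the query."""
    parts = [INSTRUCTION]
    for text, label in EXAMPLES[:n_shots]:
        parts.append(f"Text: {text}\nLabel: {label}")
    # the real input goes last, with its label left blank for the model to fill
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


query = "Shipping took forever and the box was crushed."
zero_shot = build_prompt(query, 0)  # instruction only
one_shot = build_prompt(query, 1)   # 1 demonstration
few_shot = build_prompt(query, 3)   # several demonstrations

print(few_shot)
```

Only the number of demonstrations changes; the instruction and the final blank `Label:` stay the same, which makes the three variants easy to compare side by side.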
New Concept · Examples in the Prompt
14 min
Few-shot in a single user message
import anthropic

client = anthropic.Anthropic()

# examples and the real query all live in one user message
prompt = """Classify each message as SPAM or HAM.

Message: "You won a FREE iPhone, click here!"
Label: SPAM

Message: "Hey, are we still meeting at 3?"
Label: HAM

Message: "Claim your prize now, limited offer!"
Label: SPAM

Message: "Can you send me the notes from class?"
Label:"""

msg = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=10,
    temperature=0,
    messages=[{"role": "user", "content": prompt}],
)
print(msg.content[0].text.strip())  # → HAM
Few-shot via alternating messages (cleaner)
import anthropic

client = anthropic.Anthropic()

# (input, label) pairs that demonstrate the pattern
examples = [
    ("You won a FREE iPhone!", "SPAM"),
    ("Are we still meeting at 3?", "HAM"),
    ("Claim your prize now!", "SPAM"),
]

messages = []
for text, label in examples:
    messages.append({"role": "user", "content": f"Classify: {text}"})
    messages.append({"role": "assistant", "content": label})

# the real query goes last, as a user turn
messages.append({"role": "user", "content": "Classify: Send me the notes?"})

msg = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=10,
    temperature=0,
    messages=messages,
)
print(msg.content[0].text.strip())
Putting the examples in as real user/assistant turns is the cleanest way: the model sees a genuine "conversation" demonstrating the pattern, and the real query is simply the next user turn.
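The loop above generalises into a small reusable helper. `build_messages` and its `template` parameter are hypothetical names, not part of the anthropic library; the returned list is what you would pass as `messages=` to `client.messages.create`.

```python
def build_messages(examples, query, template="Classify: {}"):
    """Turn (input, output) pairs into alternating user/assistant turns.

    The real query is appended last as a user turn, so the model's next
    reply is its answer to that query.
    """
    messages = []
    for text, label in examples:
        messages.append({"role": "user", "content": template.format(text)})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": template.format(query)})
    return messages


examples = [
    ("You won a FREE iPhone!", "SPAM"),
    ("Are we still meeting at 3?", "HAM"),
]
messages = build_messages(examples, "Send me the notes?")
for m in messages:
    print(m["role"], "|", m["content"])
```

Keeping the template in one place means the demonstrations and the real query are always phrased identically, which is part of what makes the pattern easy for the model to pick up.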
Great for extraction
# teach it to pull structured data from messy text
prompt = '''Extract name and age as JSON.

Text: "Hi I'm Aisyah and I'm 13"
JSON: {"name": "Aisyah", "age": 13}

Text: "Wei Jie here, fourteen years old"
JSON: {"name": "Wei Jie", "age": 14}

Text: "I am Suresh, 12"
JSON:'''
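The reply to the prompt above still needs to be parsed and checked. A minimal sketch, assuming the model answers with single-line JSON as in the examples; since it may also continue the pattern with another `Text:` line, only the first line is kept. `parse_person` is a hypothetical helper, and the reply string is made up rather than real model output.

```python
import json


def parse_person(reply):
    """Parse a single-line JSON reply and check the fields we asked for."""
    if not reply.strip():
        raise ValueError("empty reply")
    # keep only the first line, in case the model kept generating examples
    first_line = reply.strip().splitlines()[0]
    data = json.loads(first_line)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data.get("name"), str):
        raise ValueError(f"missing or non-string name: {data!r}")
    if not isinstance(data.get("age"), int):
        raise ValueError(f"missing or non-integer age: {data!r}")
    return data


# made-up reply standing in for msg.content[0].text
reply = ' {"name": "Suresh", "age": 12}\nText: "Hello"'
person = parse_person(reply)
print(person["name"], person["age"])
```

Raising `ValueError` for every kind of bad reply lets the caller handle parse failures and missing fields with a single `except`.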
When to use what
- few-shot wins: specific format, custom labels, tricky edge cases
- instructions win: simple open-ended tasks, or when examples are hard to write
- both together: system prompt (rules) + a couple of examples = very reliable
Worked Example · Few-Shot Topic Tagger
12 min
# tagger.py: tag support tickets into a fixed label set, few-shot
import anthropic

client = anthropic.Anthropic()

# demonstrations: (ticket text, label)
SHOTS = [
    ("My payment failed twice today.", "billing"),
    ("The app crashes when I open settings.", "bug"),
    ("How do I change my password?", "how-to"),
    ("I want a refund for last month.", "billing"),
]


def tag(ticket):
    messages = []
    for text, label in SHOTS:
        messages.append({"role": "user", "content": f"Ticket: {text}"})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": f"Ticket: {ticket}"})
    msg = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=5,
        temperature=0,
        system="Reply with exactly one label: billing, bug, or how-to.",
        messages=messages,
    )
    return msg.content[0].text.strip()


for t in ["The screen freezes on login",
          "Can you explain how to export my data?",
          "I was charged twice"]:
    print(f"{tag(t):<8} <- {t}")
Sample output
bug      <- The screen freezes on login
how-to   <- Can you explain how to export my data?
billing  <- I was charged twice
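A system prompt narrows the output, but it is still worth checking that each reply really is one of the three labels before acting on it. A minimal sketch; `normalize_label` is a hypothetical helper, and treating case, whitespace, quotes, and a trailing period as harmless variation is an assumption.

```python
LABELS = {"billing", "bug", "how-to"}


def normalize_label(raw, labels=LABELS):
    """Map a raw model reply onto the fixed label set, or return None."""
    # trim whitespace, then surrounding periods/quotes/backticks, then lower-case
    cleaned = raw.strip().strip(".\"'`").strip().lower()
    if cleaned in labels:
        return cleaned
    return None  # caller decides: retry, route to a human, or use a default


for raw in ["bug", " Billing.", "how-to\n", "feature request"]:
    print(repr(raw), "->", normalize_label(raw))
```

In tagger.py this would wrap the return value, e.g. `return normalize_label(msg.content[0].text)`.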
Read the diff
Four examples + a system prompt pinning the label set = a reliable classifier with zero training data. Compare this to Lesson 35 (TF-IDF), which needed many labelled examples and a training step. Few-shot LLM classification is instant to set up, but each call is slower and costlier than a prediction from a small trained model. Pick the right tool for your volume versus your setup time.
Try It Yourself
13 min
Try a classification task zero-shot, then add 3 examples. Does few-shot improve consistency and format?
Give 2 examples of "formal → casual" rewrites, then ask it to casual-ify a new formal sentence.
Few-shot the model to pull {product, price, in_stock} from messy product blurbs, then parse each result with json.loads.
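For the first exercise, one way to put a number on "consistency" is the share of replies that are exactly one allowed label. A minimal sketch; `format_rate` is a hypothetical helper and the replies below are made-up strings, not real model output.

```python
def format_rate(replies, labels):
    """Return the fraction of replies that are exactly one allowed label."""
    if not replies:
        return 0.0
    # .strip() forgives surrounding whitespace, but nothing else
    exact = sum(1 for reply in replies if reply.strip() in labels)
    return exact / len(replies)


# made-up replies, only to show the calculation
replies = ["POSITIVE", "NEGATIVE\n", "The sentiment is positive."]
print(f"{format_rate(replies, {'POSITIVE', 'NEGATIVE'}):.0%} of replies were a bare label")
```

Run the same inputs zero-shot and then few-shot, and compare the two rates.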
Mini-Challenge · Beat Your TF-IDF
8 min
Take a small classification task. Build BOTH a few-shot LLM classifier and a TF-IDF model (Lesson 35) on the same handful of examples. Compare accuracy AND cost/speed. Which would you ship for 1 ticket/day? For 1 million/day?
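For the mini-challenge, a tiny harness can run any `classify(text) -> label` function over a labelled set and report accuracy and average time per item. The `keyword_classifier` stand-in and the test set are hypothetical, there only so the harness runs offline; swap in `tag` from tagger.py or a small wrapper around your Lesson 35 model. Cost is not measured here: for the LLM side, multiply the number of calls by the per-call price your provider lists.

```python
import time


def evaluate(classify, labelled):
    """Return (accuracy, average seconds per item) for any classify(text) -> label."""
    correct = 0
    start = time.perf_counter()
    for text, expected in labelled:
        if classify(text) == expected:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(labelled), elapsed / len(labelled)


# Hypothetical stand-in so the harness runs without an API key or trained model.
def keyword_classifier(text):
    lowered = text.lower()
    if "charge" in lowered or "refund" in lowered or "payment" in lowered:
        return "billing"
    if "crash" in lowered or "freeze" in lowered:
        return "bug"
    return "how-to"


TEST_SET = [
    ("I was charged twice", "billing"),
    ("The screen freezes on login", "bug"),
    ("How do I export my data?", "how-to"),
    ("The app crashes on startup", "bug"),
]

acc, secs = evaluate(keyword_classifier, TEST_SET)
print(f"accuracy={acc:.2f}  avg_latency={secs * 1000:.3f} ms")
```

Wall-clock time per item is only a rough proxy; for the LLM classifier, network latency will dominate it.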
Recap
3 min
Few-shot prompting teaches a pattern with 2-5 examples, best given as alternating user/assistant turns. It nails specific formats, custom labels, and edge cases without any training. Combine it with a system prompt for maximum reliability. It needs no setup but costs more per call: great for low volume or quick prototypes, while a trained model wins at scale. Next: let the LLM call your functions.
Vocabulary Card
- zero/one/few-shot
- Zero, one, or several worked examples included in the prompt.
- in-context learning
- The model generalising from examples in the prompt, with no weight updates.
- demonstration
- An example input→output pair that shows the desired behaviour.
- extraction
- Pulling structured data out of unstructured text.
Homework
4 min
Build a few-shot tool for a real task you care about (tagging, extraction, rewriting). Use 3-5 examples + a system prompt. Test on 5 new inputs. Write one paragraph: did few-shot or plain instructions work better, and why?
Adapt tagger.py to your task. Few-shot usually wins when the output format is specific; plain instructions can suffice for open-ended generation.