PY-L5-41 · Few-Shot Prompting — Teach by Example

Learning Goals

3 min

Distinguish zero-shot, one-shot, few-shot.
Embed examples in the messages to teach a pattern.
Use few-shot for classification, extraction and consistent formatting.
Know when few-shot beats a long instruction (and vice versa).

Warm-Up · Show, Don't Tell

5 min

zero-shot: "Classify the sentiment."        (no examples)
one-shot:  give 1 example, then the input
few-shot:  give 2-5 examples, then the input

Examples teach format, edge cases, and your exact label set
better than paragraphs of instructions.
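The three prompt styles above differ only in how many examples come before the input. A minimal sketch of that idea, assuming a made-up sentiment task: the helper `build_prompt`, the `EXAMPLES` list, and the query sentence are all illustrative and not part of the lesson's code.

```python
# Hypothetical helper: one function produces zero-, one-, or few-shot prompts.
EXAMPLES = [
    ("I love this song!", "POSITIVE"),
    ("This homework is so boring.", "NEGATIVE"),
    ("Best holiday ever.", "POSITIVE"),
]

def build_prompt(task, examples, query, k):
    """Return a prompt holding the first k examples, then the unanswered query."""
    parts = [task, ""]
    for text, label in examples[:k]:
        parts += [f'Text: "{text}"', f"Label: {label}", ""]
    parts += [f'Text: "{query}"', "Label:"]
    return "\n".join(parts)

TASK = "Classify the sentiment as POSITIVE or NEGATIVE."
QUERY = "The bus was late again."

zero_shot = build_prompt(TASK, EXAMPLES, QUERY, k=0)  # no examples
one_shot = build_prompt(TASK, EXAMPLES, QUERY, k=1)   # 1 example
few_shot = build_prompt(TASK, EXAMPLES, QUERY, k=3)   # 3 examples
print(few_shot)
```

Every version ends with an open "Label:" line, so the model's natural next step is to fill in the label.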

Today's big idea

For anything with a specific format or label set, a few worked examples steer the model more reliably than a long description. You're "programming" the model with demonstrations, and it generalises from them within a single request, with no training step.

New Concept · Examples in the Prompt

14 min

Few-shot in a single user message

import anthropic
client = anthropic.Anthropic()

prompt = """Classify each message as SPAM or HAM.

Message: "You won a FREE iPhone, click here!"
Label: SPAM

Message: "Hey, are we still meeting at 3?"
Label: HAM

Message: "Claim your prize now, limited offer!"
Label: SPAM

Message: "Can you send me the notes from class?"
Label:"""

msg = client.messages.create(
    model="claude-haiku-4-5", max_tokens=10, temperature=0,
    messages=[{"role": "user", "content": prompt}],
)
print(msg.content[0].text.strip())   # → HAM

Few-shot via alternating messages (cleaner)

examples = [
    ("You won a FREE iPhone!", "SPAM"),
    ("Are we still meeting at 3?", "HAM"),
    ("Claim your prize now!", "SPAM"),
]
messages = []
for text, label in examples:
    messages.append({"role": "user", "content": f"Classify: {text}"})
    messages.append({"role": "assistant", "content": label})
# the real query last:
messages.append({"role": "user", "content": "Classify: Send me the notes?"})

msg = client.messages.create(model="claude-haiku-4-5", max_tokens=10,
                             temperature=0, messages=messages)
print(msg.content[0].text.strip())   # → HAM

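Putting examples as real user/assistant turns keeps each demonstration separate from the instruction: the model sees a genuine "conversation" in which every assistant reply is just a label, so it tends to answer the final turn the same way.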

Great for extraction

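# teach it to pull structured data from messy text
prompt = '''Extract name and age as JSON.

Text: "Hi I'm Aisyah and I'm 13"
JSON: {"name": "Aisyah", "age": 13}

Text: "Wei Jie here, fourteen years old"
JSON: {"name": "Wei Jie", "age": 14}

Text: "I am Suresh, 12"
JSON:'''

# send it the same way as before (reuses `client` from the first example)
msg = client.messages.create(
    model="claude-haiku-4-5", max_tokens=50, temperature=0,
    messages=[{"role": "user", "content": prompt}],
)
print(msg.content[0].text.strip())   # → {"name": "Suresh", "age": 12}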
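The model's reply to an extraction prompt is still plain text, so it is worth checking before the rest of a program trusts it. A minimal sketch, assuming the reply should be a JSON object with a string "name" and an integer "age": the helper name `parse_person` is hypothetical, and it runs on hard-coded strings so no API call is needed.

```python
import json

def parse_person(raw):
    """Parse a reply into {"name", "age"}, or return None if it is unusable."""
    try:
        data = json.loads(raw.strip())
    except json.JSONDecodeError:
        return None  # the reply was not valid JSON at all
    if not isinstance(data, dict):
        return None  # valid JSON, but not an object
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        return None  # a field is missing or has the wrong type
    return {"name": data["name"], "age": data["age"]}

print(parse_person('{"name": "Suresh", "age": 12}'))          # → {'name': 'Suresh', 'age': 12}
print(parse_person("Sure! Here is the JSON you asked for."))  # → None
```

Returning None instead of raising lets the caller decide what to do with a bad reply, for example retry or skip the record.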

When to use what

few-shot wins:  specific format, custom labels, tricky edge cases
instructions:   simple open-ended tasks, or when examples are hard to write
both together:  system prompt (rules) + a couple of examples = very reliable

Worked Example · Few-Shot Topic Tagger

12 min

# tagger.py — tag support tickets into a fixed label set, few-shot
import anthropic
client = anthropic.Anthropic()

SHOTS = [
    ("My payment failed twice today.",          "billing"),
    ("The app crashes when I open settings.",   "bug"),
    ("How do I change my password?",            "how-to"),
    ("I want a refund for last month.",         "billing"),
]

def tag(ticket):
    messages = []
    for text, label in SHOTS:
        messages.append({"role": "user", "content": f"Ticket: {text}"})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": f"Ticket: {ticket}"})
    msg = client.messages.create(
        model="claude-haiku-4-5", max_tokens=5, temperature=0,
        system="Reply with exactly one label: billing, bug, or how-to.",
        messages=messages,
    )
    return msg.content[0].text.strip()

for t in ["The screen freezes on login",
          "Can you explain how to export my data?",
          "I was charged twice"]:
    print(f"{tag(t):<8} <- {t}")

Sample output

bug      <- The screen freezes on login
how-to   <- Can you explain how to export my data?
billing  <- I was charged twice

Read the diff

Four examples plus a system prompt pinning the label set give a reliable classifier with zero training data. Compare Lesson 35 (TF-IDF), which needed many labelled examples and a training step. Few-shot LLM classification needs almost no setup, but each call is slower and costlier than a prediction from a trained model. Pick the right tool for your volume and setup time.
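The system prompt asks for exactly one label, but the reply is still free text and can occasionally come back capitalised, with a full stop, or as a sentence. A minimal sketch of a guard that could wrap the value returned by tag(): the helper name `clean_label` and the "unknown" fallback are assumptions, not part of tagger.py.

```python
LABELS = {"billing", "bug", "how-to"}  # the label set from the system prompt

def clean_label(raw, fallback="unknown"):
    """Normalise a reply and return it only if it is one of the allowed labels."""
    label = raw.strip().lower().rstrip(".")
    return label if label in LABELS else fallback

print(clean_label("Billing"))                        # → billing
print(clean_label(" bug.\n"))                        # → bug
print(clean_label("I think this is a login issue"))  # → unknown
```

Counting how often the fallback appears is a cheap way to notice when the examples or the system prompt need another look.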

Try It Yourself

13 min

01 🟢 Zero vs few-shot

Try a classification task zero-shot, then add 3 examples. Does few-shot improve consistency / format?

02 🟡 Style transfer

Give 2 examples of "formal → casual" rewrites, then ask it to casual-ify a new formal sentence.

03 🔴 Structured extraction

Few-shot the model to pull {product, price, in_stock} from messy product blurbs. json.loads each result.

Mini-Challenge · Beat Your TF-IDF

8 min

Take a small classification task. Build BOTH a few-shot LLM classifier and a TF-IDF model (Lesson 35) on the same handful of examples. Compare accuracy AND cost/speed. Which would you ship for 1 ticket/day? For 1 million/day?
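To compare the two classifiers fairly, run both through the same measuring function on the same test set. A minimal sketch of such a harness: `evaluate`, `TEST_SET`, and `keyword_stub` are illustrative names, and the stub is only a stand-in so the code runs without an API key or a trained model. Swap in your few-shot function and your TF-IDF predictor.

```python
import time

def evaluate(classify, labelled):
    """Return (accuracy, seconds per item) for a classifier on (text, label) pairs."""
    start = time.perf_counter()
    correct = sum(1 for text, label in labelled if classify(text) == label)
    elapsed = time.perf_counter() - start
    return correct / len(labelled), elapsed / len(labelled)

# Illustrative test set; replace with your own held-out tickets.
TEST_SET = [
    ("I was charged twice", "billing"),
    ("The screen freezes on login", "bug"),
    ("How do I export my data?", "how-to"),
    ("Refund my last order", "billing"),
]

def keyword_stub(text):
    """Toy stand-in classifier based on keywords (not a real model)."""
    lowered = text.lower()
    if "charged" in lowered or "refund" in lowered:
        return "billing"
    if "freezes" in lowered or "crash" in lowered:
        return "bug"
    return "how-to"

acc, secs = evaluate(keyword_stub, TEST_SET)
print(f"accuracy={acc:.2f}  seconds/item={secs:.6f}")
```

Multiplying seconds per item (and your per-call price, if any) by the daily volume gives a rough answer to the 1-ticket versus 1-million question.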

Recap

3 min

Few-shot prompting teaches a pattern with 2-5 examples, best given as alternating user/assistant turns. It handles specific formats, custom labels, and edge cases without training. Combine it with a system prompt for maximum reliability. Setup is instant, but every call costs time and money: great for low volume or quick prototypes, while a trained model wins at scale. Next: let the LLM call your functions.

Vocabulary Card

zero/one/few-shot: Zero, one, or several worked examples included in the prompt.
in-context learning: The model generalising from examples in the prompt, with no weight updates.
demonstration: An example input→output pair that shows the desired behaviour.
extraction: Pulling structured data out of unstructured text.

Homework

4 min

Build a few-shot tool for a real task you care about (tagging, extraction, rewriting). Use 3-5 examples + a system prompt. Test on 5 new inputs. One paragraph: did few-shot or plain instructions work better, and why?
