Intermediate

Using Examples (Few-Shot Prompting)

A single well-chosen example often teaches Claude more than three paragraphs of description. Master the H/A turn format, example quality criteria, and dynamic example selection.

15 min read · 12 examples · Chapter 7 of 11

Zero-Shot, One-Shot, Few-Shot: When to Use Each

Few-shot prompting means giving Claude examples of the task before asking it to complete a new instance. The examples act as demonstrations: they show Claude not just what to do, but how to do it at a concrete, pattern-matchable level.

0️⃣
Zero-Shot
No examples — just instructions. Best for: well-defined tasks Claude handles reliably (summarization, translation, basic classification). Lowest token cost. Start here and only add examples if output is inconsistent.
1️⃣
One-Shot
One example. Best for: establishing a specific format or tone that's hard to describe. One good example often does more than a paragraph of format instructions. High ROI for low token cost.
🔢
Few-Shot (2–8)
Multiple examples. Best for: nuanced classification, idiosyncratic writing styles, domain-specific formats, edge case coverage. Diminishing returns after 5-8 examples for most tasks.
💡
Examples Outperform Descriptions
For format and style tasks, a concrete example is almost always more effective than a verbal description. You can spend three sentences describing the tone you want, or show one sentence that has it. Show, don't tell.
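The three tiers differ only in how many demonstration turns precede the real input. A minimal sketch of the message lists for each tier (the ticket texts here are illustrative; classification instructions would live in the system prompt, as in the API example below in this chapter's style):

```python
TICKET = "The app crashes when I open settings."

# Zero-shot: the real input only, no demonstrations.
zero_shot = [
    {"role": "user", "content": TICKET},
]

# One-shot: a single demonstration pair, then the real input.
one_shot = [
    {"role": "user", "content": "Please add a dark mode."},
    {"role": "assistant", "content": "Feature Request"},
    {"role": "user", "content": TICKET},
]

# Few-shot: several demonstration pairs, then the real input.
few_shot = one_shot[:2] + [
    {"role": "user", "content": "I was charged twice this month."},
    {"role": "assistant", "content": "Account Issue"},
    {"role": "user", "content": TICKET},
]
```

Each added pair costs tokens on every call, which is why the zero-shot list is the cheapest starting point.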

The H/A Turn Format for Embedding Examples

The cleanest way to embed examples in a prompt is using the Human/Assistant (H/A) conversation turn format. This mirrors how Claude actually works — each input/output pair becomes a simulated prior conversation turn:

H/A Turn Format (API)
import anthropic

client = anthropic.Anthropic()

# Few-shot via message history — the cleanest API pattern
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=256,
    system="You are a customer service agent for a software company. Classify support tickets into one of three categories: Bug / Feature Request / Account Issue. Respond with just the category name.",
    messages=[
        # Example 1
        {"role": "user", "content": "I can't log in — it says my password is wrong but I just reset it."},
        {"role": "assistant", "content": "Account Issue"},
        # Example 2
        {"role": "user", "content": "The export to CSV function crashes when I have more than 1000 rows."},
        {"role": "assistant", "content": "Bug"},
        # Example 3
        {"role": "user", "content": "It would be really helpful if I could schedule reports to send automatically."},
        {"role": "assistant", "content": "Feature Request"},
        # Actual input
        {"role": "user", "content": "The dark mode toggle doesn't save between sessions — every time I reload I have to switch it back."}
    ]
)

print(response.content[0].text)  # → "Bug"
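When the example list is assembled programmatically, it is easy to break the strict user/assistant alternation the Messages API requires. A small sanity check along these lines can catch that before the API call (a sketch; the function name and error messages are my own):

```python
def check_few_shot_messages(messages: list) -> bool:
    """Verify messages alternate user/assistant and end on a user turn."""
    for i, msg in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError(
                f"turn {i} has role {msg['role']!r}, expected {expected!r}"
            )
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("message list must end with the user turn to answer")
    return True
```

Run it on the list from the example above before passing it to `client.messages.create`.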

Inline Examples with XML Tags

For simpler cases or when working in a prompt template, you can embed examples inline using XML tags:

Inline Examples with XML
Rewrite the following customer review response in our brand voice.
Brand voice: warm, direct, uses the customer's name, specific not generic,
ends with a forward-looking statement.

Here are two examples of our ideal responses:

<example>
<original>Thank you for your feedback. We're sorry for the inconvenience.</original>
<rewritten>Hi Sarah — we hear you, and that's genuinely frustrating. Our team is already looking at the sync issue you hit. You'll get an email the moment it's fixed. Thank you for taking the time to let us know.</rewritten>
</example>

<example>
<original>We appreciate your business and will look into this.</original>
<rewritten>James, thanks for flagging this — a 3-day delay is not okay and you deserve better. I've escalated this to our logistics team directly. Expect an update by tomorrow afternoon.</rewritten>
</example>

Now rewrite this response in the same voice:
<original>{original_response}</original>

Example Quality: What Makes a Good Example

1
Representative of the typical case
Your examples should reflect what Claude will see most often in real usage. An example of an unusual edge case as the only example will bias Claude toward handling edge cases incorrectly.
2
Diverse, not identical
If all your examples look the same, Claude learns from a narrow sample. Include examples with different inputs, different lengths, different cases — especially for classification tasks with multiple classes.
3
Include edge cases deliberately
If you know there are tricky edge cases (ambiguous inputs, borderline classifications), include at least one example of each. Examples are the most efficient way to calibrate Claude on hard cases.
4
Correct and verified
Wrong examples teach wrong patterns. Every example in your prompt should be hand-verified as the gold-standard correct response. One bad example can corrupt a whole set.
5
Consistent with instructions
Examples that contradict your instructions create conflicting signals. If you say "always use formal language" but your examples use contractions, the examples often win — Claude learns by pattern, not by rule priority.

How Many Examples: Diminishing Returns

More examples aren't always better. The relationship between example count and accuracy improvement looks like a diminishing returns curve:

Example Count Guidelines by Task Type
TASK TYPE                    RECOMMENDED EXAMPLES  NOTES
─────────────────────────────────────────────────────────────────
Binary classification        2 (one per class)     One yes, one no
Multi-class (3-5 classes)    3-5 (one per class)   Cover all classes
Complex format matching      1-3                    One great example often sufficient
Style/tone matching          1-2                    Show don't tell
Extraction tasks             2-3                    Cover edge cases
Translation/transformation   2-3                    Include a tricky case
Creative writing style       3-5                    Show range of the style
The Example Audit
After writing your examples, re-read them as if you've never seen your instructions. Does the pattern they demonstrate match exactly what you want? Remove any example that teaches the wrong thing — even if it's a good example of something else.

Before / After: Description vs. Example for Tone Matching

Write a push notification for our fitness app. The tone should be motivational but not cheesy, direct, uses second person, short (under 15 words), creates urgency, and feels personal not corporate. Today's workout: Upper body strength, 30 minutes.
Write a push notification for our fitness app. Here's an example of our notification style: "You haven't moved today. 20 minutes is all it takes. Go." Now write one for: Today's workout: Upper body strength, 30 minutes.
Description only: "Time to crush your upper body! Your 30-minute strength session is ready and waiting. Let's get stronger together! 💪" (cheesy, corporate-feeling, too long, uses exclamation points) With example: "Arms haven't ached yet today. Let's fix that. 30 minutes." (matches the example's terse, direct voice exactly)

Negative Examples: Teaching What NOT to Do

Negative examples explicitly show Claude what a bad output looks like and why. They're especially useful when Claude keeps drifting toward a particular failure mode:

Negative Example Pattern
Classify the sentiment of customer reviews as positive, negative, or neutral.

GOOD EXAMPLES:
Input: "Works exactly as described. Arrived on time."
Output: positive

Input: "Broke after one week. Terrible quality."
Output: negative

BAD EXAMPLES (do not do this):
Input: "It's fine I guess."
Output: positive  ← WRONG: "fine I guess" signals lukewarm disappointment, not satisfaction. This should be: neutral

Input: "Expensive but worth it."
Output: negative  ← WRONG: explicit statement of worth overrides the price complaint. This should be: positive

Classify this review: "Not what I expected, but I've gotten used to it."

Dynamic Few-Shot: Selecting Relevant Examples

For large-scale production systems, you may have a library of dozens of examples. Rather than using all of them (expensive) or fixed ones (potentially irrelevant), you can dynamically select the most relevant examples for each input using embedding similarity:

Dynamic Few-Shot Selection (Python)
from anthropic import Anthropic
import numpy as np

# Simplified example — in production use a vector DB (Pinecone, Weaviate, etc.)
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def select_relevant_examples(query: str, example_library: list, k: int = 3) -> list:
    """Select k most similar examples to query using embedding similarity."""
    client = Anthropic()
    # In practice: embed query and examples, find top-k by cosine similarity
    # This pseudocode illustrates the concept
    query_embedding = embed(query)
    scored = [(cosine_similarity(query_embedding, ex["embedding"]), ex)
              for ex in example_library]
    scored.sort(reverse=True)
    return [ex for _, ex in scored[:k]]

def classify_with_dynamic_examples(input_text: str) -> str:
    examples = select_relevant_examples(input_text, EXAMPLE_LIBRARY, k=3)
    messages = []
    for ex in examples:
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["output"]})
    messages.append({"role": "user", "content": input_text})
    # ... make API call with dynamic messages
Chapter 7 Takeaway
Examples teach patterns more efficiently than instructions for format and style tasks. Use the H/A turn format for clean API-level few-shot prompting. Keep examples diverse, verified, and consistent with your instructions. For complex tasks, 2-5 well-chosen examples beat 10 mediocre ones. When Claude keeps producing the wrong output format, add one more example before adding more instructions.