Explicit Criteria & Instruction Design
Write prompts with explicit, measurable criteria instead of vague instructions. Understand how false positives impact developer trust.
Prompt design principles:
Explicit criteria over vague instructions: 'flag functions over 50 lines' vs 'flag long functions'
False positive impact: too many false positives erode developer trust in the system
Specificity reduces ambiguity and improves consistency across runs
Measurable criteria enable automated validation of output quality
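A measurable criterion can be checked mechanically, which a vague one cannot. A minimal sketch of the "functions over 50 lines" rule as an automated check (names and data are illustrative, not from a real linter):

```python
# Sketch: "flag functions over 50 lines" is mechanically checkable,
# unlike "flag long functions".
MAX_FUNCTION_LINES = 50  # the explicit threshold from the prompt's rule

def flag_long_functions(functions):
    """functions: list of (name, line_count) pairs; returns names to flag."""
    return [name for name, lines in functions if lines > MAX_FUNCTION_LINES]

sample = [("parse_config", 12), ("build_report", 87), ("main", 51)]
print(flag_long_functions(sample))  # -> ['build_report', 'main']
```

The same check can then score the model's output: any flag the validator disagrees with is a measurable quality failure, not a matter of opinion.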
Anti-Patterns to Avoid
Vague instructions like 'make it better' or 'improve the code'
Not considering the downstream impact of false positives
Production prompts require explicit, measurable criteria instead of vague instructions. This is a fundamental principle tested across multiple exam scenarios.
Why vagueness fails in production: vague instructions produce inconsistent results across runs, encourage over-flagging, and leave no objective way to validate the output.
The false positive problem:
When a code review tool flags too many non-issues, developers start ignoring ALL flags — including real problems. This is called alert fatigue and is directly tested on the exam.
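One way to catch alert fatigue before developers do is to measure flag precision against a labeled sample. A minimal sketch with hypothetical data:

```python
# Sketch (hypothetical data): measure what fraction of flags are real issues.
def flag_precision(flags, confirmed_issues):
    """flags / confirmed_issues: sets of (file, line) locations."""
    if not flags:
        return 1.0
    true_positives = len(flags & confirmed_issues)
    return true_positives / len(flags)

flags = {("app.py", 10), ("app.py", 42), ("db.py", 7), ("db.py", 90)}
confirmed = {("app.py", 42), ("db.py", 7)}
print(flag_precision(flags, confirmed))  # 2 of 4 flags were real -> 0.5
```

If precision drops below a threshold your team trusts, tighten the criteria in the prompt rather than asking developers to triage more noise.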
The fix — measurable criteria:
Instead of "flag long functions," specify "flag functions exceeding 50 lines of code." Instead of "find security issues," specify "identify hardcoded strings matching patterns for API keys, passwords, or connection strings."
Measurable criteria:
```python
# VAGUE: Results are inconsistent and over-flag
vague_prompt = """
Review this code for quality issues.
Be thorough and flag anything suspicious.
"""

# EXPLICIT: Results are consistent and actionable
explicit_prompt = """
Review this code and flag ONLY the following:
1. Functions exceeding 50 lines of code
2. Async operations missing try-catch error handling
3. Hardcoded strings matching API key patterns (sk-, pk-, key-)
4. Public functions missing JSDoc documentation
5. SQL queries constructed with string concatenation

For each issue found, provide:
- File path and line number
- Which rule (1-5) was violated
- Severity: critical (3,5) | warning (1,2) | info (4)
- One-line fix suggestion
"""
```

Few-Shot Prompting
Use few-shot examples to guide Claude's output format and reasoning. Know when and how many examples to provide.
Few-shot prompting techniques:
2-4 examples: optimal for ambiguous cases to establish format and reasoning patterns
Format consistency: all examples should follow the same output structure
Edge case coverage: include at least one example that handles an edge case
Few-shot is most valuable when the task has ambiguous boundaries
Anti-Patterns to Avoid
Too many examples (>6) that bloat the prompt without adding value
Inconsistent formatting across examples confusing the model
Few-shot prompting provides 2-4 examples that establish the expected output format, reasoning pattern, and edge case handling. It's most valuable when the task has ambiguous boundaries.
The golden rules of few-shot prompting: use 2-4 examples, keep their output format identical, and include at least one edge case.
When few-shot is most valuable: tasks with ambiguous boundaries, where the desired format and reasoning pattern are easier to demonstrate than to describe.
When few-shot is unnecessary: simple, well-specified tasks where explicit instructions alone produce consistent output.
```python
few_shot_prompt = """
Classify customer reviews. Provide sentiment and reasoning.

Example 1 (Clear positive):
Input: "Absolutely love this product! Best purchase this year."
Output: {"sentiment": "positive", "confidence": "high",
         "reasoning": "Strong positive language, superlative"}

Example 2 (Clear negative):
Input: "Terrible experience. Product broke after 2 days."
Output: {"sentiment": "negative", "confidence": "high",
         "reasoning": "Explicit negative + product failure"}

Example 3 (Ambiguous — mixed sentiment):
Input: "Great features but the battery life is disappointing."
Output: {"sentiment": "mixed", "confidence": "medium",
         "reasoning": "Positive on features, negative on battery"}

Example 4 (Edge case — sarcasm):
Input: "Oh wonderful, another update that breaks everything."
Output: {"sentiment": "negative", "confidence": "medium",
         "reasoning": "Sarcastic positive masking frustration"}

Now classify this review:
Input: "{user_review}"
"""
```

Tool Use for Structured Output
Use tool_use to guarantee JSON schema compliance. Understand the difference between schema compliance and semantic correctness.
Structured output via tool_use:
tool_use guarantees JSON schema compliance — the output will match the defined structure
Semantic errors are still possible: the structure is correct but the content may be wrong
tool_choice options: 'auto', 'any', or forced specific tool for guaranteed invocation
Schema design: required vs optional fields, enums with 'other' + detail, nullable fields
Anti-Patterns to Avoid
Assuming tool_use eliminates all errors (it only guarantees structural compliance)
Not using enums with 'other' category for fields that may have unexpected values
tool_use is the most reliable way to get structured output from Claude. By defining a tool with a JSON schema, you guarantee the output matches the schema structure.
Critical distinction: tool_use guarantees schema compliance, not semantic correctness.
This means you still need validation after extraction. The schema ensures you get a valid JSON object with the right structure, but the content inside might contain errors.
tool_choice parameter: 'auto' lets the model decide whether to call a tool, 'any' requires it to call some tool, and forcing a specific tool guarantees that tool is invoked.
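The three modes map to tool_choice values like so (a sketch of the Messages API parameter; the commented call shows how it would be passed):

```python
# tool_choice settings for the Anthropic Messages API:
auto_choice = {"type": "auto"}    # model decides whether to call a tool
any_choice = {"type": "any"}      # model must call one of the provided tools
forced_choice = {"type": "tool", "name": "extract_invoice"}  # must call THIS tool

# Forcing the extraction tool is what guarantees schema-shaped output, e.g.:
# client.messages.create(..., tools=[extract_tool], tool_choice=forced_choice)
```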
Schema design best practices: mark only truly essential fields as required, use enums with an 'other' option plus a free-text detail field, and make fields nullable when the data may legitimately be absent.
```python
import anthropic

client = anthropic.Anthropic()

extract_tool = {
    "name": "extract_invoice",
    "description": "Extract structured data from an invoice",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "invoice_number": {"type": "string"},
            "date": {"type": "string", "description": "ISO 8601"},
            "total": {"type": "number"},
            "document_type": {
                "type": "string",
                "enum": ["standard_invoice", "credit_note",
                         "proforma", "other"]
            },
            "document_type_detail": {
                "type": "string",
                "description": "Required if document_type is other"
            }
        },
        "required": ["vendor_name", "invoice_number",
                     "date", "total", "document_type"]
    }
}

# Force this specific tool = guarantees schema compliance
# (invoice holds the raw document text)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "extract_invoice"},
    messages=[{"role": "user", "content": f"Extract: {invoice}"}]
)
```

```python
# ANTI-PATTERN: Assuming tool_use catches all errors
data = extract_via_tool_use(invoice)  # "It's from tool_use, so it must be correct!"
save_to_database(data)  # No validation!
# Structure is valid, but vendor_name might be wrong
```
```python
import re

# Validate SEMANTICS after tool_use
data = extract_via_tool_use(invoice)

# Structure guaranteed, but verify content:
errors = []
if not re.match(r"\d{4}-\d{2}-\d{2}", data["date"]):
    errors.append("Invalid date format")
if data["total"] <= 0:
    errors.append("Total must be positive")
if errors:
    retry_with_errors(invoice, errors)
```

Validation-Retry Loops & Multi-Pass Review
Implement validation-retry patterns and multi-pass review strategies for reliable output. Understand when retries are effective and when they are not.
Validation and review patterns:
Validation-retry loops: append specific errors to the prompt and retry for self-correction
detected_pattern fields: track dismissal patterns to identify systematic issues
Multi-pass review: per-file local analysis + cross-file integration pass
Self-review limitations: same session retains reasoning context, reducing effectiveness
Batch processing: synchronous for blocking tasks, batch for latency-tolerant workloads
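The detected_pattern idea can be sketched as follows, assuming a hypothetical dismissal log where each dismissed flag records which pattern triggered it:

```python
# Sketch (hypothetical schema): aggregate dismissal records so systematic
# false positives surface as a pattern rather than isolated annoyances.
from collections import Counter

dismissals = [
    {"rule": "long_function", "detected_pattern": "generated_code"},
    {"rule": "long_function", "detected_pattern": "generated_code"},
    {"rule": "missing_jsdoc", "detected_pattern": "test_file"},
    {"rule": "long_function", "detected_pattern": "generated_code"},
]

pattern_counts = Counter(d["detected_pattern"] for d in dismissals)
# A pattern dominating dismissals points at a rule that needs an exemption
print(pattern_counts.most_common(1))  # [('generated_code', 3)]
```

Here the dominant pattern suggests the "long function" rule should exempt generated code, a fix you would never find from aggregate accuracy alone.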
Anti-Patterns to Avoid
Same-session self-review (the model retains its reasoning context, creating bias)
Generic retry without appending specific error information
Aggregate accuracy metrics masking per-document-type failures
Validation-retry loops and multi-pass review are production patterns for improving output quality.
Validation-retry loop: validate the output, append the specific errors to the conversation, and ask the model to re-extract with corrections.
Key principle: Specific error feedback, not generic.
Multi-pass review strategy: a per-file pass for local issues, followed by a cross-file pass for integration issues.
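The two-pass structure can be sketched with stubbed review functions standing in for model calls (all names hypothetical):

```python
# Sketch: two-pass review orchestration with stubbed passes.
def review_file(path, source):
    """Pass 1: local issues within a single file (stub stands in for a model call)."""
    return [f"{path}: local issue"] if "TODO" in source else []

def review_integration(files):
    """Pass 2: cross-file issues, given all files together (stubbed)."""
    return ["api.py/client.py: signature mismatch"] if len(files) > 1 else []

def multi_pass_review(files):
    findings = []
    for path, source in files.items():      # per-file pass
        findings += review_file(path, source)
    findings += review_integration(files)   # cross-file pass
    return findings
```

The per-file pass keeps each prompt small and focused; the integration pass is the only one that needs cross-file context.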
Same-session self-review limitation:
When the same session generates and reviews code, it retains the original reasoning context, creating a blind spot. The fix: use separate sessions for generation and review.
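Concretely, "separate sessions" means two independent message histories, each sent as its own API call. A minimal sketch with hypothetical helper names:

```python
# Sketch: generation and review as SEPARATE message histories, so the
# reviewer never inherits the generator's reasoning.
def build_generation_messages(spec):
    return [{"role": "user", "content": f"Implement: {spec}"}]

def build_review_messages(code):
    # Fresh session: contains only the code, none of the generation chat
    return [{"role": "user", "content": f"Review this code for bugs:\n{code}"}]

gen_messages = build_generation_messages("a rate limiter")
review_messages = build_review_messages("def allow(key): ...")
# Each list would be sent as its own client.messages.create(...) call
```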
Batch processing strategy: use synchronous requests for blocking tasks and batch processing for latency-tolerant workloads.
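For the latency-tolerant case, requests can be queued through the Message Batches API. A sketch with hypothetical document IDs and texts (the SDK submission call is shown commented, as it requires credentials):

```python
# Sketch (hypothetical data): build Message Batches API requests instead of
# making blocking synchronous calls.
documents = {"inv-001": "Invoice text A", "inv-002": "Invoice text B"}

batch_requests = [
    {
        "custom_id": doc_id,  # used to match results back to inputs later
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": f"Extract: {text}"}],
        },
    }
    for doc_id, text in documents.items()
]

# Submit asynchronously and poll for results later:
# batch = client.messages.batches.create(requests=batch_requests)
```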
```python
def extract_with_validation(document, max_retries=3):
    messages = [{"role": "user", "content": f"Extract: {document}"}]

    for attempt in range(max_retries):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=[extract_tool],
            tool_choice={"type": "tool", "name": "extract_invoice"},
            messages=messages,
        )

        data = parse_tool_response(response)
        errors = validate(data)

        if not errors:
            return data  # Valid — return results

        # CRITICAL: Append SPECIFIC errors for retry
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": "Validation failed. Fix these errors:\n"
                       + "\n".join(f"- {e}" for e in errors)
                       + "\nRe-extract with corrections."
        })

    raise ExtractionError(f"Failed after {max_retries} attempts")


def validate(data):
    # Assumes the schema also extracts line_items and subtotal
    errors = []
    if data["total"] <= 0:
        errors.append(f"Total must be positive, got {data['total']}")
    if sum(i["total"] for i in data["line_items"]) != data["subtotal"]:
        errors.append("Line items sum doesn't match subtotal")
    return errors
```

Exam Tips for Domain 4
Explicit, measurable criteria > vague instructions (always)
2-4 few-shot examples is the sweet spot for ambiguous tasks
tool_use = structural compliance, NOT semantic correctness
Same-session self-review is an anti-pattern — use separate sessions
Related Exam Scenarios
Code Generation with Claude Code
Configure Claude Code for a development team workflow. Tests CLAUDE.md configuration, plan mode, slash commands, and iterative refinement strategies.
Claude Code for CI/CD
Integrate Claude Code into continuous integration and delivery pipelines. Tests -p flag usage, structured output, batch API, and multi-pass code review.
Structured Data Extraction
Build a structured data extraction pipeline from unstructured documents. Tests JSON schemas, tool_use, validation-retry loops, and few-shot prompting.
Test Your Knowledge of Prompt Engineering
Practice with scenario-based questions covering this domain.