6 Exam Scenarios — Deep Dive
The exam randomly selects 4 of these 6 scenarios. Each walkthrough covers the key architectural decisions, correct approaches, common anti-patterns, and which domains are tested.
Customer Support Resolution Agent
Design an AI-powered customer support agent that handles inquiries, resolves issues, and escalates complex cases. Tests Agent SDK usage, MCP tools, and escalation logic.
Key Architectural Decisions
How should the agentic loop terminate?
✓ Check stop_reason: continue on 'tool_use', exit on 'end_turn'
✗ Parsing assistant text for 'done' or 'complete' keywords
How to enforce a $500 refund limit?
✓ PreToolUse hook that programmatically blocks refund tool calls above $500 and escalates (a PostToolUse hook fires after the tool has already run, too late to block)
✗ Adding 'never process refunds above $500' to the system prompt
When should the agent escalate to a human?
✓ Escalate on: explicit customer request, policy gaps, capability limits, business thresholds
✗ Escalating based on negative sentiment or self-reported low confidence
How to preserve customer details in long conversations?
✓ Immutable 'case facts' block at the start of context with name, account ID, order, amounts
✗ Progressive summarization that silently loses critical specifics over multiple rounds
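The loop-termination check and the refund guard above can be sketched in Python. This is a minimal sketch: the tool name `process_refund`, the `amount` field, and the `decision` dict shape are illustrative, not a fixed SDK API; the real hook wiring depends on your agent framework.

```python
REFUND_LIMIT = 500.00

def refund_guard(tool_name: str, tool_input: dict) -> dict:
    """Pre-tool-use hook: programmatically block refunds above the limit.

    Enforcement lives in code, not in the system prompt, so the model
    cannot be talked (or confused) out of the policy.
    """
    amount = tool_input.get("amount", 0)
    if tool_name == "process_refund" and amount > REFUND_LIMIT:
        return {
            "decision": "block",
            "reason": f"Refund ${amount:.2f} exceeds ${REFUND_LIMIT:.2f} "
                      "limit; escalating to a human agent.",
        }
    return {"decision": "allow"}

def should_continue(stop_reason: str) -> bool:
    """Agentic loop control: keep looping on 'tool_use', stop on 'end_turn'.

    Never infer completion from keywords in the assistant's text.
    """
    return stop_reason == "tool_use"
```

In the agentic loop, `should_continue(response.stop_reason)` decides whether to execute tools and call the API again, and `refund_guard` runs before any tool execution.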
Domains Tested
This scenario tests the intersection of agentic architecture and reliability. Focus on hook-based enforcement (not prompts) and case facts (not summarization). Every escalation question will try to trick you with sentiment-based triggers.
Code Generation with Claude Code
Configure Claude Code for a development team workflow. Tests CLAUDE.md configuration, plan mode, slash commands, and iterative refinement strategies.
Key Architectural Decisions
Where should team coding standards go?
✓ .claude/CLAUDE.md (project-level, version-controlled, shared with the team)
✗ ~/.claude/CLAUDE.md (user-level, personal only) or inline code comments
When to use plan mode vs direct execution?
✓ Plan mode for multi-file architectural changes; direct execution for simple, well-defined fixes
✗ Always using plan mode (wasteful for simple tasks) or never using it (risky for complex changes)
How to handle complex refactoring that needs isolation?
✓ Use a skill with context: fork and allowed-tools restrictions
✗ Using a simple command that runs in the main session context, polluting it with exploration noise
Best iterative refinement strategy?
✓ TDD iteration: write a failing test, implement, verify, refine while keeping tests green
✗ Vague instructions like 'make it better' without concrete verification criteria
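A project-level standards file for the decision above might look like the following sketch; the contents are illustrative, the key point is that it lives in the repository and is version-controlled so the whole team shares it.

```markdown
# CLAUDE.md (checked into the repo, not ~/.claude/)

## Coding standards
- TypeScript strict mode; no `any`
- Every new module needs unit tests before the task is considered done

## Workflow
- Run `npm test` and `npm run lint` before declaring a change complete
- Use plan mode for changes touching more than one module
```

Personal preferences (editor quirks, verbosity settings) belong in the user-level `~/.claude/CLAUDE.md` instead, so they never leak into teammates' sessions.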
Domains Tested
This scenario is purely about Claude Code configuration. Know the three configuration layers, when to use commands vs skills, and the TDD iteration pattern. The exam loves to test whether you know to keep personal preferences out of project-level config.
Multi-Agent Research System
Build a coordinator-subagent system for parallel research tasks. Tests multi-agent orchestration, context passing, error propagation, and result synthesis.
Key Architectural Decisions
What architecture for parallel research tasks?
✓ Hub-and-spoke: coordinator delegates to specialized subagents with isolated contexts
✗ Flat architecture where all agents share a global state or full conversation history
How to pass context from coordinator to subagents?
✓ Pass ONLY the context relevant to each subagent's specific task
✗ Sharing the full coordinator conversation history with every subagent
How to handle conflicting data from different subagents?
✓ Track information provenance (source, confidence, timestamp) and resolve based on reliability
✗ Arbitrarily choosing one result or averaging conflicting values without provenance
How to handle subagent failures?
✓ Structured error propagation: report what was attempted, error type, distinguish access failure from empty result
✗ Silently returning empty results for failed lookups or generic 'operation failed' errors
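Provenance-based conflict resolution and structured error propagation can both be sketched in a few lines. The field names (`source`, `confidence`, `error_type`) are illustrative, not a fixed SDK schema; the point is that every result carries enough metadata to resolve conflicts, and every failure says what was attempted and why it failed.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    value: str
    source: str        # where the subagent got the data
    confidence: float  # reliability estimate, 0..1
    timestamp: str     # ISO-8601; newer data breaks ties

def resolve_conflict(findings: list[Finding]) -> Finding:
    """Prefer the most reliable finding, then the most recent.

    ISO-8601 timestamps compare correctly as strings, so a plain
    tuple key gives confidence-then-recency ordering.
    """
    return max(findings, key=lambda f: (f.confidence, f.timestamp))

def subagent_error(attempted: str, error_type: str, detail: str) -> dict:
    """Structured failure report: never return an empty result for a
    failed lookup, and never collapse everything to 'operation failed'.
    An access failure and a genuinely empty result mean different things
    to the coordinator."""
    return {"status": "error", "attempted": attempted,
            "error_type": error_type, "detail": detail}
```

The coordinator can then distinguish "the database said there are no orders" from "the database was unreachable" and retry or escalate accordingly.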
Domains Tested
This is the hardest scenario. It tests multi-agent patterns deeply. The key traps are: sharing full context with subagents (always wrong), silently dropping subagent failures (always wrong), and ignoring provenance when resolving conflicts.
Developer Productivity with Claude
Build developer tools using the Claude Agent SDK with built-in tools and MCP servers. Tests tool selection, codebase exploration, and code generation workflows.
Key Architectural Decisions
Agent has 18 tools and selects the wrong one. What to do?
✓ Reduce to 4-5 tools per agent, distribute the rest across specialized subagents
✗ Making tool descriptions longer, fine-tuning the model, or switching to a larger model
Which built-in tool for reading a config file?
✓ Read tool (purpose-built for file reading)
✗ Bash('cat config.json') — never use Bash when a dedicated tool exists
How to configure project-level MCP servers?
✓ .mcp.json with ${ENV_VAR} for secrets, version-controlled for the team
✗ ~/.claude.json (personal only) or hardcoding API keys in config files
Write vs Edit for modifying an existing file?
✓ Edit for targeted changes to existing files (preserves unchanged content)
✗ Write replaces the ENTIRE file — using it on existing files loses content you did not include
Domains Tested
This scenario is tool-focused. Memorize the 6 built-in tools and when to use each. The '18 tools' question is almost guaranteed — always distribute across subagents. Never use Bash when a built-in tool exists.
Claude Code for CI/CD
Integrate Claude Code into continuous integration and delivery pipelines. Tests -p flag usage, structured output, batch API, and multi-pass code review.
Key Architectural Decisions
How to run Claude Code in a CI pipeline?
✓ Use the -p flag for non-interactive mode with --output-format json for structured results
✗ Running in interactive mode or piping commands via stdin
How to review code that Claude generated?
✓ Use a SEPARATE session for review (fresh context, no confirmation bias)
✗ Same-session self-review where the reviewer retains the generator's reasoning
Nightly code audit: synchronous or batch?
✓ Message Batches API for non-urgent tasks (50% cost savings, processes within 24h)
✗ Synchronous requests for non-urgent tasks (2x the cost with no benefit)
How to enforce structured output from review?
✓ --json-schema flag to enforce a specific output shape for automated processing
✗ Parsing unstructured text output from the review with regex
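A hypothetical CI step tying the flags above together (GitHub Actions syntax here is illustrative, as are the prompt and file names):

```yaml
# Non-interactive review step in a CI pipeline.
- name: Claude code review
  run: |
    claude -p "Review the changes in this PR for security issues" \
      --output-format json > review.json
    # Downstream steps parse review.json instead of scraping free text.
```

The review runs in a fresh session with no memory of whoever generated the code, which is exactly the separation the scenario demands.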
Domains Tested
Three facts to memorize: (1) -p for non-interactive, (2) NEVER self-review in the same session, (3) Batch API for non-urgent = 50% savings. These three cover most questions in this scenario.
Structured Data Extraction
Build a structured data extraction pipeline from unstructured documents. Tests JSON schemas, tool_use, validation-retry loops, and few-shot prompting.
Key Architectural Decisions
How to guarantee structured JSON output from extraction?
✓ tool_use with a JSON schema + tool_choice forcing a specific tool
✗ Prompting 'output as JSON' (not guaranteed) or post-processing with regex (fragile)
Does tool_use guarantee correctness?
✓ No — tool_use guarantees STRUCTURE only. Validate SEMANTICS separately with business rules.
✗ Assuming tool_use output is always correct because it matched the schema
What to do when extraction validation fails?
✓ Append SPECIFIC error details (which field, what's wrong) and retry
✗ Generic retry: 'there were errors, try again' (no signal for what to fix)
How to handle ambiguous document types?
✓ Include an 'other' enum value + document_type_detail field for edge cases; use 2-4 few-shot examples covering edge cases
✗ Rigid enum without an 'other' category (forces misclassification of unexpected types)
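The schema, the semantic validator, and the specific-error retry message can be sketched together. The tool name, fields, and the `INV-` business rule are illustrative assumptions, not a fixed format; what matters is the split between structural guarantees (the schema) and semantic checks (the validator).

```python
# Forced-tool extraction schema: passed to the API together with
# tool_choice={"type": "tool", "name": "record_invoice"} so the model
# MUST emit arguments matching this shape.
EXTRACT_TOOL = {
    "name": "record_invoice",
    "description": "Record fields extracted from an invoice document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string"},
            "total": {"type": "number"},
            "document_type": {
                "type": "string",
                # 'other' catches unexpected document types instead of
                # forcing a misclassification.
                "enum": ["invoice", "receipt", "other"],
            },
        },
        "required": ["invoice_id", "total", "document_type"],
    },
}

def validate(extracted: dict) -> list[str]:
    """Semantic business rules the JSON schema cannot express.

    The schema guarantees `total` is a number; only this layer can
    say whether the number makes sense.
    """
    errors = []
    if extracted.get("total", 0) <= 0:
        errors.append(f"total must be positive, got {extracted.get('total')}")
    if not extracted.get("invoice_id", "").startswith("INV-"):
        errors.append("invoice_id must start with 'INV-'")
    return errors

def retry_message(errors: list[str]) -> str:
    """Feed SPECIFIC errors back on retry, never a generic 'try again'."""
    return "Extraction failed validation:\n" + "\n".join(f"- {e}" for e in errors)
```

On a validation failure, `retry_message(errors)` goes back to the model as the next user turn, so it knows exactly which field to fix.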
Domains Tested
The critical concept here is that tool_use guarantees structure, NOT semantics. Every question about extraction reliability will test this. Also know that validation retries need SPECIFIC errors, not generic messages.