Context Optimization & Positioning
Optimize context window usage with strategic positioning, trimming, and summarization techniques while avoiding common pitfalls.
Context management techniques:
Progressive summarization risks: important details can be lost through repeated summarization
'Lost in the middle' effect: information in the middle of long contexts is less likely to be recalled
'Case facts' blocks: structured reference sections that preserve critical information
Trimming verbose tool outputs: remove noise while retaining essential data
Position-aware ordering: put the most important information at the beginning and end of context
Anti-Patterns to Avoid
Progressive summarization of critical details without preserving originals
Ignoring the 'lost in the middle' effect in long context windows
Context management is about making the most of the limited context window while preserving critical information. Two key concepts dominate this domain:
1. Progressive Summarization Risks
Progressive summarization compresses conversation history to save space. While it seems efficient, it silently destroys critical details:
Original: "Customer John Smith (ACC-12345) called about order #98765. Charged $150.00 instead of promotional $99.99."
After 1st summary: "Customer called about billing issue with promotion."
After 2nd summary: "Customer has a billing issue."
The customer name, account number, order number, exact amounts, and promotion code — all lost.
2. The "Lost in the Middle" Effect
Research shows that information in the middle of long contexts is less likely to be recalled by the model. Information at the beginning and end gets more attention.
The solution: "Case Facts" blocks
Instead of summarizing, preserve critical information in an immutable structured block placed at the beginning of context (high-recall position). This block is never summarized or compressed and contains all essential reference data.
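The assembly logic can be sketched in a few lines. This is a minimal illustration, not any SDK's API: the `build_context` helper and its section names are invented here. It pins case facts to the start (high-recall position), puts instructions at the end (recency position), and lets trimmed history fill the low-recall middle:

```python
def build_context(case_facts, history, instructions, max_chars=4000):
    """Assemble context so critical info sits at high-recall positions.

    Layout: case facts first (primary position), trimmed history in the
    middle (lowest-recall region), instructions last (recency position).
    """
    head = "## CASE FACTS\n" + case_facts
    tail = "## INSTRUCTIONS\n" + instructions
    # The middle is the only part that gets trimmed when space runs out;
    # the head and tail blocks are never compressed
    budget = max_chars - len(head) - len(tail)
    middle = history[-budget:] if budget > 0 else ""
    return "\n\n".join([head, middle, tail])

ctx = build_context(
    case_facts="Customer: John Smith (ACC-12345)",
    history="...long conversation transcript...",
    instructions="Refund $50.01 if eligible.",
)
# ctx starts with the case facts block and ends with the instructions
```

The key design choice is that trimming pressure only ever applies to the middle segment, so the immutable blocks survive regardless of conversation length.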
```markdown
## CASE FACTS (Do not summarize — reference directly)

| Field          | Value                                  |
|----------------|----------------------------------------|
| Customer       | John Smith                             |
| Account ID     | ACC-12345                              |
| Order          | #98765                                 |
| Expected Price | $99.99 (promotion SUMMER2026)          |
| Charged Price  | $150.00                                |
| Overcharge     | $50.01                                 |
| Customer Since | 2019 (7-year tenure)                   |
| Priority       | High (long-term customer + overcharge) |

## RULES
- Always address customer as "Mr. Smith"
- This case qualifies for immediate resolution
- Refund amount ($50.01) is within $500 agent limit
```

```
# Progressive summarization loses critical details
Turn 1:  "John Smith (ACC-12345) order #98765..."
Turn 5:  [Summary] "Customer billing issue"
Turn 10: [Summary] "Billing issue being handled"
# By turn 10: lost name, account, order, amounts
```

```
# Case facts block — always available, never summarized
## CASE FACTS (immutable)
- Customer: John Smith (ACC-12345)
- Order: #98765
- Issue: Overcharged $50.01 (promo SUMMER2026)
# This block stays intact regardless of length
```
Escalation & Error Propagation
Design escalation patterns and error propagation strategies that provide enough context for recovery or human intervention.
Escalation and error handling:
Escalation triggers: customer demands, policy gaps — not just sentiment
Structured error context vs generic errors: always include what was attempted
Access failures vs empty results: distinguish between 'could not check' and 'checked and found nothing'
Local recovery before coordinator escalation: try to fix locally first
Partial results + what was attempted: always report progress even on failure
Anti-Patterns to Avoid
Sentiment-based escalation (sentiment does not equal task complexity)
Generic error propagation that loses the original error context
Silently suppressing errors instead of escalating with context
Escalation and error propagation patterns determine how failures flow through a system. Getting this wrong means either overwhelming humans with trivial issues or silently dropping critical failures.
Valid escalation triggers:
Customer explicitly requests a human agent
No policy covers the situation (policy gap)
Amount involved exceeds the agent's authority limit
Retry attempts are exhausted
Invalid escalation triggers (exam anti-patterns):
Negative customer sentiment (sentiment does not equal task complexity)
Low model self-reported confidence (self-reported confidence is unreliable)
Error propagation in multi-agent systems:
When a subagent fails, it must report structured context to the coordinator:
Never silently drop subagent failures. If a subagent can't access a database, the coordinator must know the data is missing — not assume the query returned nothing.
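A minimal sketch of what that structured report could look like. The field names (`status`, `attempted`, `partial_results`) and the `query_orders` helper are illustrative, not from any particular SDK. Note the distinction it preserves: an access failure is reported differently from a query that legitimately found nothing:

```python
def query_orders(account_id, db_available):
    """Hypothetical subagent task: fetch orders, reporting structured context."""
    if not db_available:
        # Access failure: we could NOT check. The coordinator must not
        # treat this as "no orders found".
        return {
            "status": "error",
            "attempted": f"SELECT orders for {account_id}",
            "error": "orders-db unreachable after 3 retries",
            "partial_results": [],
        }
    orders = []  # imagine the query legitimately returned nothing
    # Empty result: we checked and found nothing. That is a valid answer.
    return {
        "status": "ok",
        "attempted": f"SELECT orders for {account_id}",
        "results": orders,
    }

failure = query_orders("ACC-12345", db_available=False)
empty = query_orders("ACC-12345", db_available=True)
# The coordinator can now tell "could not check" from "checked, found nothing",
# and the "attempted" field preserves what was tried even on failure.
```

Because every report carries `attempted` and any partial results, the coordinator always knows what progress was made, even when the subagent fails outright.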
```python
def should_escalate(context):
    """Determine if we need human intervention."""

    # VALID escalation triggers
    if context.customer_requested_human:
        return True, "Customer explicitly requested human agent"

    if context.policy_gap_detected:
        return True, "No policy covers this situation"

    if context.amount > AGENT_REFUND_LIMIT:
        return True, f"Amount {context.amount} exceeds limit"

    if context.retry_count >= MAX_RETRIES:
        return True, "Exhausted retry attempts"

    # INVALID triggers — DO NOT use these
    # if context.sentiment == "negative":   # WRONG!
    #     return True                       # Sentiment != complexity

    # if context.model_confidence < 0.7:    # WRONG!
    #     return True                       # Self-reported confidence unreliable

    return False, None
```

```python
# Escalate based on sentiment (WRONG)
if customer_sentiment == "angry":
    escalate_to_human()
# An angry customer asking to change their
# address does NOT need a human agent

# Escalate based on confidence (WRONG)
if model_confidence < 0.7:
    escalate_to_human()
# Model self-reported confidence is unreliable
```

```python
# Escalate based on objective criteria (RIGHT)
if customer.requested_human:
    escalate("Customer requested human")
if not policy_covers(situation):
    escalate("Policy gap detected")
if refund_amount > AGENT_LIMIT:
    escalate(f"Amount exceeds {AGENT_LIMIT} limit")
```

Context Degradation & Extended Sessions
Handle context degradation in long-running sessions. Use scratchpad files, /compact, and subagent delegation to maintain quality.
Managing extended sessions:
Context degradation: quality decreases in extended sessions as context fills up
Scratchpad files: external files to persist important state across context resets
/compact: compress conversation history to reclaim context space
Subagent delegation: delegate verbose exploration to subagents to keep coordinator context clean
Crash recovery manifests: persistent state files that enable session recovery
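The crash-recovery idea can be sketched with a plain JSON manifest. The file name and fields below are invented for illustration: the agent records its progress after each step, and on restart it reads the manifest instead of starting over from an empty context:

```python
import json
import os
import tempfile

MANIFEST = os.path.join(tempfile.gettempdir(), "session_manifest.json")

def save_manifest(state):
    """Persist progress so a crashed or reset session can resume."""
    with open(MANIFEST, "w") as f:
        json.dump(state, f)

def load_manifest():
    """Return saved state, or a fresh one if no manifest exists."""
    if os.path.exists(MANIFEST):
        with open(MANIFEST) as f:
            return json.load(f)
    return {"completed_steps": [], "findings": {}}

# Start clean for this demo
if os.path.exists(MANIFEST):
    os.remove(MANIFEST)

# Simulate a session that records progress, then crashes
state = load_manifest()
state["completed_steps"] += ["scan_files", "extract_todos"]
state["findings"]["todo_count"] = 17
save_manifest(state)

# ...the process restarts: in-memory context is gone, the manifest is not...
recovered = load_manifest()
# recovered still lists the completed steps and findings
```

The manifest lives outside the context window entirely, so it survives both `/compact` and a full process crash.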
Anti-Patterns to Avoid
Running extended sessions without monitoring context degradation
Not using scratchpad files for important intermediate state
Long-running agent sessions suffer from context degradation — the quality of responses decreases as the conversation grows longer and the context window fills up.
Symptoms of context degradation:
Instructions from early in the session are forgotten or contradicted
Responses become repetitive or drift away from the original task
Previously established facts (names, IDs, amounts) are misremembered
Mitigation strategies (exam favorites):
Scratchpad files to persist important state outside the context window
/compact to compress conversation history and reclaim context space
Subagent delegation to keep verbose exploration out of the coordinator's context
Stratified metrics (per-document-type tracking):
Aggregate accuracy metrics can mask per-category failures. If invoices run at 70% accuracy while receipts run near 98%, the aggregate across all documents can still show 95%. Track accuracy per document type to reveal hidden failures.
Information provenance:
Always preserve the source and confidence level of information. Track where each piece of data came from and how reliable the source is. This enables the coordinator to make informed decisions about conflicting information.
```python
# Strategy 1: Scratchpad files for persistent state
agent.run("""
Before starting complex analysis:
1. Create a scratchpad file: progress.md
2. Record key findings as you discover them
3. Update progress.md after each major step
4. If context gets long, use /compact
5. After /compact, re-read progress.md to restore context
""")

# Strategy 2: Subagent delegation for verbose tasks
coordinator = Agent(tools=[Task, read_scratchpad, summarize])
coordinator.run("""
For this codebase analysis:
1. Delegate file-by-file analysis to a subagent
   (keeps verbose exploration out of coordinator context)
2. Subagent writes findings to scratchpad files
3. Coordinator reads summarized findings
4. Coordinator synthesizes final report
""")

# Strategy 3: Stratified metrics
def track_accuracy(results):
    """Track per-document-type, not just aggregate."""
    by_type = {}
    for r in results:
        doc_type = r["document_type"]
        if doc_type not in by_type:
            by_type[doc_type] = {"correct": 0, "total": 0}
        by_type[doc_type]["total"] += 1
        if r["is_correct"]:
            by_type[doc_type]["correct"] += 1

    # This reveals hidden failures per category
    for doc_type, stats in by_type.items():
        accuracy = stats["correct"] / stats["total"] * 100
        print(f"{doc_type}: {accuracy:.1f}%")
```

```
# Aggregate metrics only (masks failures)
total_correct = 950
total_processed = 1000
accuracy = 95.0%   # "Looks great!"
# But actually:
#   Invoices:  70/100 = 70%   (FAILING!)
#   Receipts: 880/900 = 97.8%
# The aggregate HIDES the invoice problem
```

```
# Per-document-type metrics (reveals failures)
Invoice accuracy: 70.0%    # ALERT: Below threshold!
Receipt accuracy: 97.8%    # OK
Contract accuracy: 100.0%  # OK
# Now we can see and fix the invoice problem
```
Human Review & Information Provenance
Design human-in-the-loop review systems and maintain information provenance through claim-source mappings and temporal data.
Human review and provenance:
Stratified sampling: review samples across different categories, not just random selection
Field-level confidence: provide confidence indicators for individual data fields
Accuracy by document type: track performance per document category, not just aggregate
Claim-source mappings: link each output claim to its source for traceability
Temporal data: preserve timestamps and version information for currency
Conflict annotation: explicitly mark conflicting sources rather than silently choosing one
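A claim-source mapping can be as simple as attaching a source, a timestamp, and any conflicting alternatives to each claim in the final report. The structure below is one possible shape, invented for illustration, not a standard format:

```python
from datetime import datetime, timezone

def make_claim(text, source, conflicts_with=None):
    """Attach provenance to a single output claim."""
    return {
        "claim": text,
        "source": source,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        # Conflicts are annotated, never silently resolved away
        "conflicting_sources": conflicts_with or [],
    }

report = [
    make_claim("Q3 revenue was $1.5M", source="financial-db",
               conflicts_with=["quarterly-pdf: $1.48M"]),
    make_claim("Customer tenure is 7 years", source="customer-db"),
]

# Reviewers can filter to exactly the claims that rest on disputed sources
flagged = [c for c in report if c["conflicting_sources"]]
```

Because the conflict stays attached to the claim, a human reviewer sees the disagreement instead of inheriting a silent, arbitrary choice.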
Anti-Patterns to Avoid
Aggregate accuracy metrics that mask per-document-type failures
Not maintaining claim-source mappings for traceability
Silently resolving source conflicts instead of annotating them
Information provenance means tracking where each piece of data came from and how reliable the source is. This is critical for multi-agent systems where different subagents contribute information from different sources.
Why provenance matters:
Provenance metadata to track:
Source: which system or document the value came from (e.g. customer-db, invoice-pdf)
Confidence level: verified, extracted, inferred, or estimated
Retrieval time: when the data was fetched, for currency checks
Agent ID: which subagent contributed the value
Human-in-the-loop checkpoints:
For critical decisions, the system should pause and present the human with:
The proposed action and the data it is based on
Field-level confidence indicators for each value
The source behind each claim, with any conflicts explicitly annotated
This is especially important for:
Low-confidence or inferred values
Claims backed by conflicting sources
High-stakes fields such as amounts and account identifiers
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal

@dataclass
class DataWithProvenance:
    value: str | float | dict
    source: str  # "customer-db", "invoice-pdf", "web"
    confidence: Literal["verified", "extracted", "inferred", "estimated"]
    retrieved_at: datetime
    agent_id: str  # Which subagent provided this

def resolve_conflict(data_points: list[DataWithProvenance]):
    """When subagents disagree, use provenance to decide."""

    confidence_rank = {
        "verified": 4,   # From authoritative source
        "extracted": 3,  # Parsed from structured document
        "inferred": 2,   # Derived from context
        "estimated": 1,  # Best guess
    }

    # Pick the most reliable source
    best = max(data_points, key=lambda d: confidence_rank[d.confidence])

    # Log the conflict for audit trail
    log_conflict(
        chosen=best,
        alternatives=data_points,
        reason=f"Selected {best.source} (confidence: {best.confidence})"
    )

    return best
```

```python
# No provenance tracking
revenue = subagent_1.get_revenue()    # From where?
revenue_2 = subagent_2.get_revenue()  # Conflicts!
# Which one do we trust? We don't know!
final_revenue = revenue  # Arbitrary choice
```

```python
# With provenance tracking
rev_1 = DataWithProvenance(
    value=1_500_000, source="financial-db",
    confidence="verified", retrieved_at=datetime.now(),
    agent_id="finance-agent")
rev_2 = DataWithProvenance(
    value=1_480_000, source="quarterly-pdf",
    confidence="extracted", retrieved_at=datetime.now(),
    agent_id="doc-agent")
# Trust the verified database over extracted PDF
final = resolve_conflict([rev_1, rev_2])
```

Exam Tips for Domain 5
Progressive summarization loses critical details — use 'case facts' blocks instead
Sentiment ≠ complexity for escalation decisions
Always distinguish access failures from genuinely empty results
Track accuracy per document type, not just aggregate
Related Exam Scenarios
Customer Support Resolution Agent
Design an AI-powered customer support agent that handles inquiries, resolves issues, and escalates complex cases. Tests Agent SDK usage, MCP tools, and escalation logic.
Multi-Agent Research System
Build a coordinator-subagent system for parallel research tasks. Tests multi-agent orchestration, context passing, error propagation, and result synthesis.
Structured Data Extraction
Build a structured data extraction pipeline from unstructured documents. Tests JSON schemas, tool_use, validation-retry loops, and few-shot prompting.
Test Your Knowledge of Context & Reliability
Practice with scenario-based questions covering this domain.