CCA-F Cheat Sheet — All 5 Exam Domains

D1: AGENTIC

27%

D2: TOOLS

18%

D3: CODE

20%

D4: PROMPT

20%

D5: RELIAB.

15%

Agentic Loop

1.1

· Continue when stop_reason = "tool_use"

· Terminate when stop_reason = "end_turn"

· Append tool results to conversation history each iteration

DON'T

· Parse natural language signals to detect completion

· Use arbitrary iteration caps as primary stop mechanism

· Check for assistant text content as completion indicator

⚠

MULTI-AGENT RISK

Subagents have isolated context — they do not inherit coordinator history. Pass complete findings explicitly in each subagent's prompt. Spawn parallel subagents by emitting multiple Task calls in a single coordinator response.

Multi-Agent Orchestration

1.2 1.3

HUB-AND-SPOKE PATTERN

Coordinator manages ALL inter-subagent communication. Subagents have isolated context — no coordinator history inheritance.

DYNAMIC DECOMPOSITION

Generate subtasks based on discoveries. Map structure first, then prioritize adaptively. Best for open-ended investigation.

Task Decomposition

1.6

PROMPT CHAINING

Fixed sequential steps. Best for predictable multi-aspect reviews. Example: analyze each file, then run a cross-file integration pass.

FORKING & RESUMPTION

--resume <name> continues a named session. fork_session branches from a shared baseline. If tool results are stale, start fresh with injected summary.

Enforcement vs Prompt-Based

1.4 1.5

Programmatic (hooks)

Deterministic. Use when compliance is non-negotiable: identity verification, financial thresholds, policy gates. Zero failure rate.

Prompt-based

Probabilistic. Non-zero failure rate. OK for guidance and suggestions. Never rely on this for critical business logic.

PRO-TIP: SDK HOOKS (PostToolUse)

Intercept tool results to normalize formats before model sees them. Intercept outgoing tool calls to enforce rules (block refunds > $500, redirect to escalation).

Tool Descriptions

2.1

FIRST LEVER FOR TOOL SELECTION PROBLEMS

Tool descriptions are the fix before few-shot, before routing layers. Renaming + rewriting solves most misrouting.

GOOD Tool Description:

Specific purpose (not generic)
Expected input formats and example queries
Edge cases and what it does NOT handle
When to use it vs similar alternatives

BAD Tool Design:

Overlapping descriptions cause misrouting
System prompt wording creates unintended associations
One vague tool instead of 2–3 purpose-specific ones

Structured Error Responses

2.2

Error response must include:

errorCategory : transient / validation / permission / business
isRetryable : boolean
Human-readable description
For business violations: retriable: false + customer-friendly message

Access failure (timeout)

Needs retry logic. Propagate with structured context so coordinator can decide.

Valid empty result

Successful query, no matches. Do NOT treat as an error. Distinguish clearly.

Tool Distribution

2.3

⚠

TOO MANY TOOLS = WORSE PERFORMANCE

18 tools vs 4–5 tools — decision complexity degrades reliability. Give each agent only the tools its role needs.

tool_choice options:

"auto" — model may return text or call a tool
"any" — model must call some tool (prevents conversational text)
{"type":"tool","name":"..."} — forces a specific named tool first

MCP Server Configuration

2.4

PROJECT-LEVEL (.mcp.json)

Shared via version control. Use for shared team tooling. Use ${ENV_VAR} expansion for auth tokens — never commit secrets.

USER-LEVEL (~/.claude.json)

Personal/experimental only. Not shared with teammates. Both levels can be active simultaneously.

Built-in Tools (2.5):

Grep : search file contents (function names, imports, error strings)
Glob : find files by name/extension pattern ( **/*.test.tsx )
Edit : targeted modification using unique text match. If fails → Read then Write

CLAUDE.md Hierarchy

3.1

USER-LEVEL

~/.claude/CLAUDE.md

Personal instructions only. Not version-controlled or shared with teammates.

PROJECT-LEVEL

root/CLAUDE.md

Shared via git. Team standards applied to every session.

SCOPED RULES

.claude/rules/

Topic-specific with glob pattern triggers.

⚠

COMMON MISTAKE

If a new team member doesn't receive instructions, check whether they live in ~/.claude/CLAUDE.md instead of project-level. User-level is invisible to teammates.

Custom Commands & Skills

3.2

COMMANDS (.claude/commands/)

Project-scoped, version-controlled, team-wide. Personal commands live in ~/.claude/commands/ and are not shared.

SKILLS (.claude/skills/)

context: fork — isolated sub-agent context. allowed-tools restricts tools. argument-hint prompts for required params.

Skills

On-demand invocation for task-specific workflows (codebase analysis, brainstorming).

CLAUDE.md

Always-loaded universal standards. Rules that must apply to every single session.

Plan Mode vs Direct Execution

3.4

USE PLAN MODE WHEN

Large-scale changes across many files. Multiple valid approaches exist. Architectural decisions required. Monolith-to-microservices migrations (45+ files).

DIRECT EXECUTION WHEN

Simple, well-scoped single-file changes. Clear stack trace pointing to one fix. Adding a single validation or conditional. Scope and approach already clear.

CI/CD Integration

3.6

Key CLI flags:

-p / --print — non-interactive mode (prevents pipeline hangs)
--output-format json + --json-schema — machine-parseable output for inline PR comments

SESSION ISOLATION FOR CODE REVIEW

The same session that generated code is less effective at reviewing it. Use an independent review instance without the generator's reasoning context.

Prompt Precision

4.1

Specific criteria (works)

"Flag comments only when claimed behavior contradicts actual code behavior"

Vague (doesn't work)

"Be conservative" / "Only report high-confidence findings" — no meaningful improvement in precision.

⚠

FALSE POSITIVE EFFECT

High false-positive categories undermine trust in accurate ones. Temporarily disable noisy categories while improving prompts separately. Define explicit severity criteria with concrete code examples.

Few-Shot Prompting

4.2

When few-shot is most effective:

Detailed instructions alone produce inconsistent output
Ambiguous scenarios where judgment must be demonstrated
Reducing hallucination in extraction tasks with varied formats
Showing model what to accept vs flag (false positive reduction)

Structure of good few-shot examples:

2–4 targeted examples covering ambiguous cases
Show reasoning: why this choice over plausible alternatives
Demonstrate desired output format (location, issue, severity, fix)
Include varied document structures to enable generalization

Structured Output via Tool Use

4.3

KEY RULE

tool_use + JSON schema = most reliable structured output. Eliminates syntax errors, but NOT semantic errors (line items not summing, values in wrong fields).

Make fields optional/nullable when source may not contain that data — prevents fabrication
Use "other" + detail field pattern for extensible enum categories
Add "unclear" enum value for genuinely ambiguous cases

Validation, Retry & Batch

4.4 4.5

RETRY WORKS FOR

Format mismatches. Structural output errors. Schema compliance failures. Include: original doc + failed extraction + specific validation error.

RETRY WON'T FIX

Information absent from source document. Data only in an external doc not provided. Detect and route to human review — don't loop indefinitely.

Batch API for

Overnight reports, weekly audits, nightly test generation. 50% cost savings, up to 24h window.

NOT batch for

Blocking pre-merge checks. Any workflow requiring fast response. Multi-turn tool calling (not supported).

Context Window Management

5.1

⚠

LOST IN THE MIDDLE

Models reliably process info at the beginning and end of long inputs. Middle sections get attention gaps. Put key findings summaries first. Use explicit section headers for detailed middle content.

STRUCTURED FACT EXTRACTION

Extract transactional facts (amounts, dates, order numbers) into a persistent "case facts" block included in each prompt — outside summarized history.

TRIM VERBOSE TOOL OUTPUTS

Tool results accumulate and consume tokens disproportionately. Keep only relevant fields (e.g., 5 return-relevant fields from an order lookup with 40+ fields).

Escalation Decision-Making

5.2

Escalate when:

Customer explicitly requests a human — honor immediately , no investigation first
Policy is ambiguous or silent on the specific request
Unable to make meaningful progress on the case

Unreliable triggers — don't use:

Sentiment analysis / frustration detection — doesn't correlate with case complexity
Self-reported LLM confidence scores — poorly calibrated on hard cases
"Case complexity" heuristics without explicit criteria

Error Propagation

5.3

Structured error context to coordinator must include:

Failure type and what was attempted (query, parameters)
Any partial results obtained
Potential alternative approaches

Anti-pattern: Silent suppression

Returning empty results as "success" prevents recovery and risks incomplete outputs.

Anti-pattern: Full termination

Terminating entire workflow on one subagent failure. Handle locally; propagate with context when not.

Large Codebase & Human Review

5.4 5.5 5.6

SCRATCHPAD FILES

Persist key findings across context boundaries. Agents reference scratchpad for subsequent questions to counteract context degradation.

CRASH RECOVERY

Each agent exports state to a known location. Coordinator loads a manifest on resume and injects state into agent prompts.

PROVENANCE & ATTRIBUTION

Subagents must output structured claim-source mappings (URL, doc name, excerpt). Synthesis agents must preserve attribution. Conflicting statistics: annotate with source, don't arbitrarily pick one.

✦ Quick Rules

GOLDEN RULES

✓ Programmatic enforcement beats prompt instructions for anything with financial or safety consequences.

✓ Tool descriptions are the first lever for tool selection problems — before few-shot, before routing layers.

✓ Few-shot beats instruction refinement when output format or judgment is inconsistent.

✓ Explicit criteria beats vague modifiers ("be conservative") for precision/false positive problems.

✓ Batch API only for latency-tolerant workloads. Never for blocking workflows.

✓ Independent review instance beats self-review instructions or extended thinking for catching subtle bugs.

✓ Plan mode for architectural/multi-file changes. Direct execution for scoped, clear changes.

✓ Honor explicit customer escalation immediately — no investigation first, no "let me try to help."

✓ Structured error context (failure type + partial results + alternatives) over generic status strings.

✓ Project-level = version-controlled = shared. User-level = personal, invisible to teammates.

✓ Subagents need explicit context. They don't inherit coordinator history automatically.

✓ Nullable fields prevent hallucination when source documents may not contain required information.

COMMON DISTRACTORS TO REJECT

Sentiment analysis or LLM confidence scores for escalation routing — poorly calibrated.

Routing classifiers / keyword parsing layers in front of tools — over-engineered when descriptions are the actual fix.

Arbitrary iteration caps as the primary loop-stopping mechanism.

Larger context windows to fix attention quality in large reviews — split into focused passes instead.

Generic error status ("search unavailable") — hides recovery context from the coordinator.

Consensus-required flagging across multiple passes — suppresses real bugs that only appear intermittently.

Consolidating tools to fix misrouting — separate them with better descriptions first.

User-level CLAUDE.md for team-wide instructions — teammates don't get them.

Batch API for pre-merge checks — no latency SLA, blocks developers from merging.

Catching all timeouts and returning empty results as success — prevents coordinator from recovering.

EXAM LOGISTICS

Format MCQ · 4 options · 1 correct

Scenarios 4 drawn randomly from 6

Score Scaled 100–1000

Pass 720 / 1000

Penalty None — guess everything

Passing Score

720 / 1000