Verification Challenge

The adaptive multi-turn MCP orchestration challenge is the core mechanism for Proof of Autonomy. It replaces trivial deterministic puzzles with a stateful challenge that requires real MCP tool orchestration.

Threat Model

Threat	Mitigation
Script replay	Random session ID + nonce, one-time use
Turn fabrication	Hash chain integrity (each turn hash depends on previous)
Timestamp forgery	Duration enforcement + timestamps in hash chain
Tool call spoofing	Required tools must appear in trace
Reasoning filler	Minimum 30 chars per turn, 20 chars minimum hash input
Hash chain replay	Merkle root + session-bound attestation
Deterministic farming	2 rotating GAIA-inspired task prompts, adaptive follow-up (server generated)
Brute force	Exponential backoff after 3 failures
Static script loops	Adaptive follow-up prompt per session (4th turn required)
Shallow reasoning	xrayBridge.enforce reasoning evaluation (keyword fallback ≥ 25% of task prompt terms)
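
The hash chain mitigation can be illustrated with a minimal sketch. It assumes SHA-256 over a JSON-serialized turn; the `Turn` fields, the `GENESIS` placeholder, and the helper names are hypothetical and not taken from the project's schema. The sketch shows why fabricating a turn or forging a timestamp breaks the chain: each hash covers the timestamp and the previous turn's hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical turn shape; field names are illustrative only.
interface Turn {
  index: number;
  timestamp: number; // ms since epoch, part of the hash input
  tool: string;
  reasoning: string;
  prevHash: string;
  hash: string;
}

// Assumed placeholder used as prevHash for the first turn.
const GENESIS = "0".repeat(64);

// Hash every field except `hash` itself; JSON avoids delimiter ambiguity.
function hashTurn(t: Omit<Turn, "hash">): string {
  const input = JSON.stringify([t.index, t.timestamp, t.tool, t.reasoning, t.prevHash]);
  return createHash("sha256").update(input).digest("hex");
}

// Append a turn whose hash depends on the previous turn's hash.
function appendTurn(chain: Turn[], tool: string, reasoning: string, timestamp: number): Turn[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const base = { index: chain.length, timestamp, tool, reasoning, prevHash };
  return [...chain, { ...base, hash: hashTurn(base) }];
}

// Recompute every hash and check each link to its predecessor.
function verifyChain(chain: Turn[]): boolean {
  return chain.every((t, i) => {
    const expectedPrev = i === 0 ? GENESIS : chain[i - 1].hash;
    const { hash, ...rest } = t;
    return t.prevHash === expectedPrev && hashTurn(rest) === hash;
  });
}
```

Changing any field of an earlier turn, including its timestamp, makes `verifyChain` return `false` unless every later hash is recomputed as well.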

Dynamo Privileged Path

Agents with prior governance interactions can receive a relaxed challenge:

Criteria	Benefit
Prior Dynamo resonance ≥ 0.8 via xrayBridge.govern	`effectiveMinTurns` reduced to 2 (was 3)
	`effectiveCoverageThreshold` reduced to 12.5% (was 25%)

The check runs after crypto PoP and before challenge validation. If the xray MCP server is unavailable, no privileged path is granted (graceful degradation). This rewards known-good agents without creating a full bypass: the challenge trace, the adaptive follow-up, and the cryptographic binding still apply.
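
The threshold selection described above might be sketched as follows. The `Thresholds` type and `resolveThresholds` function are hypothetical; only the two `effective*` names and the numeric values come from the table. A `null` resonance stands in for the case where the xrayBridge.govern lookup could not be completed.

```typescript
interface Thresholds {
  effectiveMinTurns: number;
  effectiveCoverageThreshold: number; // fraction of task prompt terms
}

const DEFAULT_THRESHOLDS: Thresholds = { effectiveMinTurns: 3, effectiveCoverageThreshold: 0.25 };
const PRIVILEGED_THRESHOLDS: Thresholds = { effectiveMinTurns: 2, effectiveCoverageThreshold: 0.125 };
const RESONANCE_FLOOR = 0.8;

// `resonance` is null when the governance lookup was unavailable (assumed
// representation); in that case the default thresholds apply.
function resolveThresholds(resonance: number | null): Thresholds {
  if (resonance !== null && resonance >= RESONANCE_FLOOR) {
    return PRIVILEGED_THRESHOLDS;
  }
  return DEFAULT_THRESHOLDS;
}
```

Under this sketch an unavailable server and a low resonance score are treated the same way, so an outage can never widen access.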

Session Lifecycle

createChallengeSession(pubkey)
    ↓
status: 'pending'
    ↓  (first submit_challenge_turn)
status: 'in-progress'
    ↓  (3 turns submitted)
Adaptive follow-up generated → followUpPrompt set
    ↓  (4th turn submitted addressing follow-up)
followUpCompleted = true
    ↓  (register_plugin called with trace)
validateTrace() → score + violations
    ↓
status: 'completed' or 'failed'
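
The lifecycle above can be read as a small state machine. The following sketch assumes the default path (3 base turns, no privileged relaxation); the `Session` shape and the `createSession`, `submitTurn`, and `finalize` helpers are hypothetical simplifications of `createChallengeSession`, `submit_challenge_turn`, and `register_plugin`.

```typescript
type Status = "pending" | "in-progress" | "completed" | "failed";

interface Session {
  status: Status;
  turns: number;
  followUpPrompt: string | null;
  followUpCompleted: boolean;
}

const BASE_TURNS = 3; // assumed default; the privileged path is ignored here

function createSession(): Session {
  return { status: "pending", turns: 0, followUpPrompt: null, followUpCompleted: false };
}

// Record one turn; issue the follow-up after the 3rd, mark it done on the next.
function submitTurn(s: Session, generateFollowUp: () => string): Session {
  if (s.status === "completed" || s.status === "failed") {
    throw new Error("session is closed");
  }
  const next: Session = { ...s, status: "in-progress", turns: s.turns + 1 };
  if (next.turns === BASE_TURNS) {
    next.followUpPrompt = generateFollowUp();
  } else if (next.turns > BASE_TURNS && s.followUpPrompt !== null) {
    next.followUpCompleted = true;
  }
  return next;
}

// Registration is rejected unless the follow-up turn was submitted.
function finalize(s: Session, passed: boolean): Session {
  if (!s.followUpCompleted) {
    throw new Error("follow-up turn missing");
  }
  return { ...s, status: passed ? "completed" : "failed" };
}
```

This sketch does not check that the 4th turn actually addresses the follow-up prompt; it only tracks that a turn arrived after the prompt was issued.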

Validation Scoring (max 105)

Check	Points	Description
Minimum turns	25	≥ 4 turns (3 base + 1 adaptive)
Minimum duration	10	≥ 3–4s between first and last turn
Required tools	15	Proportional to tools covered
Hash chain integrity	20	Every turn hash verified
Merkle root	10	Must match computed root
Attestation	10	Must be derivable from merkle + sessionId
Adaptive follow-up	15	Extra turn addressing server-issued prompt
Total	105	Passing ≥ 70
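
As a worked illustration of the table, the sketch below sums the listed point values and applies the passing threshold of 70. The `CheckResults` shape and `scoreTrace` name are hypothetical, and treating every check except required tools as all-or-nothing is an assumption; the real `validateTrace()` may award partial credit or round differently.

```typescript
interface CheckResults {
  minTurns: boolean;
  minDuration: boolean;
  toolsCovered: number;  // required tools found in the trace
  toolsRequired: number; // total required tools
  hashChain: boolean;
  merkleRoot: boolean;
  attestation: boolean;
  followUp: boolean;
}

const PASSING_SCORE = 70;

function scoreTrace(r: CheckResults): { score: number; passed: boolean } {
  // Required tools score proportionally; no rounding is applied in this sketch.
  const toolPoints = r.toolsRequired > 0 ? 15 * (r.toolsCovered / r.toolsRequired) : 0;
  const score =
    (r.minTurns ? 25 : 0) +
    (r.minDuration ? 10 : 0) +
    toolPoints +
    (r.hashChain ? 20 : 0) +
    (r.merkleRoot ? 10 : 0) +
    (r.attestation ? 10 : 0) +
    (r.followUp ? 15 : 0);
  return { score, passed: score >= PASSING_SCORE };
}
```

For example, a trace that passes everything but covers only one of two required tools would score 97.5 under these assumptions, while a trace with too few turns, a broken hash chain, and no follow-up would score 45 and fail.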

Semantic Reasoning Coverage

The xrayBridge.enforce('reasoning-evaluation', ...) call evaluates the reasoning trace against the task prompt before challenge validation. When xray MCP is unavailable (graceful degradation), computeReasoningCoverage serves as the keyword-based fallback:

Terms > 4 chars, excluding stop words
Prefix-based matching (first 5 chars) handles plurals/tense
Coverage < 25% → violation (12.5% for privileged agents)
Coverage 25–39% → marginal (no bonus, no violation)
Coverage ≥ 40% → full (reflected in other checks)
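
The fallback rules above might be sketched as follows. This is not the project's `computeReasoningCoverage`; the function names, the stop-word list, and the tokenization regex are assumptions made for illustration. Only the length cutoff, the 5-character prefix match, and the band boundaries come from the list.

```typescript
// Illustrative stop-word list; the real list is not given in the source.
const STOP_WORDS = new Set(["about", "after", "before", "their", "there", "these", "those", "which", "would"]);

// Lowercase alphanumeric terms longer than 4 chars, minus stop words, deduplicated.
function extractTerms(text: string): string[] {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return Array.from(new Set(words.filter((w) => w.length > 4 && !STOP_WORDS.has(w))));
}

// Fraction of task prompt terms whose first 5 chars appear as a prefix in the reasoning.
function coverageSketch(taskPrompt: string, reasoning: string): number {
  const promptTerms = extractTerms(taskPrompt);
  if (promptTerms.length === 0) return 1; // nothing to cover (assumed behavior)
  const prefixes = new Set(extractTerms(reasoning).map((w) => w.slice(0, 5)));
  const matched = promptTerms.filter((t) => prefixes.has(t.slice(0, 5)));
  return matched.length / promptTerms.length;
}

type CoverageBand = "violation" | "marginal" | "full";

// threshold is 0.25 by default, 0.125 for privileged agents.
function classifyCoverage(coverage: number, threshold: number = 0.25): CoverageBand {
  if (coverage < threshold) return "violation";
  return coverage < 0.4 ? "marginal" : "full";
}
```

With the prompt "Search available plugins and compare server capabilities" and the reasoning "I searched for a plugin and compared each server", four of six prompt terms match by prefix ("searc", "plugi", "compa", "serve"), giving roughly 67% coverage.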

Anti-Gaming Layers

Session state machine: Server tracks pending → in-progress → completed or failed
Adaptive follow-up: Server generates dynamic prompt after 3rd turn using generateFollowUp()
followUpCompleted gate: registerPlugin rejects without this flag
Hash chain: Each turn includes previous hash; any break invalidates trace
Merkle root + attestation: Cryptographically binds all turns to session
Duration enforcement: Wall-clock time between first and last turn
Required tools: search_plugins and list_mcp_servers must appear
Reasoning depth: Each turn requires ≥ 20 chars of reasoning
xray reasoning evaluation: xrayBridge.enforce('reasoning-evaluation', ...) with reasoning trace + task prompt. Falls back to the keyword-based semantic coverage check when xray MCP is unavailable.
Exponential backoff: 3 failures → cooldown doubles each time
Dynamo privileged path: Prior governance resonance ≥ 0.8 grants relaxed thresholds, checked via xrayBridge.govern (with source: 'system' per 0xRay GovernanceProposal schema). Graceful degradation when MCP unavailable.
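
To illustrate the Merkle root and attestation layer, here is a minimal sketch. It assumes SHA-256, a binary tree that duplicates the last node on odd levels, and an attestation computed as a hash of the root joined with the session ID. None of these conventions are confirmed by the source, and all function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

// Fold turn hashes pairwise into a single root.
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) return sha256("");
  let level = leaves;
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // Duplicate the last node when the level has an odd count (assumed convention).
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      next.push(sha256(level[i] + right));
    }
    level = next;
  }
  return level[0];
}

// Bind the root to one session so a trace cannot be replayed under another.
function attest(root: string, sessionId: string): string {
  return sha256(`${root}:${sessionId}`);
}

// Server-side check: recompute both values from the submitted turn hashes.
function verifyBinding(turnHashes: string[], sessionId: string, claimedRoot: string, claimedAttestation: string): boolean {
  const root = merkleRoot(turnHashes);
  return root === claimedRoot && attest(root, sessionId) === claimedAttestation;
}
```

Because the session ID enters the attestation, a valid trace copied into a different session fails `verifyBinding` even though its Merkle root still matches.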

MCP Graceful Degradation

When xray MCP servers are unavailable:

Gate	Behavior	Impact
Orchestrate	Skipped (warning logged)	Challenge trace is primary gate
Govern	Skipped (warning logged)	Registration proceeds
Enforce	Skipped (warning logged)	Score defaults to 100

The challenge trace validation is the primary behavioral gate. MCP gates are supplementary and degrade gracefully.
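
The degradation behavior in the table could be wrapped in a helper along these lines. `runGate` and `GateResult` are hypothetical names; the sketch is synchronous for brevity, whereas real MCP calls would presumably be asynchronous. The fallback of 100 for the enforce gate is taken from the table.

```typescript
interface GateResult<T> {
  value: T;
  degraded: boolean; // true when the gate was skipped
}

// Run a supplementary gate; on failure, log a warning and use the fallback
// instead of failing the whole registration.
function runGate<T>(
  name: string,
  call: () => T,
  fallback: T,
  warn: (msg: string) => void = console.warn
): GateResult<T> {
  try {
    return { value: call(), degraded: false };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn(`${name} gate skipped: ${reason}`);
    return { value: fallback, degraded: true };
  }
}

// Usage: an unavailable enforce gate falls back to a score of 100.
const enforceResult = runGate<number>(
  "enforce",
  () => { throw new Error("xray MCP unavailable"); },
  100,
  () => {} // silence the warning in this example
);
```

Returning the `degraded` flag alongside the value lets the caller record that a gate was skipped, which keeps the challenge trace validation as the only gate that can actually reject a registration.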