AI agents introduce a challenge that traditional software doesn’t have: non-determinism. The same prompt can produce different outputs across runs, making reliable testing difficult. Add API costs and latency to the mix, and developer productivity takes a hit.
Session recording in cagent addresses this directly. Record an AI interaction once, replay it indefinitely—with identical results, zero API costs, and millisecond execution times.
How session recording works
cagent implements the VCR pattern, a proven approach for HTTP mocking. During recording, cagent proxies requests to the AI provider, captures the full request/response cycle, and saves it to a YAML “cassette” file. During replay, incoming requests are matched against the recording and served from cache—no network calls required.
Getting started
Recording a session requires a single flag:
cagent run my-agent.yaml --record "What is Docker?"
# creates: cagent-recording-1736089234.yaml
cagent run my-agent.yaml --record my-test "Explain containers"
# creates: my-test.yaml
Replaying uses the --fake flag with the cassette path:
cagent exec my-agent.yaml --fake my-test.yaml "Explain containers"
The replay completes in milliseconds with no API calls.
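To confirm, time the replay; exact numbers depend on your machine, but there is no network round-trip to wait on:
time cagent exec my-agent.yaml --fake my-test.yaml "Explain containers"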
One implementation detail worth noting: tool call IDs are normalized before matching. OpenAI generates random IDs on each request, which would otherwise break replay. cagent handles this automatically.
Example: CI/CD integration testing
Consider a code review agent:
# code-reviewer.yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-0
    description: Code review assistant
    instruction: |
      You are an expert code reviewer. Analyze code for best practices,
      security issues, performance concerns, and readability.
    toolsets:
      - type: filesystem
Record the interaction with --yolo to auto-approve tool calls:
cagent exec code-reviewer.yaml --record code-review --yolo \
  "Review pkg/auth/handler.go for security issues"
In CI, replay without API keys or network access:
cagent exec code-reviewer.yaml --fake code-review.yaml \
  "Review pkg/auth/handler.go for security issues"
Cassettes can be version-controlled alongside test code. When agent instructions change significantly, delete the cassette and re-record to capture the new behavior.
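When that happens, re-recording is just a matter of deleting the old cassette and running the record command again:
rm code-review.yaml
cagent exec code-reviewer.yaml --record code-review --yolo \
  "Review pkg/auth/handler.go for security issues"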
Other use cases
Cost-effective prompt iteration. Record a single interaction with an expensive model, then iterate on agent configuration against that recording. The first run incurs API costs; subsequent iterations are free.
cagent exec ./agent.yaml --record expensive-test "Complex task"
for i in {1..100}; do
  cagent exec ./agent-v$i.yaml --fake expensive-test.yaml "Complex task"
done
Issue reproduction. Users can record a session with --record bug-report and share the cassette file. Support teams replay the exact interaction locally for debugging.
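For example, assuming the user's agent file is my-agent.yaml and the prompt reproduces the issue:
# User records the failing session and attaches bug-report.yaml to the issue:
cagent run my-agent.yaml --record bug-report "Prompt that triggers the bug"
# Support replays the exact interaction locally:
cagent exec my-agent.yaml --fake bug-report.yaml "Prompt that triggers the bug"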
Multi-agent systems. Recording captures the complete delegation graph: root agent decisions, sub-agent tool calls, and inter-agent communication.
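Recording a multi-agent run uses the same flags; team.yaml here is a hypothetical configuration whose root agent delegates to sub-agents. Each model request in the delegation chain is stored as one interaction in the cassette:
cagent exec team.yaml --record team-session --yolo "Plan and implement the feature"
# Count the recorded interactions, one per model request:
grep -c -- "- id:" team-session.yaml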
Security and provider support
cagent strips sensitive headers (Authorization, X-Api-Key) before saving a cassette, making recordings safe to commit to version control. The format is human-readable YAML:
version: 2
interactions:
  - id: 0
    request:
      method: POST
      url: https://api.openai.com/v1/chat/completions
      body: "{...}"
    response:
      status: 200 OK
      body: "data: {...}"
Session recording works with all supported providers: OpenAI, Anthropic, Google, Mistral, xAI, and Nebius.
Get started
Session recording is available now in cagent. To try it:
cagent run ./your-agent.yaml --record my-session "Your prompt here"
For questions, feedback, or feature requests, visit the cagent repository or join the GitHub Discussions.