MCP Server Evals with MCPJam

As MCP (Model Context Protocol) servers become the backbone for agent tooling, prompts, and resources, testing them is no longer a simple unit-testing exercise.

Your MCP server is used:

  • by different clients (Claude Desktop, Cursor, IDE agents, internal tools)
  • with different LLMs (Anthropic, OpenAI, Copilot, local models)
  • across different environments (local, CI, staging, production)

To ship reliable MCP servers, you need confidence that tools, prompts, and resources behave consistently across all of these dimensions.

This guide shows how to combine MCPJam and Inspectr to achieve exactly that.


Even if your MCP server works in development, real-world usage introduces variability:

  • Tools may be triggered differently depending on the model
  • Prompts may be interpreted differently by different LLMs
  • Token usage can change drastically between flows
  • Some failures only surface in specific environments or clients

Traditional tests don’t answer questions like:

  • Which tools were actually used by the LLMs?
  • How many tokens were consumed, and where?
  • Why did an eval fail only for one model?
  • Did a recent change silently alter the LLM behavior?

To answer these questions, you need end-to-end testing and MCP visibility.


MCPJam is a developer-focused toolchain for evaluating MCP (Model Context Protocol) servers. It lets you define eval scenarios, run them against MCP servers, and measure correctness and behavior across tools, prompts, and resources.

With MCPJam, you define eval scenarios that simulate real user interactions:

  • tool calls
  • prompt exchanges
  • resource access
  • multi-step flows

These evals validate that your MCP server behaves correctly across:

  • different models
  • different environments
  • different user flows

The MCPJam CLI performs full E2E testing:

  • it simulates a real MCP client
  • it communicates with your MCP server over HTTP
  • it executes popular and critical user journeys

MCPJam answers the question: “Do LLMs use my MCP server as expected?”
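As a sketch, an eval scenario pairs a natural-language query with the behavior you expect from the model. The schema below is illustrative only: the field names (`title`, `query`, `expectedToolCalls`, `model`) are assumptions for this example, not necessarily the MCPJam test format; see the MCPJam docs for the real shape.

```json
[
  {
    "title": "basic weather lookup",
    "query": "What's the weather in Amsterdam right now?",
    "expectedToolCalls": ["get_current_weather"],
    "model": "claude-sonnet-4"
  }
]
```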


One thing MCPJam intentionally does not do is MCP traffic inspection. It focuses on evaluation outcomes, not runtime observability.

Eval results alone don’t explain why something worked—or didn’t.

After a test run, you often still need to know:

  • which tools were actually invoked
  • how prompts were processed
  • where tokens were spent
  • how behavior differed between runs

This is where runtime inspection becomes essential.

That’s where Inspectr fits in.

Inspectr is a local-first inspection and proxy tool for APIs, webhooks, and MCP traffic. It captures requests and responses in real time, enriches them with protocol-aware insights, and allows exporting full sessions as JSON for later analysis.

Used together, MCPJam + Inspectr give you both:

  • eval results
  • deep visibility into how your MCP server behaved during those evals

MCPJam and Inspectr solve complementary parts of the same problem.

  • MCPJam performs deterministic MCP evals and E2E testing
  • Inspectr observes and analyzes MCP traffic in real time

Inspectr runs transparently as a proxy between MCPJam and your MCP server:

MCPJam CLI -> Inspectr (http://localhost:8080) -> MCP server (http://localhost:3000)

Inspectr provides:

  • full capture of MCP HTTP requests and responses
  • protocol-aware insight into MCP traffic patterns
  • classification of MCP calls (tools, prompts, resources)
  • token usage estimates per operation
  • exportable JSON artifacts for every eval run

Before you start, make sure you have:

  • an MCP server running locally (the examples assume http://localhost:3000)
  • the Inspectr CLI installed
  • Node.js and npx available, to run the MCPJam CLI

Step 1 — Launch Inspectr with your MCP server

Run Inspectr as a local proxy with export enabled:

```sh
inspectr \
  --backend http://localhost:3000 \
  --export
```

Why this matters:

  • Inspectr captures all MCP traffic
  • --export automatically writes a JSON archive file when Inspectr stops
  • Ideal for CI/CD and reproducible eval runs

Step 2 — Install the MCPJam CLI

Install or use the MCPJam CLI:

```sh
# One-off usage
npx @mcpjam/cli --help
```

Want a working example to copy from? The starter repo includes a small, runnable eval setup:

```sh
git clone https://github.com/MCPJam/evals-cli-starter.git
cd evals-cli-starter
```

Use the starter repo as a reference for eval definitions and environment config, then copy those patterns into your own project.

For deeper context, see the MCPJam evals overview and the evals CLI starter repo.

Step 3 — Configure MCPJam to use Inspectr

Update your MCPJam environment configuration to target Inspectr instead of your MCP server directly:

```json
{
  "mcpServers": {
    "my-server": {
      "url": "http://localhost:8080/mcp",
      "headers": {
        "Authorization": "Bearer ${API_TOKEN}"
      }
    }
  }
}
```

No changes are required on your MCP server.
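For contrast, a direct (non-proxied) environment config would point at the MCP server itself. Switching to Inspectr changes only the `url`, assuming Inspectr listens on port 8080 and your server on port 3000 as in the examples above:

```json
{
  "mcpServers": {
    "my-server": {
      "url": "http://localhost:3000/mcp"
    }
  }
}
```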


Step 4 — Run your evals

Run your evals as usual:

```sh
npx @mcpjam/cli evals run -t weather-tests.json -e local-dev.json
```

While the eval runs:

  • Inspectr shows MCP traffic live
  • tools, prompts, and resources are classified
  • token usage is estimated per operation

When the eval completes, stop Inspectr.

Because export mode was enabled:

  • Inspectr writes a JSON export automatically
  • the export can be archived as a CI artifact
  • the run can be re-imported later for inspection or comparison

Each eval becomes a durable, inspectable record.
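Once you have a JSON export, you can post-process it in any language. The sketch below assumes a hypothetical export shape, a list of captured operations with `type` and `tokenEstimate` fields (the real Inspectr schema may differ), and summarizes token usage per MCP operation type:

```python
import json
from collections import defaultdict

def summarize_token_usage(export_json: str) -> dict:
    """Sum token estimates per MCP operation type.

    Assumes each captured operation is a dict with hypothetical
    'type' (tool/prompt/resource) and 'tokenEstimate' fields.
    """
    operations = json.loads(export_json)
    totals: dict[str, int] = defaultdict(int)
    for op in operations:
        totals[op["type"]] += op.get("tokenEstimate", 0)
    return dict(totals)

# Inline sample data standing in for a real export file:
sample = json.dumps([
    {"type": "tool", "name": "get_current_weather", "tokenEstimate": 420},
    {"type": "tool", "name": "get_forecast", "tokenEstimate": 310},
    {"type": "prompt", "name": "weather_summary", "tokenEstimate": 150},
])
print(summarize_token_usage(sample))  # {'tool': 730, 'prompt': 150}
```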


Optional — Add a run-level tracing header

To correlate all requests belonging to a single eval run, add a tracing header:

```json
"headers": {
  "Authorization": "Bearer ${API_TOKEN}",
  "x-correlation-id": "${EVAL_RUN_ID}"
}
```

Run your eval with a unique identifier:

```sh
EVAL_RUN_ID="mcpjam-$(date +%Y%m%d-%H%M%S)" \
  npx @mcpjam/cli evals run -t weather-tests.json -e local-dev.json
```

This makes it easy to filter and compare runs later.
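With the correlation header in place, every captured request in a run shares the same id, so an export can be filtered per run. Another hypothetical sketch (the header layout in a real Inspectr export may differ):

```python
import json

def requests_for_run(export_json: str, run_id: str) -> list:
    """Return only the captured requests whose x-correlation-id matches run_id.

    Assumes each entry carries a hypothetical 'headers' dict, mirroring the
    header set in the MCPJam environment config above.
    """
    entries = json.loads(export_json)
    return [
        e for e in entries
        if e.get("headers", {}).get("x-correlation-id") == run_id
    ]

# Inline sample standing in for two interleaved eval runs:
sample = json.dumps([
    {"path": "/mcp", "headers": {"x-correlation-id": "mcpjam-20250101-120000"}},
    {"path": "/mcp", "headers": {"x-correlation-id": "mcpjam-20250102-090000"}},
    {"path": "/mcp", "headers": {"x-correlation-id": "mcpjam-20250101-120000"}},
])
print(len(requests_for_run(sample, "mcpjam-20250101-120000")))  # 2
```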


Optional — Run MCP evals with Inspectr Command Runner

When running MCP evals locally or in CI, you may want Inspectr to start, run, and stop automatically as part of a single command.

Inspectr’s Command Runner launches Inspectr, starts your eval command as a managed process, and shuts everything down when the command completes — while still capturing MCP traffic, tracing, and exports.

```sh
inspectr \
  --backend http://localhost:3000 \
  --export \
  --command npx \
  --command-arg @mcpjam/cli \
  --command-arg evals \
  --command-arg run \
  --command-arg -t \
  --command-arg weather-tests.json \
  --command-arg -e \
  --command-arg local-dev.json
```

In this setup:

  • Inspectr starts automatically
  • MCPJam is launched by Inspectr using --command and --command-arg
  • all MCP traffic is captured and traced
  • a JSON export is written when the command completes

---
## Summary
Testing MCP servers across models, clients, and environments requires more than pass/fail checks.
By combining:

- **MCPJam** for E2E MCP evals
- **Inspectr** for runtime visibility and exports

you get:

- confidence that your MCP server behaves consistently across different models
- visibility into MCP runtime behavior
- token and call-level insights per eval run
- portable, re-importable eval artifacts
- zero changes to your MCP server

This setup scales cleanly from local development to CI-based MCP regression testing.
Together, MCPJam and Inspectr form a complete MCP testing and observability workflow.