# MCP Server Evals with MCPJam
## Introduction

As MCP (Model Context Protocol) servers become the backbone for agent tooling, prompts, and resources, testing them is no longer a simple unit-testing exercise.
Your MCP server is used:
- by different clients (Claude Desktop, Cursor, IDE agents, internal tools)
- with different LLMs (Anthropic, OpenAI, Copilot, local models)
- across different environments (local, CI, staging, production)
To ship reliable MCP servers, you need confidence that tools, prompts, and resources behave consistently across all of these dimensions.
This guide shows how to combine MCPJam and Inspectr to achieve exactly that.
## The Problem: Testing MCP Servers

Even if your MCP server works in development, real-world usage introduces variability:
- Tools may be triggered differently depending on the model
- Prompts may be interpreted differently by different LLMs
- Token usage can change drastically between flows
- Some failures only surface in specific environments or clients
Traditional tests don’t answer questions like:
- Which tools were actually used by the LLMs?
- How many tokens were consumed, and where?
- Why did an eval fail only for one model?
- Did a recent change silently alter the LLM behavior?
To answer these questions, you need end-to-end testing and MCP visibility.
## MCP Evals with MCPJam

MCPJam is a developer-focused toolchain for evaluating MCP (Model Context Protocol) servers. It lets you define eval scenarios, run them against MCP servers, and measure correctness and behavior across tools, prompts, and resources.
With MCPJam, you define eval scenarios that simulate real user interactions:
- tool calls
- prompt exchanges
- resource access
- multi-step flows
These evals validate that your MCP server behaves correctly across:
- different models
- different environments
- different user flows
## End-to-End (E2E) testing by design

The MCPJam CLI performs full E2E testing:
- it simulates a real MCP client
- it communicates with your MCP server over HTTP
- it executes popular and critical user journeys
MCPJam answers the question: “Do LLMs use my MCP server as expected?”
## MCP Tool Usage Visibility

What MCPJam intentionally does not do is MCP usage inspection: it focuses on evaluation outcomes, not on runtime observability.
Eval results alone don’t explain why something worked—or didn’t.
After a test run, you often still need to know:
- which tools were actually invoked
- how prompts were processed
- where tokens were spent
- how behavior differed between runs
This is where runtime inspection becomes essential.
That’s where Inspectr fits in.
Inspectr is a local-first inspection and proxy tool for APIs, webhooks, and MCP traffic. It captures requests and responses in real time, enriches them with protocol-aware insights, and allows exporting full sessions as JSON for later analysis.
Used together, MCPJam + Inspectr give you both:
- eval results
- deep visibility into how your MCP server behaved during those evals
## The Solution: MCPJam + Inspectr

MCPJam and Inspectr solve complementary parts of the same problem.
- MCPJam performs deterministic MCP evals and E2E testing
- Inspectr observes and analyzes MCP traffic in real time
Inspectr runs transparently as a proxy between MCPJam and your MCP server.
Inspectr provides:
- full capture of MCP HTTP requests and responses
- protocol-aware understanding of MCP traffic patterns
- classification of MCP calls (tools, prompts, resources)
- token usage estimates per operation
- exportable JSON artifacts for every eval run
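Classification keys off MCP's JSON-RPC method namespace. As a rough sketch of the idea (not Inspectr's actual implementation), mapping a method name to a call category looks like this:

```shell
# Map an MCP JSON-RPC method name to a call category, the way a
# protocol-aware proxy can classify traffic.
# Sketch only, not Inspectr's real code.
classify_mcp_method() {
  case "$1" in
    tools/*)     echo "tool" ;;
    prompts/*)   echo "prompt" ;;
    resources/*) echo "resource" ;;
    *)           echo "protocol" ;;  # initialize, ping, notifications, ...
  esac
}

classify_mcp_method "tools/call"      # -> tool
classify_mcp_method "resources/read"  # -> resource
classify_mcp_method "initialize"      # -> protocol
```

The `tools/`, `prompts/`, and `resources/` prefixes come straight from the MCP specification, which is what makes this kind of classification possible without any server-specific configuration.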
## Prerequisites

Before you start, make sure you have:
- An MCP server running
- Node installed
- Inspectr installed (Installation guide →)
## Step 1 — Launch Inspectr with your MCP server

Run Inspectr as a local proxy with export enabled:

```shell
inspectr \
  --backend http://localhost:3000 \
  --export
```

Why this matters:
- Inspectr captures all MCP traffic
- `--export` automatically writes a JSON archive file when Inspectr stops
- Ideal for CI/CD and reproducible eval runs
## Step 2 — Set up the MCPJam CLI

Install or use the MCPJam CLI:

```shell
# One-off usage
npx @mcpjam/cli --help

# Install globally to get the `mcpjam` command
npm install -g @mcpjam/cli
```

Want a working example to copy from? The starter repo includes a small, runnable eval setup:

```shell
git clone https://github.com/MCPJam/evals-cli-starter.git
cd evals-cli-starter
```

Use the starter repo as a reference for eval definitions and environment config, then copy those patterns into your own project.
For deeper context, see the MCPJam evals overview and the evals CLI starter repo.
## Step 3 — Configure MCPJam to use Inspectr

Update your MCPJam environment configuration to target Inspectr instead of your MCP server directly:

```json
{
  "mcpServers": {
    "my-server": {
      "url": "http://localhost:8080/mcp",
      "headers": {
        "Authorization": "Bearer ${API_TOKEN}"
      }
    }
  }
}
```

No changes are required on your MCP server.
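The `${API_TOKEN}` placeholder is presumably expanded from the environment when the config is loaded, so make sure it is set before a run. A small guard (the token value below is purely illustrative) fails fast instead of producing confusing auth errors mid-eval:

```shell
# Export the token referenced by the config; value is illustrative only.
export API_TOKEN="example-token"

# Fail fast with a clear message if the variable is ever missing.
: "${API_TOKEN:?API_TOKEN must be set before running evals}"
echo "token configured"
```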
## Step 4 — Run MCPJam evals

Run your evals as usual:

```shell
npx @mcpjam/cli evals run -t weather-tests.json -e local-dev.json
```

While the eval runs:
- Inspectr shows MCP traffic live
- tools, prompts, and resources are classified
- token usage is estimated per operation
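Those token numbers are estimates: exact counts are tokenizer- and model-specific. A common fallback heuristic, roughly four characters per token, shows how such an estimate can be computed (this is not necessarily Inspectr's exact method):

```shell
# Rough token estimate: ~4 characters per token, rounded up.
# Heuristic only; real tokenizers are model-specific.
estimate_tokens() {
  local text="$1"
  echo $(( (${#text} + 3) / 4 ))
}

estimate_tokens "What is the weather in Amsterdam today?"  # -> 10
```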
## Step 5 — Export and analyze the run

When the eval completes, stop Inspectr.
Because export mode was enabled:
- Inspectr writes a JSON export automatically
- the export can be archived as a CI artifact
- the run can be re-imported later for inspection or comparison
Each eval becomes a durable, inspectable record.
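Because the export is plain JSON, it scripts well. As a self-contained illustration (the file name and schema below are made up for the example; we only assume MCP method names appear verbatim in the export), counting the tool calls in an archived run could look like:

```shell
# Create a tiny stand-in export so the example runs anywhere; a real
# Inspectr export will have its own (richer) schema.
export_file="inspectr-export-example.json"
cat > "$export_file" <<'EOF'
[{"method":"initialize"},{"method":"tools/call"},
 {"method":"tools/call"},{"method":"resources/read"}]
EOF

# Count how often the tools/call method occurs in the archived run.
tool_calls=$(grep -o '"tools/call"' "$export_file" | wc -l | tr -d ' ')
echo "tool calls: $tool_calls"  # -> tool calls: 2
```

The same pattern extends to comparing two exports from different models, which is exactly the kind of question eval results alone cannot answer.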
## Optional — Add a run-level tracing header

To correlate all requests belonging to a single eval run, add a tracing header:

```json
"headers": {
  "Authorization": "Bearer ${API_TOKEN}",
  "x-correlation-id": "${EVAL_RUN_ID}"
}
```

Run your eval with a unique identifier:

```shell
EVAL_RUN_ID="mcpjam-$(date +%Y%m%d-%H%M%S)" \
npx @mcpjam/cli evals run -t weather-tests.json -e local-dev.json
```

This makes it easy to filter and compare runs later.
Learn more about Tracing Insights
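Any scheme that produces unique, sortable identifiers works for `EVAL_RUN_ID`. One handy variant (assuming you run from inside a git checkout; the fallback covers everything else) also ties each run to a code version:

```shell
# Time-sortable run id, suffixed with the current git commit so each
# eval run is traceable to the code it tested.
sha="$(git rev-parse --short HEAD 2>/dev/null || echo nogit)"
EVAL_RUN_ID="mcpjam-$(date +%Y%m%d-%H%M%S)-${sha}"
echo "$EVAL_RUN_ID"
```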
## Optional — Run MCP evals with Inspectr Command Runner

When running MCP evals locally or in CI, you may want Inspectr to start, run, and stop automatically as part of a single command.
Inspectr’s Command Runner launches Inspectr, starts your eval command as a managed process, and shuts everything down when the command completes — while still capturing MCP traffic, tracing, and exports.
### Example: Run MCPJam evals via Inspectr

```shell
inspectr \
  --backend http://localhost:3000 \
  --export \
  --command npx \
  --command-arg @mcpjam/cli \
  --command-arg evals \
  --command-arg run \
  --command-arg -t \
  --command-arg weather-tests.json \
  --command-arg -e \
  --command-arg local-dev.json
```

In this setup:

- Inspectr starts automatically
- MCPJam is launched by Inspectr using `--command` and `--command-arg`
- all MCP traffic is captured and traced
- a JSON export is written when the command completes
---
## Summary
Testing MCP servers across models, clients, and environments requires more than pass/fail checks.
By combining:

- **MCPJam** for E2E MCP evals
- **Inspectr** for runtime visibility and exports

you get:

- confidence that your MCP server behaves consistently across different models
- visibility into MCP runtime behavior
- token and call-level insights per eval run
- portable, re-importable eval artifacts
- zero changes to your MCP server
This setup scales cleanly from local development to CI-based MCP regression testing.
Together, MCPJam and Inspectr form a complete MCP testing and observability workflow.