Stop guessing whether your AI tools work

AgentJury tests MCP servers and agent skills the way agents actually use them — with fuzzy inputs, weird edge cases, and no hand-holding. See which tools pass and which break.

How it works

1. Tool goes into a sandbox

Isolated Docker container. No internet access. Resource-limited. Your tool runs exactly like it would in production, minus the ability to phone home.
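As a rough illustration of that isolation, the sketch below assembles a `docker run` command with networking disabled and memory, CPU, and process limits applied. The `sandbox_command` helper, the image name, and every limit value are assumptions made for this example; they are not AgentJury's actual configuration.

```python
from typing import List


def sandbox_command(image: str, memory: str = "512m", cpus: str = "1.0",
                    pids_limit: int = 128) -> List[str]:
    """Build a `docker run` argument list for an isolated, resource-limited container.

    All default values are illustrative assumptions, not AgentJury's settings.
    """
    return [
        "docker", "run",
        "--rm",                           # remove the container when it exits
        "--network", "none",              # no external network, loopback only
        "--memory", memory,               # hard memory cap
        "--cpus", cpus,                   # CPU quota
        "--pids-limit", str(pids_limit),  # cap on the number of processes
        "--read-only",                    # read-only root filesystem
        "--cap-drop", "ALL",              # drop all Linux capabilities
        image,
    ]


if __name__ == "__main__":
    # "example/mcp-tool:latest" is a hypothetical image name.
    print(" ".join(sandbox_command("example/mcp-tool:latest")))
```

Building the command as a list rather than a shell string keeps each flag and value a separate argument, so nothing from the image name is interpreted by a shell.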

2. Five test agents hammer it

Security fuzzing, 100-call reliability runs, "can an agent figure this out" tests, and cross-framework compatibility checks. All automated, all recorded.
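A 100-call reliability run can be pictured roughly as follows. This is a minimal sketch, not AgentJury's harness: `run_reliability` and `flaky_tool` are hypothetical names, and the stand-in tool fails on every tenth call purely so the output has something to show.

```python
from typing import Any, Callable, Dict, List


def run_reliability(tool: Callable[[int], Any], calls: int = 100) -> Dict[str, Any]:
    """Call `tool` repeatedly and record every failure with its call index."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    failures: List[Dict[str, Any]] = []
    for i in range(calls):
        try:
            tool(i)
        except Exception as exc:  # record the failure and keep going
            failures.append({"call": i, "error": repr(exc)})
    passed = calls - len(failures)
    return {
        "calls": calls,
        "passed": passed,
        "pass_rate": passed / calls,
        "failures": failures,
    }


def flaky_tool(i: int) -> str:
    """Hypothetical stand-in for a tool call: fails on every tenth call."""
    if i % 10 == 0:
        raise TimeoutError(f"call {i} timed out")
    return "ok"


if __name__ == "__main__":
    report = run_reliability(flaky_tool)
    print(report["passed"], "of", report["calls"], "calls passed")
```

Recording each failure with its call index, instead of stopping at the first error, is what lets a run report a pass rate and the specific calls that broke.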

3. You get data, not opinions

Test results with evidence. Failure modes. Compatibility matrix. Every score links to the logs that produced it. No hand-waving.
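One way to keep every result tied to its evidence is to store the log location next to each verdict. The sketch below builds a small compatibility matrix in which every cell carries the logs behind it. The `RunRecord` type, the framework labels, and the log paths are all hypothetical; AgentJury's real data model is not described here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class RunRecord:
    framework: str  # hypothetical framework label
    check: str      # e.g. "reliability" or "security"
    passed: bool
    log_path: str   # where the raw log for this run is stored


def compatibility_matrix(records: List[RunRecord]) -> Dict[Tuple[str, str], Dict[str, Any]]:
    """Group runs by (framework, check); a cell passes only if every run passed."""
    matrix: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for r in records:
        cell = matrix.setdefault((r.framework, r.check), {"passed": True, "logs": []})
        cell["passed"] = cell["passed"] and r.passed
        cell["logs"].append(r.log_path)  # keep the evidence with the verdict
    return matrix


if __name__ == "__main__":
    records = [
        RunRecord("framework_a", "reliability", True, "logs/a-rel-1.json"),
        RunRecord("framework_a", "reliability", False, "logs/a-rel-2.json"),
        RunRecord("framework_b", "reliability", True, "logs/b-rel-1.json"),
    ]
    for key, cell in compatibility_matrix(records).items():
        print(key, cell["passed"], cell["logs"])
```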

107+ tools tested. MCP servers and agent skills. Every score links to its test data.

Security fuzzing
Reliability testing
Compatibility matrix
Open test data

Built an MCP server or skill? Test it.

Submit your GitHub URL and get a verdict in minutes. Scores, compatibility matrix, and security analysis — all automated.

Submit for testing