leaderboard.delx.ai · open source · auto-refreshed weekly
A public, neutral ranking of the entire MCP ecosystem. Every npm server in the official registry is booted over stdio and graded by mcp-scorecard on 10 agent-readiness checks for a 0-100 score. Downloads measure popularity. This measures whether an agent can actually pick your server up.
npx -y mcp-scorecard <your-package-or-repo>
The live ranking
Every server here was booted in its own child process and graded on 10 checks. Servers are sorted by agent-readiness, highest first. The green + tags are passing checks; the red − tags are the gaps holding the score down.
| # | MCP server (npm) | server name | score (0-100) |
|---|---|---|---|
| 1 | @636865636b73756d/mcp-v1 | cs-mcp | 48 |
| 2 | @ai-dossier/mcp-server | @ai-dossier/mcp-server | 48 |
| 3 | @agentled/mcp-server | agentled | 47 |
| 4 | @agonx402/gateway-mcp | agon-gateway-mcp | 45 |
| 5 | @agonx402/protocol-mcp | agon-protocol-mcp | 45 |
| 6 | @adeu/mcp-server | adeu-redlining-service | 43 |
| 7 | @2sio/mcp | 2sio | 37 |
| 8 | @10iii/air-mcp-server | air | 35 |
| 9 | @agent360/browser-mcp | agent360-browser | 30 |
| 10 | @adbutler/mcp-server | adbutler | 29 |
| 11 | @1ly/mcp-server | 1ly | 27 |
Four servers were unreachable this run and are intentionally not scored: @1claw/mcp, @agenium/mcp-server, @agenttrust/mcp-server and @aimino.xdn/fast-html-mcp-server. Servers that require auth before listTools() and don't honor the MCP_PROBE hook can't be graded fairly, so they are listed as unreachable rather than penalized. Support MCP_PROBE to become gradeable.
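The ranking rule above has two parts: sort scored servers highest first, and set aside unreachable ones instead of scoring them 0. The minimal Python sketch below shows one way to express that. The Result type, the rank function, and the alphabetical tie-break are illustrative assumptions, not the leaderboard's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    package: str
    score: Optional[int]  # None means the server was unreachable this run


def rank(results: list[Result]) -> tuple[list[Result], list[str]]:
    """Split results into a ranked board and a list of unreachable packages.

    Unreachable servers (score is None) are excluded rather than scored 0,
    so they are not penalized for failing to respond.
    """
    scored = [r for r in results if r.score is not None]
    unreachable = sorted(r.package for r in results if r.score is None)
    # Highest score first; ties broken by package name (an assumed rule) for a stable order.
    scored.sort(key=lambda r: (-r.score, r.package))
    return scored, unreachable
```

Keeping unreachable servers in a separate list, rather than giving them a score of 0, matches the fairness rule stated above.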
Why this exists
Hundreds of MCP servers ship every month with no shared measure of whether an agent can actually pick one up cleanly. Downloads measure popularity, not quality.
Discoverable: can an LLM enumerate the tools, read their descriptions, and understand what the server does without opening the source?
Safe: valid JSON schemas, snake_case tool names, read-only annotations and gated mutations make a server safe to wire into an agent.
Verifiable: an *_agent_manifest / *_capabilities surface and a smoke test let an agent verify the server before it relies on it.
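The following minimal Python sketch makes these criteria concrete. It inspects a tool list shaped like an MCP tools/list result, where each tool has a name, a description and optional annotations. The readOnlyHint / destructiveHint keys come from the MCP tool-annotations specification. The function name inspect_tools and its heuristics are hypothetical and are not mcp-scorecard's actual rules.

```python
import re

# Lower snake_case: starts with a letter, words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def inspect_tools(tools: list[dict]) -> dict:
    """Summarize a tools/list result the way an agent-readiness check might (hypothetical)."""
    names = [t["name"] for t in tools]
    return {
        # Names an agent cannot predict because they are not snake_case.
        "non_snake_case": [n for n in names if not SNAKE_CASE.match(n)],
        # Tools with an empty or missing description give the LLM nothing to reason about.
        "undescribed": [t["name"] for t in tools if not (t.get("description") or "").strip()],
        # Tools that declare neither readOnlyHint nor destructiveHint in their annotations.
        "unannotated": [
            t["name"]
            for t in tools
            if not {"readOnlyHint", "destructiveHint"} & set(t.get("annotations") or {})
        ],
        # A self-onboarding surface, detected here purely by tool-name suffix.
        "has_manifest": any(n.endswith(("_agent_manifest", "_capabilities")) for n in names),
    }
```

Each key maps to one of the criteria above. An empty list means that criterion has no gaps.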
Methodology & fairness
Nothing here is hand-edited. The corpus comes from the official registry, every target is graded by the same engine in isolation, and a weekly GitHub Action commits the result.
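"Graded in isolation" means that each target runs in its own child process, so one hanging or crashing server cannot take down the sweep. The minimal Python sketch below uses the standard subprocess module under that assumption. The run_isolated helper and its 60-second default timeout are hypothetical and not taken from the leaderboard's pipeline.

```python
import subprocess
from typing import Optional


def run_isolated(cmd: list[str], timeout_s: float = 60.0) -> Optional[str]:
    """Run one grading command in its own child process.

    Returns captured stdout, or None when the process exits non-zero, cannot
    be started, or exceeds the timeout (subprocess.run kills it in that case).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None
    return proc.stdout if proc.returncode == 0 else None
```

For example, `run_isolated(["npx", "-y", "mcp-scorecard", "some-package", "--json"])` would grade a single target. Parsing that output is left out because this page does not document the JSON schema.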
The weekly pipeline
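1. Collect every npm server in the official registry into data/corpus.json.
2. Boot and grade each one in isolation with mcp-scorecard, writing data/leaderboard.json.
3. Generate LEADERBOARD.md and serve this page + JSON.
Climb the board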
The engine is the same CLI that built this board. Run it locally against your own package or repo, read the itemized fixes, ship them, and climb on the next weekly refresh.
Point mcp-scorecard at any npm package, GitHub URL, or local dist/index.js. It boots the server and prints your 0-100 score with the exact failing checks.
Adopt the conventions the score rewards: snake_case tool names, read-only annotations, an *_agent_manifest / *_capabilities discovery surface, documented privacy modes, and a smoke test.
No credentials, runs locally over stdio. Add --json to pipe into CI, or --min-score 80 to gate a build.
npx -y mcp-scorecard <your-package-or-repo>
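`--min-score 80` already gates a build. If you would rather apply your own policy to the `--json` output, a minimal Python check could look like the sketch below. It assumes the report exposes the total as a top-level "score" field. That field name is a hypothetical placeholder, so check the real output and adjust it.

```python
import json


def passes_gate(report_json: str, min_score: int = 80) -> bool:
    """Return True when a scorecard report meets the minimum score.

    ASSUMPTION: the total lives in a top-level "score" field (hypothetical name).
    """
    report = json.loads(report_json)
    score = report.get("score")  # hypothetical field name; verify against real --json output
    if not isinstance(score, (int, float)):
        raise ValueError("report has no numeric 'score' field")
    return score >= min_score
```

In CI, exit non-zero when `passes_gate` returns False so the job fails.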
Generate a live badge so agents and humans see your agent-readiness at a glance, straight from your repo.
npx -y mcp-scorecard <your-package> --badge
mcp-scorecard is also an MCP server, so an agent can grade other servers on demand. Add it to any client that speaks MCP over stdio.
{
"mcpServers": {
"mcp-scorecard": {
"command": "npx",
"args": ["-y", "mcp-scorecard"]
}
}
}
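If a client config already lists other servers, add the entry above alongside them rather than overwriting the file. The minimal Python sketch below does that merge on an in-memory config. The add_scorecard helper is hypothetical, and the location of the config file depends on the client.

```python
import json


def add_scorecard(config: dict) -> dict:
    """Return a copy of an MCP client config with the mcp-scorecard entry added.

    Existing servers are preserved; only the "mcp-scorecard" key is set.
    """
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    merged.setdefault("mcpServers", {})["mcp-scorecard"] = {
        "command": "npx",
        "args": ["-y", "mcp-scorecard"],
    }
    return merged
```

The merged dict can be written back with `json.dump(merged, f, indent=2)`.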
The 10 checks
Each check scores 0-10 and they sum to 100. Together they describe whether an agent can find, trust and safely use the server.
1. A standard, fetchable description of the server an agent can read first.
2. Consistent, snake_case, namespaced tool names an agent can predict.
3. Every tool explains what it does, its inputs, and its side effects.
4. Input schemas are valid JSON Schema an agent can rely on.
5. Read-only / destructive / idempotent hints so agents act safely.
6. Write and destructive actions are explicit, not accidental.
7. Sensitive output is opt-in and documented, not leaked by default.
8. The server advertises resources, not just tools, where it makes sense.
9. An *_agent_manifest / *_capabilities surface for self-onboarding.
10. A safe, callable check that proves the server is actually alive.
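Ten checks worth 0-10 points each give a total between 0 and 100. The arithmetic can be sketched in Python as follows, with validation so that a malformed report cannot produce an impossible total. The check names you pass in are up to you, and the total_score function is hypothetical.

```python
def total_score(checks: dict[str, int]) -> int:
    """Sum ten per-check scores (each 0-10) into a 0-100 total."""
    if len(checks) != 10:
        raise ValueError(f"expected 10 checks, got {len(checks)}")
    for name, points in checks.items():
        if not 0 <= points <= 10:
            raise ValueError(f"{name}: {points} is outside 0-10")
    return sum(checks.values())
```

For example, ten checks scoring 0, 1, 2, …, 9 sum to 45.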
Questions people ask first
A neutral leaderboard should be explicit about what it measures, how it scores, and where it is deliberately fair.
Agent-readiness: whether an agent can discover the tools, trust the schemas, and onboard itself without a human reading the source. It is shape, metadata and discoverability, not correctness or security.
Each npm server from the official registry is booted over stdio in an isolated child process and graded by mcp-scorecard on 10 checks (each 0-10) summed to 0-100. A weekly GitHub Action re-runs the sweep and commits the result.
Run npx -y mcp-scorecard <your-package-or-repo>, read the itemized fixes, and adopt the conventions the score rewards. You climb on the next weekly refresh; nothing is hand-edited.
Servers that need auth before listTools() and don't honor the MCP_PROBE hook can't be graded fairly, so they're listed as unreachable rather than penalized. Support MCP_PROBE to be gradeable.
For AI agents
Hand this to your assistant to score a server and read back exactly what to fix. It points at the engine, runs it over stdio, and never invents a score.
Score the agent-readiness of an MCP server for me using
mcp-scorecard.
Engine: github.com/davidmosiah/mcp-scorecard (npm: mcp-scorecard)
Leaderboard: https://leaderboard.delx.ai
Data: https://leaderboard.delx.ai/leaderboard.json
Run: npx -y mcp-scorecard <TARGET> --json
(<TARGET> = npm package, GitHub URL, or local dist/index.js)
Then give me:
1. my 0-100 score and grade
2. all 10 checks with pass / warn / fail
3. the top 3 gaps, ranked by points lost
4. a concrete fix for each
Don't fabricate a score. If it's unreachable, say so
and explain the MCP_PROBE hook.
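Step 3 of the prompt ranks the top gaps by points lost, meaning 10 minus each check's score. A minimal Python sketch of that ranking follows. The check names in any input are hypothetical, and the alphabetical tie-break is an assumption.

```python
def top_gaps(checks: dict[str, int], n: int = 3) -> list[tuple[str, int]]:
    """Rank checks by points lost (10 minus the check's score), largest first."""
    lost = [(name, 10 - points) for name, points in checks.items() if points < 10]
    # Biggest loss first; ties keep alphabetical order for a stable report.
    lost.sort(key=lambda item: (-item[1], item[0]))
    return lost[:n]
```

Checks that already score 10 are skipped, so a nearly perfect server may report fewer than three gaps.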
For builders
Both the leaderboard (mcp-leaderboard) and the scoring engine (mcp-scorecard) are MIT and built by David Mosiah. Methodology issues and corpus additions are welcome; the rules live in the scripts, not in anyone's head.
The agent-readiness bar
Score it in one command, fix what the engine flags, and climb on the next weekly refresh.