leaderboard.delx.ai · open source · auto-refreshed weekly

Agent-readiness, ranked.

A public, neutral ranking of the entire MCP ecosystem. Every npm server in the official registry is booted over stdio and graded by mcp-scorecard on 10 agent-readiness checks for a 0-100 score. Downloads measure popularity. This measures whether an agent can actually pick your server up.

npx -y mcp-scorecard <your-package-or-repo>
  • 11 servers scored this run
  • 0-100 agent-readiness score
  • 10 checks per server
  • Weekly auto-refreshed by CI
Seed run. This board currently shows the first 11 scored servers of the registry corpus. The weekly CI sweep widens the field every refresh, including the wellness MCPs that score 90+ on the engine's own board. Scope: agent-readiness shape and metadata, not correctness or security.
Live board: leaderboard.delx.ai · Source: GitHub repo · Engine: mcp-scorecard · Corpus: MCP registry

The live ranking

The whole board, by score.

Every server here was booted in its own child process and graded on 10 checks. Sorted by agent-readiness, highest first. The + green tags are passing checks; the – red tags are the gaps holding the score down.

engine: mcp-scorecard · generated: 2026-06-14 · scored: 11 · corpus this run: 15 · unreachable: 4

Rank, MCP server (npm), server name and agent-readiness score, followed by key passing (+) / failing (–) checks:

  1. @636865636b73756d/mcp-v1 (server: cs-mcp), score 48
     + Schema validity + Mutation gating + Tool descriptions – Privacy modes documented – Agent manifest
  2. @ai-dossier/mcp-server (server: @ai-dossier/mcp-server), score 48
     + Schema validity + Resources advertised + Tool descriptions – Privacy modes documented – Mutation gating
  3. @agentled/mcp-server (server: agentled), score 47
     + Schema validity + Tool descriptions – Mutation gating – Agent manifest
  4. @agonx402/gateway-mcp (server: agon-gateway-mcp), score 45
     + Schema validity + Tool naming convention + Mutation gating – Privacy modes documented – Agent manifest
  5. @agonx402/protocol-mcp (server: agon-protocol-mcp), score 45
     + Schema validity + Tool naming convention + Mutation gating – Privacy modes documented – Agent manifest
  6. @adeu/mcp-server (server: adeu-redlining-service), score 43
     + Schema validity + Tool descriptions – Privacy modes documented – Mutation gating
  7. @2sio/mcp (server: 2sio), score 37
     + Schema validity + Mutation gating + Tool descriptions – Tool naming convention – Agent manifest
  8. @10iii/air-mcp-server (server: air), score 35
     + Schema validity + Tool naming convention + Mutation gating – Privacy modes documented – Agent manifest
  9. @agent360/browser-mcp (server: agent360-browser), score 30
     + Schema validity + Tool naming convention + Tool descriptions – Privacy modes documented – Mutation gating
  10. @adbutler/mcp-server (server: adbutler), score 29
     + Schema validity – Mutation gating – Agent manifest
  11. @1ly/mcp-server (server: 1ly), score 27
     + Schema validity + Tool descriptions – Tool naming convention – Privacy modes documented

4 servers were unreachable this run and are intentionally not scored: @1claw/mcp, @agenium/mcp-server, @agenttrust/mcp-server and @aimino.xdn/fast-html-mcp-server. Servers that require auth before listTools() and don't honor the MCP_PROBE hook can't be graded fairly, so they are listed as unreachable, not penalized. Support MCP_PROBE to become gradeable.
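The split between scored and unreachable servers can be sketched roughly as follows. This is a minimal Python illustration, not the leaderboard's own code: the `ProbeResult` type and its field names are hypothetical, and only a few entries from the board above are reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Hypothetical record for one probed package (field names are assumptions)."""
    package: str
    score: Optional[int]  # None means the server could not be probed


def build_board(results):
    """Rank scored servers highest first; list unreachable ones separately."""
    scored = [r for r in results if r.score is not None]
    unreachable = [r.package for r in results if r.score is None]
    scored.sort(key=lambda r: r.score, reverse=True)
    return scored, unreachable


results = [
    ProbeResult("@1ly/mcp-server", 27),
    ProbeResult("@1claw/mcp", None),  # unreachable: listed, never scored as 0
    ProbeResult("@agentled/mcp-server", 47),
    ProbeResult("@adbutler/mcp-server", 29),
]

ranked, unreachable = build_board(results)
for position, entry in enumerate(ranked, start=1):
    print(position, entry.package, entry.score)
print("unreachable:", unreachable)
```

Keeping the unreachable packages in their own list, rather than assigning them a score of zero, is what prevents an auth-gated server from sinking to the bottom of the ranking.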

Why this exists

A neutral signal for agent-readiness.

Hundreds of MCP servers ship every month with no shared measure of whether an agent can actually pick one up cleanly. Downloads measure popularity, not quality.

01

Discoverable

Can an LLM enumerate the tools, read their descriptions, and understand what the server does without opening the source?

02

Trustworthy shape

Valid JSON schemas, snake_case tool names, read-only annotations and gated mutations make a server safe to wire into an agent.

03

Self-onboarding

An *_agent_manifest / *_capabilities surface and a smoke test let an agent verify the server before it relies on it.
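As a rough illustration of the "trustworthy shape" described above, here is a Python sketch of two tool descriptors and a simplified check. The annotation keys `readOnlyHint` and `destructiveHint` come from the MCP tool annotations; the tool names, the `looks_gated` helper and its rule are hypothetical and may not match how mcp-scorecard grades.

```python
# Two tool descriptors as a server might return them when listing its tools.
# The tool names and schemas are invented for illustration.
TOOLS = [
    {
        "name": "notes_list",
        "description": "List saved notes. Read-only; no side effects.",
        "inputSchema": {"type": "object", "properties": {}},
        "annotations": {"readOnlyHint": True},
    },
    {
        "name": "notes_delete",
        "description": "Delete a note by id. Destructive; cannot be undone.",
        "inputSchema": {
            "type": "object",
            "properties": {"note_id": {"type": "string"}},
            "required": ["note_id"],
        },
        "annotations": {"readOnlyHint": False, "destructiveHint": True},
    },
]


def looks_gated(tool):
    """Hypothetical rule: the tool is marked read-only or states destructiveness explicitly."""
    hints = tool.get("annotations", {})
    return hints.get("readOnlyHint") is True or "destructiveHint" in hints


for tool in TOOLS:
    print(tool["name"], "explicit" if looks_gated(tool) else "ambiguous")
```

A tool with no annotations at all would come out as "ambiguous" under this rule, which is the situation the read-only and mutation-gating conventions are meant to avoid.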

Methodology & fairness

Booted, not guessed.

Nothing here is hand-edited. The corpus comes from the official registry, every target is graded by the same engine in isolation, and a weekly GitHub Action commits the result.

Real corpus: npm-installable servers from the official MCP registry, latest active version, deduped.
Isolated runs: each server boots in its own child process with a hard timeout. One hang can't stall the batch.
Fair to auth-gated servers: unreachable ≠ bad. Servers that can't be probed without credentials are listed, not low-scored.
Scope is honest: this grades shape, metadata and discoverability. A server can score 100 and still return wrong data.
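The "isolated runs" idea can be sketched in Python with the standard library. This only illustrates the pattern of one child process per target with a hard timeout; the real engine is a Node CLI, and the `run_isolated` helper and the stand-in commands below are hypothetical.

```python
import subprocess
import sys


def run_isolated(command, timeout_seconds):
    """Run one target in its own child process; report a hang as a timeout."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return {"status": "timeout", "output": ""}
    status = "ok" if completed.returncode == 0 else "error"
    return {"status": status, "output": completed.stdout.strip()}


# Stand-in child processes: one answers promptly, one hangs.
fast = [sys.executable, "-c", "print('ready')"]
slow = [sys.executable, "-c", "import time; time.sleep(30)"]

batch = [run_isolated(fast, timeout_seconds=10), run_isolated(slow, timeout_seconds=1)]
print([result["status"] for result in batch])
```

Because each target gets its own process and its own deadline, the hanging command costs the batch at most its timeout and the remaining targets still run.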

The weekly pipeline

  1. Corpus: pull npm-installable servers from the MCP registry into data/corpus.json.
  2. Run: boot + score each target with mcp-scorecard into data/leaderboard.json.
  3. Render: generate LEADERBOARD.md and serve this page + JSON.
  4. Commit: a GitHub Action commits the refreshed board. No human in the loop.
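The render step could look something like the following Python sketch. The structure of data/leaderboard.json is not documented here, so the `servers`, `package` and `score` field names are assumptions, and `render_markdown` is a hypothetical helper rather than the project's actual script.

```python
import json


def render_markdown(board):
    """Turn a board dict into Markdown table text (the JSON shape is assumed)."""
    lines = ["| # | MCP server (npm) | Score |", "|---|---|---|"]
    ranked = sorted(board["servers"], key=lambda s: s["score"], reverse=True)
    for position, server in enumerate(ranked, start=1):
        lines.append(f"| {position} | {server['package']} | {server['score']} |")
    return "\n".join(lines)


# Hypothetical excerpt of data/leaderboard.json; the real field names may differ.
raw = json.dumps({
    "generated": "2026-06-14",
    "servers": [
        {"package": "@2sio/mcp", "score": 37},
        {"package": "@adeu/mcp-server", "score": 43},
    ],
})

markdown = render_markdown(json.loads(raw))
print(markdown)
```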

Climb the board

Score your server, then fix what it tells you.

The engine is the same CLI that built this board. Run it locally against your own package or repo, read the itemized fixes, ship them, and climb on the next weekly refresh.

Step 1

Get your score

Point mcp-scorecard at any npm package, GitHub URL, or local dist/index.js. It boots the server and prints your 0-100 score with the exact failing checks.

Step 2

Adopt the conventions

snake_case tool names, read-only annotations, an *_agent_manifest / *_capabilities discovery surface, documented privacy modes, and a smoke test. These are what the score rewards.
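To make the naming convention concrete, here is a small Python sketch that flags tool names that are not snake_case. The regular expression is an assumed definition of snake_case (lowercase words joined by single underscores); the engine's exact rule may be stricter or looser, and the tool names are invented.

```python
import re

# Assumed pattern: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")


def non_snake_case(names):
    """Return the tool names that do not match the assumed snake_case pattern."""
    return [name for name in names if not SNAKE_CASE.fullmatch(name)]


tool_names = ["search_notes", "getUser", "delete-note", "notes_agent_manifest"]
print(non_snake_case(tool_names))  # ['getUser', 'delete-note']
```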

Score

One command, itemized fixes

No credentials, runs locally over stdio. Add --json to pipe into CI, or --min-score 80 to gate a build.

npx -y mcp-scorecard <your-package-or-repo>
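If you prefer to apply your own threshold to the --json output instead of using --min-score, a gate might be sketched like this in Python. The shape of the JSON report is not documented on this page, so the top-level `score` field is an assumption; `passes_gate` is a hypothetical helper.

```python
import json


def passes_gate(report_text, min_score=80):
    """Return True when the report's score meets the threshold.

    Assumes the JSON report has a top-level numeric "score" field;
    check the real --json output before relying on this.
    """
    report = json.loads(report_text)
    return report["score"] >= min_score


# Hypothetical report text, standing in for the output of:
#   npx -y mcp-scorecard <your-package-or-repo> --json
sample = '{"score": 48}'
print(passes_gate(sample))                # False: 48 is below 80
print(passes_gate(sample, min_score=40))  # True: 48 meets 40
```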
Badge

Show your score in your README

Generate a live badge so agents and humans see your agent-readiness at a glance, straight from your repo.

npx -y mcp-scorecard <your-package> --badge
Run it as MCP

Wire the engine into your agent

mcp-scorecard is also an MCP server, so an agent can grade other servers on demand. Add it to any client that speaks MCP over stdio.

Clients: Claude, Cursor, Windsurf, Hermes, OpenClaw, generic MCP

The MCP config:
{
  "mcpServers": {
    "mcp-scorecard": {
      "command": "npx",
      "args": ["-y", "mcp-scorecard"]
    }
  }
}
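If your client already has a config file with other servers in it, the entry needs to be merged rather than pasted over the top. A minimal Python sketch of that merge follows; the existing `other-server` entry is invented, and where a given client keeps its config file is not covered here.

```python
import json

# The entry shown above, expressed as a Python dict.
SCORECARD_ENTRY = {"command": "npx", "args": ["-y", "mcp-scorecard"]}


def add_scorecard(config):
    """Return a copy of a client config with the mcp-scorecard entry added."""
    servers = dict(config.get("mcpServers", {}))
    servers["mcp-scorecard"] = SCORECARD_ENTRY
    return {**config, "mcpServers": servers}


# Hypothetical existing config with one other server already registered.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
merged = add_scorecard(existing)
print(json.dumps(merged, indent=2))
```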

The 10 checks

What the score is made of.

Each check scores 0-10 and they sum to 100. Together they describe whether an agent can find, trust and safely use the server.

  1. Manifest discoverability: a standard, fetchable description of the server an agent can read first.
  2. Tool naming: consistent, snake_case, namespaced tool names an agent can predict.
  3. Tool descriptions: every tool explains what it does, its inputs, and its side effects.
  4. Schema validity: input schemas are valid JSON Schema an agent can rely on.
  5. Annotations: read-only / destructive / idempotent hints so agents act safely.
  6. Mutation gating: write and destructive actions are explicit, not accidental.
  7. Privacy modes: sensitive output is opt-in and documented, not leaked by default.
  8. Resources: the server advertises resources, not just tools, where it makes sense.
  9. Agent manifest: an *_agent_manifest / *_capabilities surface for self-onboarding.
  10. Smoke test: a safe, callable check that proves the server is actually alive.
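The arithmetic behind the score is simple enough to sketch. The Python below sums ten per-check results of 0-10 into a 0-100 total and ranks the largest gaps; the per-check points are invented for a hypothetical server, and how the engine awards points within a check is not shown here.

```python
CHECKS = [
    "Manifest discoverability", "Tool naming", "Tool descriptions",
    "Schema validity", "Annotations", "Mutation gating", "Privacy modes",
    "Resources", "Agent manifest", "Smoke test",
]
MAX_PER_CHECK = 10


def total_score(points):
    """Sum the per-check points; ten checks at 0-10 each give 0-100."""
    return sum(points[name] for name in CHECKS)


def top_gaps(points, count=3):
    """Return the checks that lost the most points, largest loss first."""
    lost = {name: MAX_PER_CHECK - points[name] for name in CHECKS}
    worst = sorted(CHECKS, key=lambda name: lost[name], reverse=True)
    return [(name, lost[name]) for name in worst[:count] if lost[name] > 0]


# Invented per-check points for a hypothetical server.
example = dict(zip(CHECKS, [5, 10, 8, 10, 6, 10, 0, 4, 0, 7]))
print(total_score(example))  # 60
print(top_gaps(example))     # [('Privacy modes', 10), ('Agent manifest', 10), ('Resources', 6)]
```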

Questions people ask first

Before you trust the ranking.

A neutral leaderboard should be explicit about what it measures, how it scores, and where it is deliberately fair.

What does this actually measure?

Agent-readiness: whether an agent can discover the tools, trust the schemas, and onboard itself without a human reading the source. It covers shape, metadata and discoverability, not correctness or security.

How is each server scored?

Each npm server from the official registry is booted over stdio in an isolated child process and graded by mcp-scorecard on 10 checks (each 0-10) summed to 0-100. A weekly GitHub Action re-runs the sweep and commits the result.

How do I climb the board?

Run npx -y mcp-scorecard <your-package-or-repo>, read the itemized fixes, and adopt the conventions the score rewards. You climb on the next weekly refresh; nothing is hand-edited.

Why is a server "unreachable" not "low"?

Servers that need auth before listTools() and don't honor the MCP_PROBE hook can't be graded fairly, so they're listed as unreachable rather than penalized. Support MCP_PROBE to be gradeable.

For AI agents

A delegation prompt that does the work.

Hand this to your assistant to score a server and read back exactly what to fix. It points at the engine, runs it over stdio, and never invents a score.

Score the agent-readiness of an MCP server for me using
mcp-scorecard.

Engine: github.com/davidmosiah/mcp-scorecard (npm: mcp-scorecard)
Leaderboard: https://leaderboard.delx.ai
Data: https://leaderboard.delx.ai/leaderboard.json

Run:  npx -y mcp-scorecard <TARGET> --json
(<TARGET> = npm package, GitHub URL, or local dist/index.js)

Then give me:
1. my 0-100 score and grade
2. all 10 checks with pass / warn / fail
3. the top 3 gaps, ranked by points lost
4. a concrete fix for each

Don't fabricate a score. If it's unreachable, say so
and explain the MCP_PROBE hook.

For builders

Open source, end to end.

Both the leaderboard (mcp-leaderboard) and the scoring engine (mcp-scorecard) are MIT-licensed and built by David Mosiah. Methodology issues and corpus additions are welcome: the rules live in the scripts, not in anyone's head.

The agent-readiness bar

See where your server lands.

Score it in one command, fix what the engine flags, and climb on the next weekly refresh.