June 29, 2026

There Is No Such Thing as a Verified Skill: Introducing trust-card

You type one command and a folder you did not write becomes part of how your agent thinks and acts. The only signal you usually get back is a green checkmark, and a checkmark proves the wrong thing. trust-card replaces that badge with graded evidence: who shipped these exact bytes, what the skill is allowed to do, where its knowledge came from, and who has independently vouched for it. It renders the result as a trading card you can actually read, and it refuses to pretend that a signature proves an artifact is safe.

Sascha Becker

Author

13 min read

You run npx skills add some/repo --skill thing, and a folder you did not write becomes part of how your agent reasons and acts. It might be a checklist. It might be a script your agent now runs. It might be a knowledge file that quietly shapes every answer downstream. Whatever it is, it joined your context, and the only signal most registries hand you back is a star count and maybe a green checkmark.

A checkmark is the wrong shape for this. It collapses several different questions into one boolean, and the question it usually answers is not the one you care about. trust-card is a new agent skill in my open skill set that replaces the badge with graded evidence, and renders that evidence as a card you can read at a glance.

What a skill actually is

An agent skill is a small, self-contained folder with a SKILL.md inside. The agent reads one field, the description, decides on its own whether the skill is relevant to your prompt, and loads the rest only when it needs it. Some skills are pure knowledge: a catalog, a set of rules, a per-tool notes file. Others ship code the agent will run. Most of mine are knowledge, with the occasional validator script.

That design is why skills travel so well. One folder works across Claude Code, Cursor, Codex, Cline, Windsurf, and OpenCode through the skills.sh format, with no SDK and no lock-in. It is also why trust is awkward. A skill is not a dependency you call on purpose with a known interface. It is context that decides, partly on its own, when to influence what your agent does next.

The problem nobody prints on the box

Pull a skill from a registry and ask the honest questions. Who shipped these exact bytes, and have they changed since? What is this thing allowed to do on my machine? Where did its claims come from? Has anyone other than the author looked at it? The usual answer to all four is a shrug.

Two threat models hide behind that shrug, and they are not the same.

For a skill that ships code, the risk is the obvious one: a script that reads more than it should, calls home, or runs a shell command you never saw. This is the pull_request_target class of problem moved one layer up, from your CI into your agent.

For a skill that is pure knowledge, the risk is quieter and arguably worse. A poisoned knowledge bundle opens no socket and runs no code. It just asserts something false with authority, and your agent repeats it. A single corrupted legal or medical concept does not crash anything. It silently bends every downstream answer. You cannot grep for that, and a sandbox will not catch it, because nothing misbehaves at the system level.

This is the part the green checkmark gets wrong. Cryptography is excellent at "these are the bytes the author shipped, unaltered." It says nothing about whether those bytes are safe to run or true to read. Conflating the two is how you end up trusting a signed thing that does exactly what it should not.

This is not hypothetical

Both threat models already have a paper trail.

On the executable side: in March 2025 the widely used tj-actions/changed-files GitHub Action was compromised and rewritten to dump CI secrets into public build logs. The attacker repointed every version tag at the malicious commit, so even consumers pinned to a tag were hit, and more than 23,000 repositories were exposed before it was caught.¹ That is the pull_request_target class one layer up: a dependency you trusted, changed under you.

On the knowledge side, the canonical attack is tool poisoning, disclosed by Invariant Labs in April 2025. A malicious MCP server hides instructions inside the tool description the model reads, while the client UI shows you a harmless summary. The poisoned tool does not even have to be called; loading it into context is enough, and the proof of concept against Cursor walked private data into an attacker-controlled pull request.² It is now a formal benchmark rather than a one-off,³ and it is common in the wild: one scan of 1,000 public MCP servers reported that roughly a third carried a critical vulnerability.⁴

Both sides share one root cause, the pattern Simon Willison named the lethal trifecta: access to private data, exposure to untrusted content, and a way to send data out. An agent that installs your skills has all three by default.⁵ A signature would have caught none of this, because none of it is an integrity failure. These are capability and provenance problems, which is exactly where a card has to do its work.

A badge is the wrong shape

trust-card is built on one idea: a card is graded evidence, not a verdict. It never stamps "verified." It attaches every layer of proof the producer can supply, and the consumer computes a trust gradient against their own policy.

That mirrors the asymmetry the Open Knowledge Format already uses for data: producers are precise, consumers are forgiving. A layer that cannot be checked is reported as UNVERIFIED, never as a hard failure. You decide where your bar is. A hobbyist sanity check and a production CI loading an executable skill are not the same bar, and the card does not pretend they are.

Each layer answers a different question, rests on a different anchor, and is graded on one scale, STRONG down to ABSENT:

integrity, a SHA-256 manifest digest. Are these the exact bytes, unchanged? This one is pure math and always answerable.
authorship, a signature over that digest. Did the claimed author ship them?
capability, a permission manifest for code or an epistemic scope for knowledge. What is it allowed to do, and what domain does it claim authority over?
content provenance, citations and fetch dates. Where did each claim come from? This applies to knowledge bundles; executable code has no sources.
vouching, signed attestations. Who, other than the author, has looked at it?
freshness, an expiry. Is it still current?
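
The integrity layer is the easiest one to make concrete. What follows is a minimal sketch of a SHA-256 manifest digest, not trust-card's actual format: the manifest layout (`hash  relative/path` lines, sorted by path) and the choice to exclude `CARD.md` and `CARD.svg` so the card does not hash itself are my assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Assumption: card files live inside the bundle and must not feed the digest.
EXCLUDED = frozenset({"CARD.md", "CARD.svg"})


def file_sha256(path: Path) -> str:
    """Hex SHA-256 of one file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def manifest_digest(bundle: Path, exclude: frozenset = EXCLUDED) -> str:
    """Digest of a sorted 'hash  relative/path' manifest of the bundle."""
    # Collect POSIX-style relative paths so the result is the same on any OS.
    rels = sorted(
        p.relative_to(bundle).as_posix()
        for p in bundle.rglob("*")
        if p.is_file()
    )
    lines = [
        f"{file_sha256(bundle / rel)}  {rel}"
        for rel in rels
        if rel not in exclude
    ]
    manifest = "\n".join(lines) + "\n"
    return hashlib.sha256(manifest.encode("utf-8")).hexdigest()
```

Because the digest covers both file contents and file names, editing, renaming, adding, or removing any file changes it. Recomputing it from disk and comparing against the value recorded on the card is what makes tampering visible.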

The card binds three provenances that are normally scattered: content (where claims came from), artifact (who shipped these bytes), and capability (what it will do). It scores the threat model honestly. A pure-knowledge bundle that asserts over a regulated domain is tagged epistemic-L2, because silent corruption there is high-impact even though it executes nothing. A skill whose shell and network access were guessed from its code, not declared, is executable-L2-unverified until a real manifest exists.

The card you can read

Evidence is only useful if a human can take it in. So the card renders as an actual card, laid out like the kind you would collect and trade, one color per domain.

Every part is driven by the card's evidence. The frame color is the skill's domain. The pip at the top right is the risk tier. The art is a digest-seeded identicon, so the picture itself is provenance. The six bars are the trust layers, each filled by its grade. The diamond and the score box read the verification ceremony, and the foot carries the signer and a short digest, like an artist and a collector number.
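
To show what "digest-seeded" can mean in practice, here is a small sketch of a deterministic, mirror-symmetric identicon grid. It is an illustration of the general technique, not the rendering trust-card uses: the 5x5 size, the one-bit-per-cell rule, and the function names are all assumptions.

```python
import hashlib


def identicon_grid(digest_hex: str, size: int = 5) -> list[list[bool]]:
    """Mirror-symmetric on/off grid derived only from the digest bytes."""
    data = bytes.fromhex(digest_hex)
    half = (size + 1) // 2  # columns generated before mirroring
    grid = []
    for row in range(size):
        # One cell per byte: the cell is "on" when the byte's low bit is set.
        left = [
            bool(data[(row * half + col) % len(data)] & 1)
            for col in range(half)
        ]
        # Mirror the left columns onto the right (the middle column is shared
        # when size is odd).
        right = left[: size - half][::-1]
        grid.append(left + right)
    return grid


def render(grid: list[list[bool]]) -> str:
    """ASCII rendering: '#' for an on cell, '.' for an off cell."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


if __name__ == "__main__":
    digest = hashlib.sha256(b"example bundle bytes").hexdigest()
    print(render(identicon_grid(digest)))
```

The same digest always draws the same picture, so a card whose art changed is a card whose bytes changed.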

Nothing on the card is decoration for its own sake. The bars fill by grade, so more fill means more trust and you need no color legend. The score is shown over the reachable maximum for that artifact, not a flat number, because a skill cannot reach every grade. Capability tops out at MEDIUM for code (a declared manifest is a claim; only a sandbox enforces it), freshness never reaches STRONG, and content provenance does not apply to code at all. So a fully built and signed skill with no independent attestations yet sits at 10/13, and that is its honest ceiling, not a low mark.

Rarity tracks the ceremony rather than quality: generated, then declared, then signed, then independently attested. The last three points are vouching, the one layer you structurally cannot grant yourself, because vouching is what other people do.
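
The 10/13 arithmetic can be reproduced in a few lines. The point values here are inferred, not quoted from the tool: I assume STRONG is worth 3 and MEDIUM 2, which is the mapping that makes the stated ceilings add up to 13 and leaves exactly three points for vouching. The WEAK grade is a hypothetical placeholder for the step below MEDIUM.

```python
from typing import Optional

# Assumed point values; only STRONG, MEDIUM, UNVERIFIED and ABSENT appear in the article.
POINTS = {"STRONG": 3, "MEDIUM": 2, "WEAK": 1, "UNVERIFIED": 0, "ABSENT": 0}

# Highest grade each layer can reach for an executable skill.
CEILING_FOR_CODE: dict[str, Optional[str]] = {
    "integrity": "STRONG",
    "authorship": "STRONG",
    "capability": "MEDIUM",      # a declared manifest is a claim, not enforcement
    "content_provenance": None,  # does not apply to code
    "vouching": "STRONG",
    "freshness": "MEDIUM",       # never reaches STRONG
}


def score(grades: dict[str, str], ceilings: dict[str, Optional[str]]) -> tuple[int, int]:
    """Return (earned, reachable) points, capping each layer at its ceiling."""
    earned = reachable = 0
    for layer, ceiling in ceilings.items():
        if ceiling is None:
            continue  # layer is not applicable to this kind of artifact
        cap = POINTS[ceiling]
        reachable += cap
        earned += min(POINTS[grades.get(layer, "ABSENT")], cap)
    return earned, reachable


# A signed skill with no independent attestations yet.
SIGNED_NOT_ATTESTED = {
    "integrity": "STRONG",
    "authorship": "STRONG",
    "capability": "MEDIUM",
    "content_provenance": "UNVERIFIED",
    "vouching": "ABSENT",
    "freshness": "MEDIUM",
}

if __name__ == "__main__":
    earned, reachable = score(SIGNED_NOT_ATTESTED, CEILING_FOR_CODE)
    print(f"{earned}/{reachable}")
```

Under these assumptions the signed skill earns 3 + 3 + 2 + 2 = 10 of a reachable 3 + 3 + 2 + 3 + 2 = 13, and the missing three points are exactly the vouching layer.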

Figure: trust-card's own card, showing a dark Security frame, a digest-seeded identicon, six trust bars with integrity and authorship filled, and a 10 of 13 score box marked rare. Authorship is filled because it carries a real keyless Sigstore signature, re-checked with cosign; capability and freshness sit at their honest ceilings, so 10 of 13 (rare) is the most a signed but not yet independently attested skill reaches.

The whole set also builds to a single cards.json feed, so you can render a gallery anywhere. One command rebuilds the feed and every card.

```bash
pnpm cards   # writes cards.json plus a CARD.svg per skill
```

Reading the gradient

Run verify against the live directory and you get the gradient, not a yes or no. The digest is recomputed from disk, never trusted from the file, so this is also what catches tampering.

```text
Trust gradient for: trust-card  [executable-L1]
----------------------------------------------------------------
  integrity            [###] STRONG      digest matches live bundle
  authorship           [###] STRONG      sigstore + rekor, cosign verify-blob OK
  capability           [##.] MEDIUM      permission manifest declared (enforce in sandbox)
  content_provenance   [...] UNVERIFIED  n/a for executable skills
  vouching             [...] ABSENT      0 attestations bound to this digest
  freshness            [##.] MEDIUM      not expired
```

You read it as a gradient and set your own bar with a policy. The card is accepted only if every layer you name meets its minimum.

```bash
python scripts/card.py verify ./my-skill/CARD.md --bundle ./my-skill \
  --policy integrity:STRONG,authorship:MEDIUM,capability:MEDIUM
```
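
The acceptance rule behind that policy string is simple enough to sketch. This is an illustration of the rule as described, not the code in `card.py`: the grade ordering (in particular where UNVERIFIED sits, and the hypothetical WEAK step) and both function names are assumptions.

```python
# Assumed ordering from weakest to strongest; WEAK is a hypothetical grade.
ORDER = ["ABSENT", "UNVERIFIED", "WEAK", "MEDIUM", "STRONG"]


def parse_policy(spec: str) -> dict[str, str]:
    """Turn 'layer:GRADE,layer:GRADE' into {layer: minimum grade}."""
    policy = {}
    for item in spec.split(","):
        layer, _, grade = item.strip().partition(":")
        if not layer or grade not in ORDER:
            raise ValueError(f"bad policy entry: {item!r}")
        policy[layer] = grade
    return policy


def failing_layers(gradient: dict[str, str], policy: dict[str, str]) -> list[str]:
    """Layers named in the policy whose grade is below the required minimum."""
    return [
        layer
        for layer, minimum in policy.items()
        # A layer missing from the gradient is treated as ABSENT.
        if ORDER.index(gradient.get(layer, "ABSENT")) < ORDER.index(minimum)
    ]


# The gradient printed for trust-card above.
GRADIENT = {
    "integrity": "STRONG",
    "authorship": "STRONG",
    "capability": "MEDIUM",
    "content_provenance": "UNVERIFIED",
    "vouching": "ABSENT",
    "freshness": "MEDIUM",
}

if __name__ == "__main__":
    policy = parse_policy("integrity:STRONG,authorship:MEDIUM,capability:MEDIUM")
    failures = failing_layers(GRADIENT, policy)
    print("accepted" if not failures else f"rejected: {failures}")
```

Layers you do not name are simply not checked, which is how the same card can pass a hobbyist policy and fail a production one that also demands vouching.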

The capability layer is worth one honest word. For a skill, it reads a permission manifest you ship next to the code. The manifest is plain and declares what the artifact does when it runs, so a sandbox or a reviewer has something concrete to enforce against.

```yaml
# permissions.yaml, for a pure-knowledge skill
model: knowledge
executes: false
network: none
shell: false
filesystem_writes: false
```
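
A reviewer-side check against such a manifest might look like the sketch below. It is a hedged illustration only: the hand-rolled parser handles just the flat `key: value` subset shown above (a real tool would use a YAML library), and the rule that a knowledge manifest must declare every capability off is my assumption, not a documented trust-card check.

```python
def parse_flat_manifest(text: str) -> dict[str, object]:
    """Parse flat 'key: value' lines; comments and blank lines are skipped."""
    manifest: dict[str, object] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        # Map YAML booleans to Python booleans, keep everything else as text.
        manifest[key.strip()] = {"true": True, "false": False}.get(value, value)
    return manifest


# Assumed expectation for a pure-knowledge skill: nothing runs, nothing leaves.
KNOWLEDGE_EXPECTED = {
    "executes": False,
    "network": "none",
    "shell": False,
    "filesystem_writes": False,
}


def knowledge_violations(manifest: dict[str, object]) -> list[str]:
    """Fields that contradict a 'model: knowledge' declaration."""
    if manifest.get("model") != "knowledge":
        return []  # this check only applies to knowledge manifests
    return [
        f"{key}={manifest.get(key)!r}"
        for key, expected in KNOWLEDGE_EXPECTED.items()
        if manifest.get(key) != expected
    ]
```

A manifest that says `model: knowledge` but also `shell: true` would come back with `shell=True` as a violation, which is the kind of concrete mismatch a reviewer can act on. It still only checks the declaration; enforcement belongs to the sandbox.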

Use it

trust-card installs into any agent that speaks the skills.sh format.

```bash
npx skills@latest add saschb2b/skills --skill trust-card
```

Once it is loaded, you mostly talk to it. The skill auto-invokes on the questions you would actually ask, and turns them into the right command.

```text
"Make a trust card for this skill."
"Is this skill safe to load?"
"Sign and attest this bundle before I publish it."
"Verify this bundle has not been tampered with."
```

Under that, it is a small standard-library script with five verbs. The default loop is generate, then verify against the live bundle, so you see the gradient immediately.

```bash
# build a card for a skill or an OKF bundle, then read the gradient
python scripts/card.py generate ./my-skill --identity did:web:example.com --expires 2027-01-01
python scripts/card.py verify   ./my-skill/CARD.md --bundle ./my-skill

# attach an independent vouch (the part you cannot grant yourself)
python scripts/card.py attest ./my-skill/CARD.md \
  --kind scan --by did:web:scanner.example --result pass
```

Signing without a key to lose

Authorship is where most signing setups go wrong, because they leave you holding a long-lived private key, and anyone who steals that key can forge your signature. trust-card prefers keyless Sigstore signing, which sidesteps that entirely.

```bash
python scripts/card.py sign ./my-skill/CARD.md
```

If cosign is on your PATH, this opens a short login, binds an ephemeral certificate to your real identity, signs the digest, and logs the entry to the public Rekor transparency log. No key outlives the signature. The verify command does not trust the recorded result either; it shells back out to cosign verify-blob to re-check the signature, so STRONG is earned, not asserted. With no signing tool present, the card falls back to a local key and says so plainly. It never inflates the grade to look better than the evidence.
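
The re-check step can be sketched as a thin wrapper around `cosign verify-blob`. The flags used here (`--bundle`, `--certificate-identity`, `--certificate-oidc-issuer`) are real cosign 2.x options, but everything else is assumed for illustration: the function names, the tri-state return value, and how trust-card itself stores the digest and the Sigstore bundle.

```python
import shutil
import subprocess
from typing import Optional


def cosign_verify_cmd(blob: str, bundle: str, identity: str, issuer: str) -> list[str]:
    """Build the argv for a keyless 'cosign verify-blob' check."""
    return [
        "cosign", "verify-blob",
        "--bundle", bundle,                    # Sigstore bundle: signature, cert, log entry
        "--certificate-identity", identity,    # who the certificate must name
        "--certificate-oidc-issuer", issuer,   # which identity provider vouched for them
        blob,                                  # the signed file, e.g. the digest
    ]


def recheck_signature(blob: str, bundle: str, identity: str, issuer: str) -> Optional[bool]:
    """True if cosign accepts the signature, False if it rejects it,
    None if cosign is not installed and nothing can be checked."""
    if shutil.which("cosign") is None:
        return None  # cannot check: report UNVERIFIED rather than guess
    result = subprocess.run(
        cosign_verify_cmd(blob, bundle, identity, issuer),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Pinning both the identity and the issuer matters: a valid Sigstore signature from somebody else is still a valid signature, so the check has to say whose signature it expects. Returning `None` when cosign is missing keeps "could not check" separate from "checked and failed".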

What it refuses to claim

The honest limits are the point, so they go in writing on the card and in the docs.

A trust card gives you verifiable origin, integrity, declared capability, and content lineage, plus a place to hang independent audit. It does not prove the artifact is safe or that it does only what it claims. That needs runtime sandboxing and real human or automated review, which the capability and vouching layers point at but cannot replace. The whole tool is designed to refuse the one move that makes security theater: collapsing "we checked the signature" into "this is safe."

There's a skill for this

trust-card is part of my open skill set. Install it with npx skills@latest add saschb2b/skills --skill trust-card, or read the full trust-card skill and point it at the next bundle you are about to load.

Sources
