Why AI Agents Can't Just "Trust" Things

A developer sees a suspicious npm package and pauses. Researches. Asks a colleague. An AI agent executing a build pipeline at 3am has no such luxury. It either has a programmatic way to verify trust, or it's flying blind.

The infrastructure for verifiable software exists. SLSA, Sigstore, in-toto, C2PA: serious standards and tools, deployed in production, backed by Google, the Linux Foundation, and major registries. npm supports SLSA provenance. Container images can carry signed attestations. The cryptographic primitives are real.

But they were all built for workflows that humans set up and oversee. A developer runs cosign verify. A CI system checks provenance because someone configured it to. A security team reviews attestations. The verification happens, but it requires human orchestration.

AI agents need something different: trust as a tool they can invoke autonomously.

The Gap

When Claude is about to install a package, it can't pause and "feel suspicious." It can't ask a colleague. It can't Google the maintainer's reputation. It needs a tool call that returns a deterministic answer: trusted or not, with cryptographic evidence.

This is the gap: the standards exist, but agents can't access them.

SLSA describes how an artifact was built and how tamper-resistant that build process was, but there's no verify_slsa() tool an agent can call. Sigstore provides keyless signing, but there's no MCP interface for querying the transparency log. C2PA embeds content credentials in media files, but agents can't extract and verify them mid-workflow.

The verification capabilities exist. The agent-accessible interfaces don't.

What We Built

We built code-signing-mcp — an MCP server that exposes trust verification as tools AI agents can invoke:

sign_binary — Sign any artifact with your choice of provider
verify_signature — Verify signatures and attestations
verify_trust_chain — Check if a signer is in your trust graph
supply_chain_attestation — Generate SLSA/in-toto attestations

The agent doesn't need to understand SLSA levels or Sigstore's Rekor log or C2PA manifests. It calls verify_trust_chain() and gets back: trusted or not, why, and the cryptographic evidence.

Agent: verify_trust_chain(
    artifact="npm:some-package@1.2.3",
    trust_root="did:web:mycompany.com"
)

Result: {
    "trusted": true,
    "signer": "did:web:verified-vendor.com",
    "attestations": {
        "slsa_level": 3,
        "provenance": "https://rekor.sigstore.dev/..."
    }
}

The agent gets a deterministic answer. Proceed or don't. No vibes. No heuristics.
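
One way an agent harness might act on that result is sketched below. The function name should_proceed and the min_slsa_level default are assumptions made for illustration, not part of code-signing-mcp; the payload is copied from the example above.

```python
import json

# Result payload copied from the example above (the provenance URL is elided there too).
RESULT_JSON = """
{
    "trusted": true,
    "signer": "did:web:verified-vendor.com",
    "attestations": {
        "slsa_level": 3,
        "provenance": "https://rekor.sigstore.dev/..."
    }
}
"""


def should_proceed(result: dict, min_slsa_level: int = 2) -> bool:
    """Hypothetical gate: proceed only on an explicit, sufficiently attested 'trusted'."""
    # Fail closed: anything other than a literal true means "do not proceed".
    if result.get("trusted") is not True:
        return False
    attestations = result.get("attestations") or {}
    level = attestations.get("slsa_level", 0)
    return isinstance(level, int) and level >= min_slsa_level


if __name__ == "__main__":
    result = json.loads(RESULT_JSON)
    print("install" if should_proceed(result) else "abort")
```

The sketch fails closed: a missing field, a non-boolean "trusted" value, or a SLSA level below the floor all lead to the same answer, which is not to proceed.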

Multi-Provider Architecture

Different organizations have different signing infrastructure:

Provider	Standards	Use Case
Sigstore	SLSA, Rekor	Open source, keyless signing
C2PA	Content Credentials	Media, documents, any artifact
Enterprise PKI	X.509, HSM	Corporate signing infrastructure
Local	Ed25519/RSA	Development, air-gapped

The MCP server abstracts the provider. The agent calls verify_signature() — the server routes to the appropriate backend and normalizes the response. When capabilities matter ("I need C2PA manifests for this media file"), the agent can query providers and select appropriately.
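
The routing and normalization described above could look roughly like this. It is a minimal sketch, not the server's actual code: the backend functions return canned data, and every name here (VerificationResult, ADAPTERS, choose_provider, the response field names) is a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class VerificationResult:
    """Normalized shape returned to the agent, whatever the backend."""
    valid: bool
    signer: Optional[str]
    provider: str


# Stand-in backends with canned, provider-specific response shapes (hypothetical).
def _sigstore_backend(artifact: str) -> dict:
    return {"verified": True, "identity": "did:web:verified-vendor.com"}


def _c2pa_backend(artifact: str) -> dict:
    return {"manifest_valid": True, "claim_signer": "did:web:verified-vendor.com"}


# Each adapter calls one backend and maps its response onto VerificationResult.
def _via_sigstore(artifact: str) -> VerificationResult:
    raw = _sigstore_backend(artifact)
    return VerificationResult(raw["verified"], raw.get("identity"), "sigstore")


def _via_c2pa(artifact: str) -> VerificationResult:
    raw = _c2pa_backend(artifact)
    return VerificationResult(raw["manifest_valid"], raw.get("claim_signer"), "c2pa")


ADAPTERS: Dict[str, Callable[[str], VerificationResult]] = {
    "sigstore": _via_sigstore,
    "c2pa": _via_c2pa,
}

MEDIA_SUFFIXES = (".jpg", ".jpeg", ".png", ".mp4")


def choose_provider(artifact: str) -> str:
    """Illustrative routing rule: media files go to C2PA, everything else to Sigstore."""
    return "c2pa" if artifact.lower().endswith(MEDIA_SUFFIXES) else "sigstore"


def verify_signature(artifact: str, provider: Optional[str] = None) -> VerificationResult:
    """Route to a backend (explicit or inferred) and return the normalized result."""
    name = provider or choose_provider(artifact)
    if name not in ADAPTERS:
        raise ValueError(f"unknown provider: {name}")
    return ADAPTERS[name](artifact)
```

The agent only ever sees VerificationResult, so adding a backend means adding one adapter rather than teaching the agent a new response format.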

The Trust Graph

Verification alone isn't enough. The agent also needs to know: who do I trust?

This is where trust.json comes in — a machine-readable declaration of an organization's trust model:

{
  "trust_anchors": [
    "did:web:mycompany.com",
    "did:web:trusted-vendor.com"
  ],
  "verification_policy": {
    "require_signed_attestations": true,
    "min_slsa_level": 2
  }
}

When an agent encounters an artifact, it doesn't just verify the signature is valid — it checks whether the signer is in the trust graph. A perfectly valid signature from an unknown entity still fails verification.

This makes trust decisions deterministic. Given a policy and cryptographic evidence, the answer is yes or no. The agent doesn't have to "decide" — it evaluates.
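
To make "evaluates" concrete, here is a minimal sketch of a policy check against the trust.json shown above. The evidence fields (signature_valid, signer, attestation_signed, slsa_level) and the evaluate function are assumptions made for illustration, and the sketch treats the trust graph as a flat list of anchors rather than following delegation between signers.

```python
import json
from typing import Tuple

# Policy copied from the trust.json example above.
TRUST_JSON = """
{
  "trust_anchors": [
    "did:web:mycompany.com",
    "did:web:trusted-vendor.com"
  ],
  "verification_policy": {
    "require_signed_attestations": true,
    "min_slsa_level": 2
  }
}
"""


def evaluate(policy: dict, evidence: dict) -> Tuple[bool, str]:
    """Return (decision, reason). Same policy and evidence always give the same answer."""
    anchors = set(policy.get("trust_anchors", []))
    rules = policy.get("verification_policy", {})

    # A cryptographically invalid signature fails before any policy question is asked.
    if not evidence.get("signature_valid", False):
        return False, "invalid signature"
    # A valid signature from a signer outside the anchors still fails.
    if evidence.get("signer") not in anchors:
        return False, "signer not in trust anchors"
    if rules.get("require_signed_attestations") and not evidence.get("attestation_signed", False):
        return False, "missing signed attestation"
    if evidence.get("slsa_level", 0) < rules.get("min_slsa_level", 0):
        return False, "SLSA level below policy minimum"
    return True, "policy satisfied"


if __name__ == "__main__":
    policy = json.loads(TRUST_JSON)
    evidence = {
        "signature_valid": True,
        "signer": "did:web:trusted-vendor.com",
        "attestation_signed": True,
        "slsa_level": 3,
    }
    print(evaluate(policy, evidence))
```

Returning a reason alongside the boolean lets the agent report why an artifact was rejected without making the decision itself any less mechanical.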

Beyond Code

Here's where it gets interesting: agents don't just consume code. They consume documents, images, API responses, training data, tool definitions. In the age of synthetic everything, any digital artifact can be faked.

C2PA was originally designed to establish where media comes from, in part to counter deepfakes and manipulated content, by embedding cryptographically signed provenance in media files. But the same approach works for any artifact. A signed PDF. A verified API response. An attested MCP server manifest.

The trust verification primitives we built aren't limited to npm packages. They work for anything with a signature.

Trust as a Tool, Not a Gate

Traditional security treats trust as a gate. A human reviews, approves, and the artifact passes through. This works when humans are in the loop.

Agents need trust as a tool. Something they invoke mid-workflow, autonomously, as part of their reasoning. The verification doesn't wait for approval — it happens inline, at machine speed.

This is the shift: from security review to security capability. The agent doesn't ask permission. It has the tools to verify trust itself.

Getting Started

pip install code-signing-mcp

Add the server to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "code-signing": {
      "command": "code-signing-mcp",
      "args": ["--transport", "stdio"]
    }
  }
}

The agent now has access to sign_binary, verify_signature, verify_trust_chain, and supply_chain_attestation.

The Bigger Picture

SLSA, Sigstore, in-toto, C2PA — these aren't competing standards. They're complementary layers of a complete digital integrity stack. The industry has been building toward verifiable artifacts for years.

What was missing: agent-accessible interfaces. MCP provides the bridge. Now agents can invoke the same verification capabilities that humans have been using — but autonomously, at scale, as part of their workflow.

Trust decisions shouldn't require human judgment every time. With clear policies and cryptographic evidence, they can be deterministic. That's what we're building toward.

Resources: