Skills Are Package Management for Your AI

There are more than 1.6 million you can install. You need about twenty. Software already solved that problem once.

H. Floyd

Jun 08, 2026

There are more than 1.6 million Claude skills you can install today.¹ You need about twenty.

The distance between those two numbers is the whole problem, and it is an old one. Software lived through this exact moment once before, when we decided code should travel in small reusable units. Sharing turned out to be the easy part. The hard part arrived a few years later, and it was trust: which of the millions of packages is current, safe, and does what its README promises. We answered that question by building a whole discipline around it. Versioning. Lockfiles. Audits. Deprecation notices. A bill of materials. Most people wiring skills into their AI right now are skipping every step of it and treating the result as progress.

A skill is the smallest durable unit of agent behaviour. In plain terms it is a folder with a single instruction file inside, written so the model loads it only when the task in front of it matches. Anthropic published the format as an open standard, and one skill file now runs across more than twenty different agents, from Claude Code to Codex to Cursor.² That portability is the tell. A thing built to be copied everywhere is a thing whose copies will multiply faster than anyone can check them.

A prompt is stateless and dies with the conversation. A skill is a versioned file an agent picks up when the work calls for it, and it behaves the same way next week.

That difference is bigger than it looks. The moment a procedure persists, gets shared, and runs without you watching, you have stopped writing prompts and started managing dependencies.

What you are actually installing

When you copy a skill off a marketplace, you are adding a behavioural dependency, and that is a stranger and more dangerous object than the code dependencies engineers already lose sleep over.

A bad code library throws an error you can see in a stack trace. A bad skill expresses itself through the agent’s decisions. It nudges a tone, skips a verification step, assumes a permission, reaches for the wrong tool, and it does all of this inside work you delegated precisely because you were not going to check every line. The failure does not announce itself. It compounds, silently, across every session that loads the file. Security researchers have started treating a copied skill as exactly what it is: a dependency in your agent’s behaviour, with its own supply chain to secure.³

Untrusted packages compromise your software. Untrusted skills compromise your behaviour.

Software’s answer was to wrap every copy in accountability. A package carries a version, a source, a license, a list of what it is allowed to touch, and a path to rip it out when it turns. The agent-skills world has the copying down and almost none of the accountability. The marketplaces measure themselves in millions of listings and install counts, which is the metric of a field that still thinks the file is the achievement.
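One piece of that accountability is easy to borrow today: a lockfile. The sketch below is a minimal illustration, not anything the skill format itself prescribes. It assumes each skill is a subfolder of one skills directory, and the lockfile name and the function names are made up for the example. It records a content hash per skill, then reports any skill that was added, removed, or edited since the lock was written.

```python
import hashlib
import json
from pathlib import Path


def hash_skill(skill_dir: Path) -> str:
    """Return a SHA-256 digest over every file in one skill folder."""
    digest = hashlib.sha256()
    # Sort so the digest does not depend on filesystem listing order.
    for path in sorted(p for p in skill_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(skill_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def write_lock(skills_root: Path, lock_path: Path) -> dict[str, str]:
    """Record the current digest of every skill folder under skills_root."""
    lock = {d.name: hash_skill(d) for d in sorted(skills_root.iterdir()) if d.is_dir()}
    lock_path.write_text(json.dumps(lock, indent=2, sort_keys=True))
    return lock


def find_drift(skills_root: Path, lock_path: Path) -> list[str]:
    """List skills added, removed, or changed since the lock was written."""
    locked = json.loads(lock_path.read_text())
    current = {d.name: hash_skill(d) for d in skills_root.iterdir() if d.is_dir()}
    names = set(locked) | set(current)
    return sorted(n for n in names if locked.get(n) != current.get(n))
```

Run `write_lock` once after you have vetted the folder, then `find_drift` before each session or in a scheduled job. A non-empty result means something in your agent's behaviour changed without a review.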

The library is the moat, not the model

This matters past hygiene. The model underneath your agent is rented. It resets every release cycle, and the next version reaches your competitor on the same Tuesday it reaches you. Whatever advantage lives in the weights is an advantage everyone gets at once. So it cannot be where your durable edge sits.

The skill library can be. It is the layer you own, the accumulated record of how your work gets done, and the test of whether it is a moat is simple: you can swap the model underneath it without losing what you built. The decisions survive the upgrade. That is architecture outliving content, stated in one sentence, and it is the same structural bet that explains why the same model behaves like a different product depending on the code wrapped around it.

A skill file costs nothing to copy, which is the precise reason the file was never the product.

If anyone can copy it for free, the value sits in the curation: knowing which twenty of 1.6 million earn a place, verifying each one runs in your environment rather than a demo, and writing down the failure mode you only learned by hitting it. That is also where the money goes. The marketplaces selling skill files are mostly dying, while the work of choosing and vetting and bundling is becoming the thing people pay for. Value migrated off the artefact and onto the judgement about the artefact, which is the same move the leverage hierarchy of agent engineering traces one layer down. A 2026 benchmark of eighty-six tasks put numbers on it: a curated skill set raised the average pass rate by about sixteen percentage points, while the skills models wrote for themselves produced no average gain at all.⁴

What the discipline looks like when you run it

My agent stack carried 179 skills. A handful I wrote by hand, each from a failure I had already hit and read closely enough to encode. The rest I pulled in from libraries, the way you add packages to a project.

One of the hand-written ones does grounded research, and it exists only because the obvious tool for the job drove a browser under my own login and broke a platform’s terms of service to do it. So the capability got rebuilt on a sanctioned interface instead. The skill carries that constraint in writing, because a rule that lives only in my head is a rule the agent will eventually cross. Another governs what is allowed to graduate from a holding area into the permanent vault, and the first thing it does, before any other check, is look for a duplicate, because a fabricated gap is treated as a failure rather than a clever new contribution.

Then I wrote the audit this piece describes and ran it across the whole folder. The result was humbling. The average skill scored two out of six. Across the folder, 82% carried a version, but only 4% recorded when they were last verified, under a quarter declared what they were allowed to touch, and 3% had any way to retire them. The grounded-research skill I just held up as a model scored zero, because I had written it as careful instructions and never given it a version, a source line, or a switch to turn it off. The discipline I am describing here, I was barely doing myself.

The file is the cheap part. The discipline wrapped around the file is the part that took months and cannot be copied.

There is an order to it that matters more than the contents. The guardrails go in before the skills that act. The review gate, the duplicate check, the permission boundary, the terms-of-service rule: those get installed first, so that by the time a capable agent is doing real work, the structure it would happily skip has already been made mandatory. A model that is good enough to be useful is good enough to route around safety it sees as optional. The install order is how you make it not optional.
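The install order can be enforced rather than remembered. The sketch below is one hypothetical way to do it. The split into "guardrail" and "action" skills, the `requires` field, and the skill names are assumptions for illustration, not part of any skill format. Guardrails go in first, and an acting skill is refused if a guardrail it names is not already installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    kind: str  # "guardrail" or "action"; this two-way split is an assumption
    requires: tuple[str, ...] = ()  # guardrails that must already be installed


def install_in_order(skills: list[Skill]) -> list[str]:
    """Install guardrails first; refuse any skill whose guardrails are missing."""
    installed: list[str] = []
    # Stable sort: guardrails (key False) come before action skills (key True).
    for skill in sorted(skills, key=lambda s: s.kind != "guardrail"):
        missing = [g for g in skill.requires if g not in installed]
        if missing:
            raise RuntimeError(f"{skill.name} blocked, missing guardrails: {missing}")
        installed.append(skill.name)
    return installed
```

The point of raising an error instead of logging a warning is the same as the paragraph above: a check the agent can skip is a check it will eventually skip.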

The one-week test

Open the folder where your AI keeps its skills. For each one, answer six questions.

What version is this?
Where did it come from?
What is it allowed to touch?
When did I last confirm it works in my setup?
What else now depends on it?
How would I switch it off in a hurry?

Most skills will fail that audit, and the ones that fail are usually the ones quietly running your agent. Write the six answers down for each. The folder that results is the first version of the thing that compounds while the models underneath it keep getting replaced.
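If you would rather keep the answers in code than on paper, the six questions map onto a small record. This is a sketch under my own assumptions: the field names are invented for the example, and an answer counts as given whenever the field is non-empty, so write "none" for a skill nothing depends on rather than leaving it blank.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SkillAudit:
    """One row of the audit: each optional field answers one of the six questions."""
    name: str
    version: Optional[str] = None
    source: Optional[str] = None
    permissions: Optional[str] = None    # what it is allowed to touch
    last_verified: Optional[str] = None  # when it last worked in this setup
    dependents: Optional[str] = None     # what else now depends on it
    kill_switch: Optional[str] = None    # how to switch it off in a hurry


QUESTIONS = [f.name for f in fields(SkillAudit) if f.name != "name"]


def score(audit: SkillAudit) -> int:
    """Count how many of the six questions have a written answer."""
    return sum(1 for q in QUESTIONS if getattr(audit, q))


def coverage(audits: list[SkillAudit]) -> dict[str, float]:
    """Share of skills (0.0 to 1.0) that answer each question."""
    if not audits:
        return {q: 0.0 for q in QUESTIONS}
    total = len(audits)
    return {q: sum(1 for a in audits if getattr(a, q)) / total for q in QUESTIONS}
```

`score` gives the per-skill number out of six, and `coverage` gives the folder-wide view, such as the share of skills that carry a version or have a way to retire them.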

Which skill is steering your agent right now that you could not, this minute, tell me the source of?

Want to run this on your own folder? The Skill Bill of Materials worksheet is the audit above, turned into a one-page sheet you fill in. Take it to your skills directory this week and see how much of it scores zero.

The Durability Curve is frequent essays and analysis on what stays valuable while the tools underneath keep changing.

Subscribe if that is your kind of question.

1. SkillsMP, the largest public agent-skills marketplace, listed 1,640,440 skills as of 8 June 2026. https://skillsmp.com/

2. Anthropic, “Equipping agents for the real world with Agent Skills,” and the public reference repository at github.com/anthropics/skills. The SKILL.md format is documented as an open standard adopted across Claude Code, Codex, Gemini CLI, Cursor and others. https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills

3. The supply-chain framing is now formal. See “Formal Analysis and Supply Chain Security for Agentic AI Skills” (arXiv:2603.00195), which proposes an Agent Skill Bill of Materials recording each skill’s identity, version, content hash, declared permissions, and dependency edges. https://arxiv.org/abs/2603.00195

4. “SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks” (arXiv:2602.12670). Across 86 tasks and 7,308 trajectories, a curated skill set raised average pass rate by 16.2 points, while self-generated skills gave no average benefit. https://arxiv.org/abs/2602.12670
