
What Are Skills? How Modular AI Knowledge Packages Change What Agents Can Do
Foundation models are powerful — but power alone doesn't mean they know how your team works. Skills close that gap, turning a general-purpose agent into a domain expert without any retraining.
When people talk about making AI more useful, the conversation usually lands in one of two places: fine-tuning a model on your data, or writing a better system prompt. Both approaches have real limitations. Fine-tuning is expensive, slow to update, and brittle when workflows change. A long system prompt is a blunt instrument — it occupies precious context and has no way to stay focused on what's actually relevant to the task at hand.
Skills are a third path. Rather than changing the model or bloating its instructions, a Skill is a self-contained, portable package of procedural knowledge that the agent reads at inference time — only when it needs it.
The idea in plain language
Think of Skills the same way you think of apps on an operating system. The hardware (foundation model) stays the same. The OS (agent harness) stays the same. But you install the right app for the job — and uninstall it when you no longer need it. Each layer is independent, composable, and upgradeable.
A Skill encodes procedural knowledge: the step-by-step workflows a practitioner follows, the conventions of a specialised domain, the proprietary SOP your team uses every day. This is precisely the kind of knowledge that doesn't survive well in general training data, but is exactly what separates a useful agent from one that gives technically correct but practically useless answers.
Every Skill is defined by a SKILL.md file — a machine-readable configuration block paired with a plain-language instruction manual for the agent. When a user's request matches what a Skill is designed to handle, the agent loads it. The right knowledge, right when it's needed.
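To make that concrete, here is a minimal sketch of what a SKILL.md might contain: a configuration block up front, then plain-language instructions. The field names and values below are illustrative assumptions for this article, not a documented schema.

```markdown
---
# Hypothetical configuration block; field names are illustrative only
name: quarterly-report-formatter
description: Formats quarterly financial summaries to the team's house style
allowed_tools:
  - python_sandbox
surfaces:
  - web
  - slack
---

# Quarterly Report Formatter

When the user asks for a quarterly summary:

1. Load the template in templates/quarterly.md.
2. Pull figures only from the data the user supplied; never estimate.
3. Apply the team's rounding and currency conventions before presenting.
```

The description field is what lets the agent match a user's request to the Skill; the body is only read once the Skill is loaded.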
What the research shows
In February 2026, a large-scale benchmark study called SkillsBench published the most rigorous evidence yet on how Skills affect real agent performance. Researchers tested 7,308 agent trajectories across 84 tasks spanning 11 domains, with seven agent-model configurations evaluated under three conditions: no Skills, curated Skills, and self-generated Skills.
Key findings at a glance:

- Curated Skills: average pass-rate lift across all domains and model configurations
- Healthcare domain: largest single-domain gain — specialist domains benefit most
- Self-generated Skills: no average benefit — models can't reliably author what they benefit from consuming

[Chart: Resolution Rate by Agent-Model Configuration — across 84 tasks, three conditions per model]

Source: SkillsBench — Li et al., Feb 2026 · arXiv:2602.12670
Two findings deserve particular attention. First, curated Skills — authored by domain experts, not generated by the model itself — consistently improved performance across every configuration tested. The gains were especially large in specialised domains: healthcare (+51.9pp), manufacturing, and finance all showed dramatic improvements, precisely because these fields depend on procedural knowledge that isn't well captured in general training.
Second, and counter-intuitively, models cannot reliably author the procedural knowledge they benefit from consuming. Self-generated Skills provided negligible or negative benefit on average. The quality of a Skill — its precision, its domain-specificity, its alignment with how practitioners actually work — is what counts. Presence alone is not enough.
The study also found that focused Skills covering two or three coherent modules outperformed sprawling, comprehensive documentation. Smaller models augmented with well-crafted Skills matched the performance of larger models without them — a significant result for organisations balancing capability against cost.
How Skills work in Miito
In Miito, Skills are first-class citizens. Every Skill ships with a SKILL.md file that carries both its configuration (permissions, allowed tools, surface availability, required packages) and its instruction body. Skills can also include bundled files — Python scripts, reference documents, or templates — that the agent can execute or consult inside a sandboxed environment.
Access is controlled at five levels: organisation, user, surface (web, Slack, API), environment (sandbox availability), and trust level. This layered gating means a Skill is never visible — or billable — unless the right conditions are met. Each Skill also declares an explicit allowlist of every tool and external endpoint it may use, so nothing leaves your organisation that wasn't declared up front.
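The layered gating described above can be sketched as a simple predicate: all five checks must pass before a Skill is visible, and tool calls are checked against the declared allowlist. Everything here (type names, fields, the specific checks) is a hypothetical illustration of the idea, not Miito's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    # Hypothetical access-control declaration carried in a Skill's configuration
    orgs: set            # organisations allowed to see the Skill
    users: set           # specific users, if restricted (empty = any user in an allowed org)
    surfaces: set        # e.g. {"web", "slack", "api"}
    needs_sandbox: bool  # whether the Skill requires a sandboxed environment
    min_trust: int       # minimum caller trust level
    tool_allowlist: set = field(default_factory=set)

@dataclass
class Context:
    org: str
    user: str
    surface: str
    sandbox_available: bool
    trust: int

def skill_visible(skill: Skill, ctx: Context) -> bool:
    """All five gates must pass before the Skill is even visible."""
    return (
        ctx.org in skill.orgs
        and (not skill.users or ctx.user in skill.users)
        and ctx.surface in skill.surfaces
        and (not skill.needs_sandbox or ctx.sandbox_available)
        and ctx.trust >= skill.min_trust
    )

def tool_call_allowed(skill: Skill, tool: str) -> bool:
    """A Skill may only use tools and endpoints it declared up front."""
    return tool in skill.tool_allowlist

# Example: a sandboxed Skill available to one organisation on web and Slack
skill = Skill(orgs={"acme"}, users=set(), surfaces={"web", "slack"},
              needs_sandbox=True, min_trust=2, tool_allowlist={"python_sandbox"})
ctx = Context(org="acme", user="dana", surface="web",
              sandbox_available=True, trust=3)

skill_visible(skill, ctx)          # True: every gate passes
tool_call_allowed(skill, "curl")   # False: not on the declared allowlist
```

The design point is that visibility and tool use are both deny-by-default: a Skill that never declared an endpoint simply cannot reach it.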
Skills in Miito come from three sources: Built-in Skills shipped with the platform, Custom Skills authored by your organisation's admins (proprietary SOPs, internal workflows, custom integrations), and Skills installed from ClawHub, a growing community registry. All three share the same format, the same access controls, and the same runtime behaviour.
The SkillsBench findings validate what the Cypress AI consulting team has observed directly: the gap between what an agent can do out of the box and what it can do with expertly authored, domain-specific Skills is substantial — and it grows with domain complexity. The Skills layer is where a general-purpose agent becomes a precision tool for your workflows.