2026 Agent Skill Complete Guide: Cursor SKILL.md & Mac Sandbox
You have pasted the same release checklist into Cursor twelve times this quarter. Agent Skills fix that by packaging repeatable know-how into portable folders the agent loads on demand. This guide explains what Skills are in 2026, how they differ from Rules, how the agentskills.io open standard defines SKILL.md, how three-level progressive loading keeps context lean, and how to validate script-heavy skills on a rented Mac without risking your daily driver.
01 · What Agent Skills are (and are not)
An Agent Skill is a folder of procedural knowledge your coding agent can discover, activate, and follow. At minimum it contains a SKILL.md file: YAML frontmatter plus Markdown instructions. Optional subfolders hold executable scripts, reference documents, and static assets. When you ask Cursor to “deploy staging like last Tuesday,” a well-written deploy skill supplies the eleven-step checklist, the exact gh and kubectl flags your team standardized, and—if you bundled them—the validation scripts that prove the cluster actually rolled forward.
Skills are not a replacement for the model’s general reasoning. They are specialized playbooks you want loaded only when relevant. That distinction matters because every token in context competes with your open files, terminal output, and conversation history. In 2026, Cursor 2.4 and later expose Skills alongside Rules, MCP servers, and built-in tools. Teams that confuse these layers often dump entire runbooks into always-on Rules, then wonder why the agent feels sluggish or ignores half the instructions.
Skills also differ from product-specific “skill marketplaces” inside individual agents. OpenClaw, Hermes Agent, and similar stacks have their own skill registries tuned to gateways and channels. The format described here aligns with the cross-vendor Agent Skills specification published at agentskills.io, which Cursor adopted so skills you write today can travel between editors, CLIs, and enterprise agent frameworks without rewriting from scratch.
Think of three user stories. A solo developer codifies “how we write conventional commits” once and never re-explains it. A platform team ships a .cursor/skills/ directory in the monorepo so every contractor inherits the same security review steps. A power user maintains personal skills under ~/.cursor/skills/ for hobbies—photo batch processing, homelab Terraform—that should not leak into client repositories. All three benefit from the same packaging model and the same progressive loading behavior.
02 · Agent Skills vs Cursor Rules
Rules (files under .cursor/rules/ or legacy .cursorrules) are persistent guardrails: coding style, repository boundaries, “never commit secrets,” language preferences. They apply broadly across sessions unless you scope them with globs or manual rule picks. Rules are ideal for constraints that should influence every edit in a project.
Skills are task-shaped. They activate when the agent judges—usually from the skill’s description field—that your request matches a packaged workflow. A rule might say “use TypeScript strict mode.” A skill might say “run our four-step API migration checklist when the user mentions deprecating v1 endpoints,” including scripts that diff OpenAPI specs.
| Dimension | Cursor Rules | Agent Skills |
|---|---|---|
| Primary purpose | Always-on policy and style | On-demand procedures |
| Typical size | Short paragraphs, many files | SKILL.md + optional deep refs |
| Loading model | Injected when rule scope matches | Metadata always; body on trigger |
| Executable scripts | Unusual; not the focus | First-class in scripts/ |
| Portability | Cursor-centric paths | agentskills.io interoperable |
Practical selection heuristic: if removing the text would make the agent unsafe or inconsistent on routine edits, it belongs in a Rule. If removing the text would only make the agent less efficient on occasional complex workflows, it belongs in a Skill. Many mature setups use both: Rules enforce “no force push to main,” while a Skill walks through your incident response template when you type “SEV2 database failover.”
Another mistake is duplicating the same paragraph in five Rules and three Skills. Single-source the long procedure in one Skill, then add a one-line Rule that says “for staging deploys, use the deploy-staging skill.” That keeps token burn predictable and makes updates atomic: you edit one SKILL.md instead of hunting through scattered rule files.
03 · The agentskills.io open standard
By early 2026 the Agent Skills format had been published as an open specification at agentskills.io. The goal is framework-agnostic packaging: any runtime that can list directories, parse YAML frontmatter, and read Markdown can implement discovery and progressive loading. Microsoft’s Agent Framework, Claude Code–compatible tooling, and Cursor all document alignment with the same required fields and folder conventions.
The spec mandates a skill directory name in kebab-case that matches the name frontmatter field. Required metadata is minimal on purpose—only name and description—so lightweight skills stay easy to author. Optional fields include license, compatibility (environment requirements up to 500 characters), metadata (arbitrary key-value pairs), and the experimental allowed-tools list for pre-approved tool invocations.
Recommended optional directories:
- scripts/: executable helpers (Python, Bash, Node) the agent runs instead of regenerating fragile one-off code
- references/: long-form docs loaded only when the task needs depth (API schemas, policy PDFs converted to Markdown)
- assets/: templates, sample configs, images used in outputs
Because the standard is open, you can lint skills in CI without Cursor running: validate frontmatter length, ensure name matches the folder, check scripts are executable and referenced from SKILL.md. That is how platform teams treat skills as versioned artifacts rather than chat folklore.
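As a rough illustration of such a CI lint, here is a minimal Python sketch covering the folder-level checks only. The function names `read_frontmatter` and `lint_skill` are hypothetical, and the frontmatter reader assumes single-line `key: value` pairs rather than full YAML.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict:
    """Read single-line `key: value` pairs between the two --- markers.

    Assumption: values are plain one-line scalars. A production linter
    would use a full YAML parser instead of this minimal reader.
    """
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must open with a --- frontmatter marker")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is never closed by ---")


def lint_skill(skill_dir: Path) -> list:
    """Return human-readable problems; an empty list means the skill passes."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["missing SKILL.md"]
    problems = []
    fields = read_frontmatter(skill_md)
    if fields.get("name") != skill_dir.name:
        problems.append(
            f"name {fields.get('name')!r} does not match folder {skill_dir.name!r}"
        )
    if not fields.get("description"):
        problems.append("description is missing or empty")
    text = skill_md.read_text(encoding="utf-8")
    scripts_dir = skill_dir / "scripts"
    if scripts_dir.is_dir():
        for script in sorted(p for p in scripts_dir.iterdir() if p.is_file()):
            rel = f"scripts/{script.name}"
            # POSIX check: at least one execute bit must be set.
            if not script.stat().st_mode & 0o111:
                problems.append(f"{rel} is not executable")
            if rel not in text:
                problems.append(f"{rel} is not referenced from SKILL.md")
    return problems
```

In CI you would call `lint_skill` on each skill folder and fail the job when any list comes back non-empty.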
04 · SKILL.md format and directory layout
Every skill centers on SKILL.md with YAML frontmatter closed by ---, followed by free-form Markdown instructions. Cursor’s built-in /create-skill command scaffolds this structure and interviews you for purpose, triggers, and storage location (personal vs project).
---
name: deploy-staging
description: Deploy application to staging via gh and kubectl. Use when user mentions staging deploy, hotfix to staging, or release candidate validation.
compatibility: Requires kubectl context "staging", gh CLI authenticated, macOS or Linux shell.
---
# Deploy to staging
## Preconditions
- Confirm CI green on the target SHA.
## Steps
1. Run `scripts/preflight.sh --env staging`
2. ...
Field constraints from the specification worth memorizing:
- name: max 64 characters; lowercase letters, digits, hyphens; must not start or end with a hyphen
- description: max 1024 characters; must state what the skill does and when to use it; include trigger keywords agents can match
- Body: no fixed schema; use checklists, templates, and conditional branches; keep the main file under ~500 lines and push depth to references/
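The name and description constraints listed above are easy to encode. The following Python sketch is one way to do it, using only the limits stated in this section; `check_name` and `check_description` are hypothetical helper names, not part of any official tooling.

```python
import re

MAX_NAME = 64
MAX_DESCRIPTION = 1024

# Lowercase letters, digits, and hyphens, with no hyphen at either end.
NAME_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def check_name(name: str) -> list:
    """Return a list of violations for the `name` frontmatter field."""
    problems = []
    if not 1 <= len(name) <= MAX_NAME:
        problems.append(f"name must be 1 to {MAX_NAME} characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(
            "name may contain only lowercase letters, digits, and hyphens, "
            "and must not start or end with a hyphen"
        )
    return problems


def check_description(description: str) -> list:
    """Return a list of violations for the `description` frontmatter field."""
    problems = []
    if not description.strip():
        problems.append("description must not be empty")
    if len(description) > MAX_DESCRIPTION:
        problems.append(f"description exceeds {MAX_DESCRIPTION} characters")
    return problems
```

Whether a description actually states when to use the skill still needs a human read; a length check cannot judge that.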
Storage locations in Cursor:
- Personal: ~/.cursor/skills/skill-name/SKILL.md, available in all repositories on your machine
- Project: .cursor/skills/skill-name/SKILL.md, committed with the repo for team sharing
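To see which skills a machine would expose from the two locations above, a small Python sketch can walk both directories. `discover_skills` is a hypothetical helper; it only lists folders that contain a SKILL.md and makes no claim about how Cursor itself performs discovery.

```python
from pathlib import Path


def discover_skills(project_root: Path, home: Path) -> dict:
    """List skill folder names found under the personal and project paths."""
    locations = {
        "personal": home / ".cursor" / "skills",
        "project": project_root / ".cursor" / "skills",
    }
    found = {}
    for scope, base in locations.items():
        if base.is_dir():
            # A folder counts as a skill only if it holds a SKILL.md file.
            found[scope] = sorted(p.parent.name for p in base.glob("*/SKILL.md"))
        else:
            found[scope] = []
    return found
```

Call it as `discover_skills(Path.cwd(), Path.home())`. How Cursor resolves a skill name that exists in both locations is not covered by this sketch.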
Do not write team skills into ~/.cursor/skills-cursor/; that path is reserved for Cursor’s internal built-in skills (including the authoring helper behind /create-skill).
Descriptions should be written in third person (“Processes Excel files…”) because they are injected into system-facing selection context. Pair concrete nouns with verbs users actually say: “App Store screenshot export,” “Terraform plan review,” “Dependabot PR labeling.” Vague descriptions like “helps with DevOps” almost never trigger reliably.
05 · Three-level progressive loading
The Agent Skills specification defines progressive disclosure in three levels. Understanding this model is the key to fast agents that still handle deep runbooks.
- Level 1, Metadata (~100 tokens per skill): At startup or session initialization, the runtime loads only each skill’s name and description. This is the catalog the agent scans to decide relevance without paying for full instructions upfront.
- Level 2, Instructions (< 5,000 tokens recommended): When a task matches a description, the agent reads the full SKILL.md body into context. This layer holds steps, examples, edge cases, and links to deeper files.
- Level 3, Resources (as needed): Files under scripts/, references/, and assets/ load only when the workflow demands them, such as running a validator, opening a 40-page style guide, or copying a report template.
This mirrors how senior engineers work: you keep a mental index of playbooks, open the right binder when the incident type is clear, and only then pull annexes or run shell scripts. Agents without progressive loading either ignore large rules or blow the context window before reading your actual code.
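The three levels can be sketched as a tiny loader in Python. This is an illustration under stated assumptions, not how Cursor or any other runtime is implemented: `SkillCatalog` and its methods are hypothetical, the frontmatter reader handles only single-line `key: value` pairs, and keyword overlap stands in for the model's own relevance judgment.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillEntry:
    name: str
    description: str
    path: Path  # the skill directory


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def read_metadata(skill_md: Path) -> dict:
    """Level 1: stop reading at the closing --- so the body is never loaded."""
    meta = {}
    with skill_md.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta
        for line in fh:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


class SkillCatalog:
    def __init__(self, skills_root: Path):
        self.entries = []
        for skill_md in sorted(skills_root.glob("*/SKILL.md")):
            meta = read_metadata(skill_md)
            self.entries.append(SkillEntry(
                meta.get("name", skill_md.parent.name),
                meta.get("description", ""),
                skill_md.parent,
            ))

    def match(self, request: str):
        """Pick the entry whose description shares the most words with the request.

        Assumption: a real agent lets the model judge relevance; keyword
        overlap is only a stand-in so the sketch stays runnable.
        """
        best, best_score = None, 0
        for entry in self.entries:
            score = len(_words(request) & _words(entry.description))
            if score > best_score:
                best, best_score = entry, score
        return best

    def load_instructions(self, entry: SkillEntry) -> str:
        """Level 2: return the Markdown body that follows the frontmatter."""
        lines = (entry.path / "SKILL.md").read_text(encoding="utf-8").splitlines()
        if lines and lines[0].strip() == "---":
            for i, line in enumerate(lines[1:], start=1):
                if line.strip() == "---":
                    return "\n".join(lines[i + 1:])
        return "\n".join(lines)

    def load_resource(self, entry: SkillEntry, relative: str) -> str:
        """Level 3: read one bundled file only when a step asks for it."""
        return (entry.path / relative).read_text(encoding="utf-8")
```

The point of the design is that building the catalog costs a few lines per skill, while the body and the bundled files are read only after a match.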
Authoring implications are strict. Put trigger logic in the description, essentials in SKILL.md, and encyclopedic content in linked reference files one hop away—deeply nested links may be read partially. Prefer bundled scripts over pasting fifty-line Bash blocks inline; execution costs fewer tokens than generation and reduces hallucinated flags.
Cursor’s default for many generated skills sets disable-model-invocation: true in frontmatter so the skill loads only when you invoke it explicitly by name, not on every ambient message. Omit that flag only when you genuinely want ambient auto-invocation, for example a skill that must always apply to PDF uploads in a document-heavy repo.
06 · Authoring with /create-skill
The fastest on-ramp in Cursor is the /create-skill slash command, backed by an internal authoring skill that walks through requirements: purpose, scope, triggers, output format, and whether scripts are needed. Treat it as a structured interview rather than a magic generator—you still own the resulting SKILL.md.
A productive session flow:
- Describe one workflow you repeated at least three times last month (not a vague domain).
- Choose personal vs project storage based on whether teammates need the same playbook.
- Paste any verbatim team wording that must not be paraphrased (release notes templates, legal disclaimers).
- Ask for a validation script if the workflow touches production systems.
- Run a real task immediately after scaffolding; edit description keywords based on what failed to trigger.
After generation, refine using the checklist from the specification: name matches folder, description includes WHEN triggers, body under 500 lines, references one level deep, scripts documented with expected exit codes. Version the skill in git like application code; tag releases when you change script interfaces.
If you already maintain Hermes-style skills under ~/.hermes/skills/ or OpenClaw skill packs, map fields rather than duplicating: the intellectual content transfers, but paths and discovery differ. For Hermes’s closed learning loop that distills tasks into SKILL.md automatically, see our 30-day Hermes Agent skill library review—complementary, not competing, with Cursor’s Agent Skills folder layout.
07 · Best practices for production skills
Write for discovery first
Spend half your authoring time on the description. Include product names, file extensions, error strings, and command phrases users paste from Slack. Test by starting a fresh chat with a minimal prompt and verifying the agent selects the skill without you naming it—when auto-invocation is intended.
Match freedom to fragility
High-freedom skills use prose guidelines (code review culture). Medium-freedom skills supply templates with acceptable variation (incident timelines). Low-freedom skills ship pinned scripts for database migrations or certificate rotations. Choosing wrong freedom levels produces either brittle agent improvisation or bloated context.
Use feedback loops
Quality-critical skills should instruct the agent to run validators and iterate until pass—pattern: edit → python scripts/validate.py → fix → re-run. Document expected failure modes in SKILL.md so the agent does not treat a known linter warning as success.
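The edit, validate, fix, re-run pattern can be expressed as a bounded loop. In this Python sketch `run_until_pass` and the `fix` callback are hypothetical names; `fix` stands in for whatever edit the agent makes between validator runs, and the validator is any command that exits with 0 on success.

```python
import subprocess


def run_until_pass(validate_cmd, fix, max_attempts=3):
    """Run the validator; after each failure hand its output to `fix` and retry.

    Returns the attempt number that passed. Raises RuntimeError when the
    validator is still failing after `max_attempts` runs.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(validate_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Pass both streams so known failure modes can be recognized.
            fix(result.stdout + result.stderr)
    raise RuntimeError(f"validator still failing after {max_attempts} attempts")
```

A typical call would be `run_until_pass(["python", "scripts/validate.py"], fix)`. Capping the attempts keeps a skill from looping forever on a failure it cannot repair.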
Keep skills orthogonal
One skill per coherent outcome. Split “iOS archive upload” from “Android Play upload” even if steps look similar; combined skills confuse trigger matching and inflate Level 2 loads.
Security hygiene
Never embed API keys in SKILL.md. Reference environment variables and secret managers. For third-party skill hubs, prefer signed git tags or internal mirrors; our OpenClaw-focused article on third-party skills security and isolation covers parallel risks when skills can shell out freely.
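Two small Python helpers illustrate this hygiene: one that a bundled script can use to read a secret from the environment, and one that flags lines in a SKILL.md that look like hard-coded credentials. Both names are hypothetical, and the regular expression is a rough heuristic that will miss some secrets and flag some harmless text.

```python
import os
import re

# Heuristic: a credential-like key followed by a long literal value.
SECRET_HINT = re.compile(
    r"(?i)\b(api[_-]?key|token|secret|password)\b\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}"
)


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it or fetch it from your secret manager"
        )
    return value


def find_suspect_lines(text: str) -> list:
    """Return 1-based line numbers that look like embedded secrets."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if SECRET_HINT.search(line)
    ]
```

Running `find_suspect_lines` over every SKILL.md in CI gives a cheap first filter; a dedicated secret scanner is still the better tool for real enforcement.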
Measure token economics
After a sprint using a new skill, compare session token usage against your old “mega-prompt” baseline. Teams routinely see thirty to fifty percent reductions on repeat operations when Level 3 scripts replace regenerated code—similar to Hermes’s progressive disclosure savings reported in production agent workflows.
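The comparison itself is simple arithmetic. A minimal Python sketch, assuming you have per-session token totals from your own usage export (the figures in the usage note are made up for illustration):

```python
def percent_reduction(baseline_sessions, skill_sessions):
    """Percent drop in mean tokens per session after adopting the skill."""
    if not baseline_sessions or not skill_sessions:
        raise ValueError("both samples need at least one session")
    baseline = sum(baseline_sessions) / len(baseline_sessions)
    with_skill = sum(skill_sessions) / len(skill_sessions)
    return 100 * (baseline - with_skill) / baseline
```

For example, hypothetical baselines of 40,000 and 44,000 tokens against skill sessions of 25,000 and 25,400 tokens average 42,000 versus 25,200, a 40 percent reduction. Compare like with like: the same operation, the same model, and a similar number of open files.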
08 · FAQ: Agent Skills vs MCP
Are Skills just another MCP server?
No. MCP (Model Context Protocol) connects the agent to live external systems: issue trackers, databases, browsers, proprietary APIs. MCP tools are dynamic queries and actions with server-side state. Skills are static or semi-static instructions plus optional local scripts—packaged expertise, not a network service.
When should I use MCP instead of a Skill?
Use MCP when the agent must fetch fresh data each turn (ticket status, log tail, Figma frame metadata). Use a Skill when the knowledge is procedural and changes on human release cadence (how to format your quarterly board deck, how to run notarytool stapling). They compose: a Skill can say “call the Jira MCP tool with project KEY-123, then apply the escalation template below.”
Can a Skill replace MCP for Git operations?
Usually not entirely. Local git CLI access is already available to the agent; a Skill tells it your branching and squash rules. MCP helps when git lives behind a custom forge API or you forbid shell git entirely. For MCP wiring in agent stacks, see OpenClaw MCP integration and approval security.
Does agentskills.io conflict with MCP tool lists?
The experimental allowed-tools frontmatter field pre-approves tool names for a skill invocation; MCP remains the transport. Think of Skills as curriculum and MCP as lab equipment—the curriculum says which experiments are allowed this week.
09 · Five-step Mac rental sandbox for script skills
Script-heavy skills are where developers get burned. A skill that runs scripts/deploy.sh can chmod directories, call production webhooks, or install Homebrew packages on the machine where Cursor runs. Testing that on your daily MacBook risks polluting Keychain entries, Docker contexts, and SSH config you rely on for client work. The rational 2026 pattern is an isolated macOS sandbox—a rented Mac mini M4 or Mac Studio you spin up for one to three days, validate the skill end-to-end, then wipe.
Why Mac specifically? Many skills assume BSD userland, Apple codesigning paths, or Xcode CLT behavior that WSL cannot faithfully emulate. Cursor on Windows with remote SSH to a Mac node matches how teams already run iOS and notarization workflows; the same node works for skill validation.
Step 1 — Rent a clean macOS node
Book a dedicated Apple Silicon machine with a fresh user account. Confirm that SSH access (and VNC if you want GUI confirmation) is included by checking the Mac mini M4 pricing guide or the broader bare-metal macOS pricing page. Avoid sharing the node with unrelated experiments; skill tests should be the only activity that day.
Step 2 — Sync skills only
Push your skill folder to a private git branch or rsync .cursor/skills/your-skill/ and ~/.cursor/skills/your-skill/—not your entire home directory. Exclude .env, API keys, and unrelated dotfiles. On the remote Mac, clone into ~/skill-sandbox/ and symlink into Cursor’s expected paths so the agent discovers the same layout as locally.
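If you prefer a script over hand-typed rsync flags, the copy-and-symlink step might look like the Python sketch below. The exclude patterns and the helper names `stage_skill` and `link_into_cursor` are assumptions for illustration; adjust the patterns to whatever secrets your skill folder could contain.

```python
import shutil
from pathlib import Path

# Files that must never leave your machine along with the skill.
EXCLUDE = shutil.ignore_patterns(".env", ".env.*", "*.pem", "*.key", ".git")


def stage_skill(skill_dir: Path, staging_root: Path) -> Path:
    """Copy one skill folder into the sandbox directory, minus excluded files."""
    target = staging_root / skill_dir.name
    shutil.copytree(skill_dir, target, ignore=EXCLUDE, dirs_exist_ok=True)
    return target


def link_into_cursor(staged: Path, home: Path) -> Path:
    """Symlink the staged skill into the personal skills path under `home`."""
    link = home / ".cursor" / "skills" / staged.name
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # refresh a link left over from an earlier run
    # A real directory at this path raises FileExistsError, so nothing is clobbered.
    link.symlink_to(staged, target_is_directory=True)
    return link
```

On the rented Mac this would be called with the cloned skill folder, `Path.home() / "skill-sandbox"` as the staging root, and `Path.home()` as the home directory.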
Step 3 — Install declared dependencies
Honor the compatibility frontmatter: install pinned Node via nvm, Python in a venv, or brew bundle from a manifest you colocate in the skill repo. Run chmod +x scripts/* and execute each script’s --help manually once before involving the agent.
Step 4 — Dry-run scripts, then agent execution
Run bundled scripts yourself with --dry-run or staging endpoints. Capture stdout, stderr, and exit codes in a log file. Only then open Cursor on the rented Mac (or remote into it) and ask the agent to follow the skill on a non-production target. Compare agent-chosen commands against your manual baseline; drift means SKILL.md needs tighter low-freedom wording.
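Capturing the manual baseline and checking for drift can both be scripted. In this Python sketch `run_and_log` and `command_drift` are hypothetical helpers; the command lists you pass to `command_drift` would come from your own notes and from the agent's transcript.

```python
import difflib
import subprocess


def run_and_log(cmd, log_path):
    """Run one command and append its stdout, stderr, and exit code to a log."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"$ {' '.join(cmd)}\n")
        log.write(f"exit code: {result.returncode}\n")
        log.write(f"--- stdout ---\n{result.stdout}\n")
        log.write(f"--- stderr ---\n{result.stderr}\n")
    return result.returncode


def command_drift(baseline, agent):
    """Return unified-diff lines between manual and agent command lists.

    An empty list means the agent ran exactly the commands you ran by hand.
    """
    return list(
        difflib.unified_diff(baseline, agent, "manual", "agent", lineterm="")
    )
```

Any non-empty diff is a prompt to tighten the wording in SKILL.md or to move the step into a pinned script.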
Step 5 — Promote or discard
If validation passes, merge the skill into your main branch or personal skills path on your primary machine. If scripts misbehave, delete the remote user data (MacDate’s return checklist helps ensure zero residue—see five-step return security checklist) and iterate without ever running risky commands on your laptop.
This five-step loop costs far less than a mistaken production deploy or a corrupted local Keychain. For SSH/VNC access patterns and billing FAQs, the daily Mac rental FAQ answers the operational questions engineers ask before their first sandbox day.
Why renting beats buying for skill R&D
Self-purchasing a Mac mini M4 for occasional skill experiments ties up capital and still leaves you merging test junk into your primary environment. Daily rental aligns spend with validation sprints: you pay for the twenty-four to seventy-two hours when scripts are dangerous, not for idle hardware between releases. Teams running multiple skill authors in parallel can rent separate nodes so conflicting brew install recipes never collide—something a single shared office Mac cannot offer.
Once skills stabilize, you might still rent for quarterly regression tests after macOS upgrades or Xcode releases, because compatibility frontmatter is only as honest as your last test run. That is cheaper than discovering breakage during a Friday production incident.
10 · General FAQ
How many skills should I enable at once?
Level 1 metadata is cheap, but dozens of vague descriptions still add noise. Aim for ten to twenty well-scoped skills per repo plus personal skills for cross-cutting habits. Archive skills you have not triggered in ninety days.
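The ninety-day rule is easy to automate once you record when each skill last fired. A minimal Python sketch, assuming a hypothetical mapping of skill name to last-trigger date that you maintain yourself:

```python
from datetime import date, timedelta


def stale_skills(last_triggered, today, max_idle_days=90):
    """Return skill names never triggered or idle longer than `max_idle_days`.

    `last_triggered` maps a skill name to a `date`, or to None when the
    skill has never been triggered.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, last in last_triggered.items()
        if last is None or last < cutoff
    )
```

The result is a list of archive candidates to review, not something to delete automatically.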
Can I share skills across Cursor and Claude Code?
Folders that follow agentskills.io layout are increasingly portable. Verify path conventions and tool availability on each runtime; rewrite only what is product-specific.
What if the agent never picks my skill?
Rewrite the description with user-verbatim triggers, split overloaded skills, and test in a new session. If you need guaranteed loading, invoke the skill by name or adjust disable-model-invocation per Cursor docs.
Should OpenClaw skill packs migrate to SKILL.md?
Conceptually yes for procedural content; gateway plugins and channel wiring remain OpenClaw-specific. Start with our OpenClaw Skills and ClawHub day-rental trial if you run both stacks.
Agent Skills turn recurring prompts into durable, discoverable assets. Rules keep your guardrails; MCP keeps your live data; Skills keep your hard-won procedures. Author with agentskills.io in mind, respect three-level loading, use /create-skill to bootstrap, and rent an isolated Mac when scripts can hurt. That is the 2026 workflow mature Cursor teams standardize on.