2026 Hermes Agent Skills Advanced Guide: From SKILL.md to GEPA Evolution
You installed Hermes Agent, ran hermes doctor, and still reach for the same copy-pasted prompts every sprint. That gap between "agent is running" and "agent knows how we work" is exactly what the Skills system closes. This guide goes past first-boot setup—see our Hermes install walkthrough for that—and into the mechanics that separate hobby deployments from production workflows: Progressive Disclosure for token economics, Skill Bundles for multi-step pipelines, conditional activation for environment-aware routing, Tap publishing for team distribution, and GEPA + DSPy for skills that improve from execution traces rather than model fine-tuning.
01 · Why Hermes Agent's Skills system deserves its own deep dive
In early 2026, Nous Research shipped Hermes Agent with a tagline that stuck: "the agent that grows with you." Within two months the project crossed 160,000 GitHub stars—one of the fastest-growing open-source agent stacks on record. The headline feature is not a bigger base model; it is a portable, versionable layer of procedural memory called Skills.
Unlike one-shot system prompts that vanish when the session ends, Hermes Skills are standardized documents the agent loads on demand, shares across sessions, and—when you wire up GEPA—refines from real execution data. If you already run the gateway on a VPS or a rented Mac mini, you have the runtime. This article explains how to make that runtime actually reflect how your team ships software, writes docs, and debugs production incidents.
We skip install basics and focus on the advanced surface: token-aware loading tiers, YAML Bundles that fire entire workflows with one slash command, metadata-driven conditional visibility, community Tap distribution, and the Genetic-Pareto evolution pipeline that treats SKILL.md as mutable text rather than frozen lore.
02 · Three pain points: installing Hermes is not the same as mastering Skills
- Runaway token spend. Teams dump entire SOPs into the system prompt and pay thousands of tokens every session. Without Progressive Disclosure, loading the full body of fifty skills can blow the context window before the user asks a question.
- Imprecise skill activation. Vague description fields cause the LLM to mount the wrong skill, or miss the right one entirely. Without conditional activation, switching between free DuckDuckGo search and paid Brave/Firecrawl APIs means hand-editing config instead of letting metadata hide irrelevant skills automatically.
- Knowledge that never compounds. Personal prompts live in chat history, not git. There is no Tap for teammates to subscribe to, no validation hook, and no GEPA loop to turn failed traces into improved procedures. Skills stagnate while the model API bill climbs.
Each pain point maps to a feature section below. Fixing all three is what separates a demo gateway from an agent your org actually trusts on Friday deploys.
03 · Core concepts: Skills ≠ Prompts, Skills ≠ Memory
Hermes exposes three overlapping but distinct context channels. Conflating them is the most common architectural mistake we see in first deployments.
| Dimension | Plain Prompt | Memory | Skills |
|---|---|---|---|
| Persistence | Current chat only | Cross-session, permanent | Cross-session, permanent |
| Load timing | Always in context | Injected each session | On demand |
| Token cost | Every turn | Small, stable footprint | Zero until activated |
| Content type | Any intent text | User prefs / facts | Procedural steps |
| Shareability | Awkward | Private by default | Publishable as community Tap |
Mnemonic: Prompt = sticky note (valid this conversation). Memory = notebook on your desk (always nearby). Skill = SOP manual (pull from shelf when the task matches). For Cursor-specific parallels—Rules vs Skills vs MCP—see our 2026 Agent Skill complete guide.
04 · SKILL.md format deep dive (agentskills.io open standard)
Every Hermes Skill follows the agentskills.io specification so the same folder works in Hermes, Claude Code, and Cursor without rewrite. That portability is strategic: author once, validate on a rented Mac, deploy to whichever agent your team standardizes on next quarter.
    ---
    name: my-skill
    description: |
      Use when the user needs to [...].
      Handles [...] and [...].
    version: 1.0.0
    license: MIT
    compatibility: Requires git, docker
    allowed-tools: Bash(git:*) Read
    metadata:
      hermes:
        tags: [devops, automation]
        category: software-development
        related_skills: [github-pr-workflow, test-driven-development]
        requires_toolsets: [terminal]
        fallback_for_toolsets: [web]
    ---
    # My Skill Title
    ## Overview
    ## When to Use
    ## Procedure
    ## Common Pitfalls
    ## Verification Checklist

Critical fields: name is required (lowercase letters and hyphens, max 64 characters). description is required (max 1024 characters) and should open with "Use when…" because Level 0 routing sees only name + description. Put Hermes-specific routing in metadata.hermes: tags, categories, toolset requirements, and fallback rules covered in section 07.
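The field limits above are easy to check before a skill ever reaches the agent. The following is a minimal sketch, not the official validator: `lint_frontmatter` and its constants are hypothetical names, the name pattern additionally tolerates digits (an assumption beyond the rule as stated here), and the "Use when" check is a style warning derived from the routing advice rather than a hard requirement of the standard.

```python
import re

# Assumption: digits are tolerated in addition to the lowercase letters and
# hyphens named in the text; tighten the pattern if your validator disagrees.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_NAME_CHARS = 64
MAX_DESCRIPTION_CHARS = 1024


def lint_frontmatter(name: str, description: str) -> list[str]:
    """Return human-readable problems; an empty list means both fields pass."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("name must be lowercase words separated by single hyphens")
    if len(name) > MAX_NAME_CHARS:
        problems.append(f"name exceeds {MAX_NAME_CHARS} characters")

    text = description.strip()
    if not text:
        problems.append("description is required")
    elif len(text) > MAX_DESCRIPTION_CHARS:
        problems.append(f"description exceeds {MAX_DESCRIPTION_CHARS} characters")
    # Style check only: Level 0 routing sees just name + description.
    if text and not text.lower().startswith("use when"):
        problems.append("description should open with 'Use when'")
    return problems


if __name__ == "__main__":
    print(lint_frontmatter("my-skill", "Use when the user needs a release checklist."))
    print(lint_frontmatter("My_Skill", "Helps with code"))
```

Running a check like this in CI complements, rather than replaces, whatever validator your team standardizes on.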
Modular directory layout
Keep the main file lean; push reference material into subfolders the agent loads only when executing.
    ~/.hermes/skills/
    └── my-category/
        └── my-skill/
            ├── SKILL.md      # Main file (target ≤500 lines)
            ├── references/   # API docs, loaded on demand
            ├── templates/    # Reusable output scaffolds
            └── scripts/      # Executable helpers the agent can run

The 500-line guideline is not cosmetic. GEPA's safety rails reject Skills above 15 KB, and bloated main files defeat Progressive Disclosure: you pay Level 1 tokens for content that should live at Level 2.
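Both budgets mentioned here (the 500-line target and the 15 KB ceiling) can be measured with a few lines of standard-library Python. This is a sketch under stated assumptions: `size_report` is a hypothetical helper, and it treats 1 KB as 1024 bytes of UTF-8, which may differ from how the GEPA guardrail actually counts.

```python
from pathlib import Path

MAX_LINES = 500        # authoring target for the main SKILL.md
MAX_BYTES = 15 * 1024  # assumed reading of the 15 KB ceiling (1 KB = 1024 bytes)


def size_report(text: str) -> dict:
    """Measure a SKILL.md body against the line target and the byte ceiling."""
    line_count = len(text.splitlines())
    byte_count = len(text.encode("utf-8"))
    return {
        "lines": line_count,
        "bytes": byte_count,
        "within_line_target": line_count <= MAX_LINES,
        "within_byte_ceiling": byte_count <= MAX_BYTES,
    }


def size_report_for_file(path: Path) -> dict:
    """Convenience wrapper for a SKILL.md on disk."""
    return size_report(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = "# My Skill Title\n## Overview\n## Procedure\n"
    print(size_report(sample))
```

Counting encoded bytes rather than characters matters for non-ASCII content, where one character can occupy several bytes.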
05 · Progressive Disclosure: three-level loading
Progressive Disclosure is Hermes's answer to the "fifty skills ate my context" problem. The gateway never dumps every SKILL.md body into the prompt at session start.
| Tier | Content loaded | Trigger | Token cost |
|---|---|---|---|
| Level 0 | name + description only | Every session start | ~3K total across all skills |
| Level 1 | Full SKILL.md body | /skill-name or LLM routing | Depends on file length |
| Level 2 | references/, scripts/ | LLM decides during execution | Per file, on demand |
Authoring implication: invest disproportionate effort in the description—when to use, when not to use, product names, error strings users paste from Slack. Move API tables and long examples into references/. Teams running thirty-plus skills report that disciplined Level 0 descriptions cut mistaken activations by half compared to generic "helps with code" summaries.
Level 2 is where script-heavy skills shine: the agent reads the procedure in Level 1, then pulls only the one script or reference file needed for the current sub-step. That pattern mirrors how senior engineers work—they do not memorize every man page; they open the relevant page when executing.
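To make the token economics concrete, the sketch below compares a session that loads only Level 0 metadata plus the activated skills against one that eagerly loads every body. The `Skill` dataclass and helper names are hypothetical, and the four-characters-per-token ratio is a rough heuristic assumed for illustration, not a figure from Hermes or any tokenizer.

```python
import math
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model and language


@dataclass
class Skill:
    name: str
    description: str
    body: str


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def level0_cost(skills: list[Skill]) -> int:
    """Tokens paid at session start: name + description for every skill."""
    return sum(estimate_tokens(f"{s.name} {s.description}") for s in skills)


def progressive_cost(skills: list[Skill], activated: set[str]) -> int:
    """Level 0 for all skills, plus Level 1 bodies only for activated ones."""
    bodies = sum(estimate_tokens(s.body) for s in skills if s.name in activated)
    return level0_cost(skills) + bodies


def eager_cost(skills: list[Skill]) -> int:
    """What the session would cost if every body were loaded up front."""
    return progressive_cost(skills, {s.name for s in skills})


if __name__ == "__main__":
    catalog = [Skill(f"skill-{i}", "Use when " + "x" * 150, "y" * 8000) for i in range(50)]
    print("level 0 only:", level0_cost(catalog))
    print("one skill activated:", progressive_cost(catalog, {"skill-0"}))
    print("everything loaded:", eager_cost(catalog))
```

Level 2 files are left out of the model because they are pulled per file during execution; adding them would only widen the gap between the eager and progressive totals.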
06 · Skill Bundles: one command, full workflow
Skill Bundles arrived in Hermes 2026 as a first-class workflow primitive. A Bundle is a lightweight YAML file listing multiple skills that load simultaneously when the user types /bundle-name. Think of it as a curated playlist for agent context—not a new prompt layer, but a coordinated multi-skill mount.
File location: ~/.hermes/skill-bundles/<slug>.yaml
    name: backend-dev
    description: |
      Full backend feature workflow — code review, TDD, and PR management.
    skills:
      - github-code-review
      - test-driven-development
      - github-pr-workflow
    instruction: |
      Always write failing tests first before implementation.
      Never push directly to main.

Priority rules worth memorizing:
- If a Bundle and a single Skill share the same name, the Bundle wins.
- Skills listed but not installed are skipped silently—no error spam.
- Bundles do not rewrite the system prompt, so they preserve prompt-cache efficiency on providers that support caching.
CLI shortcut to scaffold:
    hermes bundles create backend-dev \
      --skills github-code-review,test-driven-development,github-pr-workflow \
      --instruction "Always write failing tests first"

Common Bundle recipes in the wild include an AI researcher stack (arxiv + deep-research + plan + excalidraw) and an MLOps deploy pipeline (vllm + llama-cpp + github-pr-workflow + systematic-debugging). The instruction block holds your team's non-negotiables (branch protection rules, test ordering, security review gates) so you do not duplicate them inside every constituent skill.
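The priority rules for Bundles can be expressed as a small resolution function. This is an illustrative sketch, not Hermes's loader: `resolve_command` is a hypothetical name, and bundles are modeled as plain dictionaries mirroring the YAML fields instead of being parsed from disk.

```python
def resolve_command(command: str, bundles: dict, installed_skills: set) -> tuple:
    """Return (skills_to_load, instruction) for a slash command such as 'backend-dev'."""
    # Rule 1: a Bundle wins over a single Skill with the same name.
    if command in bundles:
        bundle = bundles[command]
        # Rule 2: listed skills that are not installed are skipped silently.
        loadable = [s for s in bundle.get("skills", []) if s in installed_skills]
        return loadable, bundle.get("instruction", "")
    if command in installed_skills:
        return [command], ""
    return [], ""


if __name__ == "__main__":
    bundles = {
        "backend-dev": {
            "skills": ["github-code-review", "test-driven-development", "github-pr-workflow"],
            "instruction": "Always write failing tests first before implementation.",
        }
    }
    installed = {"github-code-review", "github-pr-workflow", "backend-dev"}
    print(resolve_command("backend-dev", bundles, installed))
```

In the demo, a skill that happens to be named backend-dev is shadowed by the Bundle, and the uninstalled test-driven-development entry drops out without raising.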
07 · Conditional activation: environment-aware skills
Skills can auto-hide or auto-show based on which tools and toolsets exist in the current session. Configure this under metadata.hermes so the Level 0 skill list reflects reality—free search when paid APIs are absent, terminal skills when SSH is unavailable, web fallbacks when headless mode is active.
| Field | Behavior |
|---|---|
| requires_toolsets | Hide skill if listed toolsets are missing |
| requires_tools | Hide skill if listed tools are missing |
| fallback_for_toolsets | Hide when listed toolsets are present (backup path) |
| fallback_for_tools | Hide when listed tools are present (backup path) |
Canonical example: a DuckDuckGo search skill sets fallback_for_tools: [web_search]. When the user configures FIRECRAWL_KEY or BRAVE_SEARCH_KEY, the paid web_search tool activates and DuckDuckGo hides—saving tokens and avoiding duplicate search strategies. If the API key expires, the fallback skill resurfaces without a config edit. That is conditional activation doing policy work your ops team would otherwise encode in runbooks.
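The four fields reduce to a simple visibility predicate. The sketch below is one plausible reading, not Hermes's actual implementation: `is_visible` is a hypothetical name, a skill is hidden if any required item is missing, and a fallback skill is hidden as soon as any listed item is present (whether Hermes uses "any" or "all" for multi-item lists is an assumption here).

```python
def is_visible(hermes_meta: dict, toolsets: set, tools: set) -> bool:
    """Decide whether a skill should appear in the Level 0 list for this session."""
    requires_toolsets = set(hermes_meta.get("requires_toolsets", []))
    requires_tools = set(hermes_meta.get("requires_tools", []))
    fallback_toolsets = set(hermes_meta.get("fallback_for_toolsets", []))
    fallback_tools = set(hermes_meta.get("fallback_for_tools", []))

    # Hide when any required toolset or tool is missing.
    if not requires_toolsets <= toolsets or not requires_tools <= tools:
        return False
    # Hide a fallback skill when the thing it backs up is available (assumed: any match).
    if fallback_toolsets & toolsets or fallback_tools & tools:
        return False
    return True


if __name__ == "__main__":
    duckduckgo = {"fallback_for_tools": ["web_search"]}
    print(is_visible(duckduckgo, toolsets=set(), tools=set()))           # no paid search tool
    print(is_visible(duckduckgo, toolsets=set(), tools={"web_search"}))  # paid search configured
```

Because the predicate is evaluated against the live session, an expired API key that removes web_search flips the DuckDuckGo skill back to visible with no config edit, matching the behavior described above.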
08 · Skills Hub and the open-source ecosystem
Hermes ships multiple install channels so you are not locked to a single registry:
hermes skills install official/research/arxiv
hermes skills install https://example.com/SKILL.md --name my-skill
hermes skills install github:openai/skills/k8s
hermes skills tap add github:my-org/my-skills

Community curators maintain production-grade collections. The table below highlights repos worth bookmarking; it is not exhaustive, but it is representative of what teams actually pull into CI sandboxes.
| Repository | Highlight | Stars |
|---|---|---|
| ChuckSRQ/awesome-hermes-skills | Production bundles incl. Deep Research, MLOps | 67+ |
| amanning3390/hermeshub | Community registry with prompt-injection checks | 166+ |
| kevinnft/ai-agent-skills | 191 skills, cross Hermes / Claude / Cursor | 10+ |
| NousResearch/hermes-agent | Official source of truth | 160k+ |
Validate before you trust third-party skills: skills-ref validate ./my-skill checks agentskills.io compliance. Skill assets are plain files in git—they do not bind you to Hermes forever. That portability is why many teams mirror skills into internal repos alongside application code.
09 · Publishing your own Skill Tap: team and community distribution
A Tap is a GitHub repository that acts as a subscription feed for skills. Add it once; every teammate pulls updates with hermes skills tap update. Structure:
my-skills-tap/
├── skills.sh.json
├── mlops/vllm-deploy/SKILL.md
├── research/paper-summarizer/SKILL.md
└── README.md

Team deployment commands:
hermes skills tap add github:your-org/your-skills-tap
hermes skills tap add github:your-org/private-skills --token $GH_TOKEN
hermes skills tap update
hermes skills tap list

Versioning hygiene: track ~/.hermes/skills/ in git (or a dedicated tap repo), tag releases, and document breaking changes in the Tap README. Cross-device sync becomes git pull && hermes skills reset instead of Slack file dumps. Private orgs should use deploy tokens or fine-grained PATs scoped to the tap repo only; never commit tokens into skill frontmatter.
10 · Self-evolving Skills: GEPA + DSPy
GEPA (Genetic-Pareto Prompt Evolution) is an ICLR 2026 Oral result integrated in hermes-agent-self-evolution. Instead of fine-tuning model weights, GEPA analyzes execution traces, generates SKILL.md variants, and runs multi-objective Pareto selection to improve success rate, token efficiency, and latency simultaneously. Typical cost: $2–10 per evolution run using API inference—no GPU cluster required.
Five-stage pipeline:
- Trace collection — sessions stored in SQLite via Hermes's session DB.
- Reflective failure analysis — identify which procedure steps correlate with errors.
- Targeted mutation — generate 10–20 SKILL.md variants focused on weak sections.
- Pareto evaluation — score variants on success × token efficiency × speed.
- Human review — merge winning diff via PR after automated guardrails pass.
git clone https://github.com/NousResearch/hermes-agent-self-evolution
export HERMES_AGENT_PATH=~/.hermes
python -m evolution.skills.evolve_skill \
  --skill github-code-review \
  --iterations 10 \
  --eval-source sessiondb

Four safety guardrails block reckless merges: full test suite must pass at 100%; Skills stay ≤15 KB and tool descriptions ≤500 characters; prompt-cache compatibility is preserved; semantic preservation checks ensure the skill still means what your team thinks it means. Roadmap Phase 1 (SKILL.md evolution) is production-ready; Phases 2–5 extend to tool descriptions, system prompts, tool implementation code, and fully automated loops.
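The Pareto evaluation stage can be illustrated with textbook dominance filtering over the three objectives named above. This is a generic sketch, not code from hermes-agent-self-evolution: the `Variant` tuple, its field names, and the sample scores are all hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    success_rate: float  # higher is better
    tokens: float        # lower is better
    latency_s: float     # lower is better


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    no_worse = (
        a.success_rate >= b.success_rate
        and a.tokens <= b.tokens
        and a.latency_s <= b.latency_s
    )
    strictly_better = (
        a.success_rate > b.success_rate
        or a.tokens < b.tokens
        or a.latency_s < b.latency_s
    )
    return no_worse and strictly_better


def pareto_front(variants: list[Variant]) -> list[Variant]:
    """Keep only variants that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


if __name__ == "__main__":
    candidates = [
        Variant("baseline", 0.70, 1200, 9.0),
        Variant("v1", 0.80, 1000, 8.0),
        Variant("v2", 0.85, 1500, 7.0),
        Variant("v3", 0.75, 1300, 9.5),
    ]
    print([v.name for v in pareto_front(candidates)])
```

In this made-up data, v1 beats the baseline and v3 on every axis, while v1 and v2 trade token cost against success rate, so both survive for human review. That is the point of multi-objective selection: it surfaces trade-offs instead of collapsing them into a single score.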
Experimental mode accepts mixed trace sources—feed Claude Code or Gemini CLI logs alongside Hermes sessions:
--eval-source mixed --trace-dirs ~/.claude/traces,~/.hermes/sessions

That cross-runtime learning is powerful for teams that prototype in Cursor but deploy agents on Hermes gateway hardware. Capture traces on a disposable rented Mac, run GEPA overnight, and review the PR on Monday, without touching production SKILL.md until tests green-light the diff.
11 · Plugin skills: extending Hermes boundaries
Plugins namespace skills as plugin:skill. They do not appear in the default skills_list; the user opts in explicitly—useful for experimental or high-risk capabilities you do not want auto-routed.
skill_view("superpowers:writing-plans")
# plugin.yaml
name: my-hermes-plugin
skills:
  - name: writing-plans
    path: skills/writing-plans/SKILL.md

Plugins pair well with internal tools that should never surface during casual chat: administration runbooks, production database skills, or compliance workflows that require an explicit slash invocation and audit log entry.
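The plugin:skill naming convention is simple enough to parse by hand. The helper below is a hypothetical illustration of splitting a reference and deciding whether it belongs in the default list; it is not part of the Hermes API.

```python
from typing import Optional, Tuple


def parse_skill_ref(ref: str) -> Tuple[Optional[str], str]:
    """Split 'plugin:skill' into (plugin, skill); plain skills return (None, skill)."""
    plugin, sep, skill = ref.partition(":")
    if not sep:
        return None, ref
    if not plugin or not skill:
        raise ValueError(f"malformed skill reference: {ref!r}")
    return plugin, skill


def listed_by_default(ref: str) -> bool:
    """Plugin-namespaced skills stay out of the default list until the user opts in."""
    return parse_skill_ref(ref)[0] is None


if __name__ == "__main__":
    print(parse_skill_ref("superpowers:writing-plans"))
    print(listed_by_default("github-code-review"))
```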
12 · Advanced authoring tips (engineer's checklist)
- Description drives routing. State trigger conditions and exclusion cases. "Helps with code" activates everywhere and nowhere useful.
- Pitfalls separate good from great. Document concrete failure modes—GitHub API rate limits, oversized diffs blowing token budgets—with root cause and fix steps the agent can follow without guessing.
- Script with fallbacks. Reference scripts/ in Procedure; on failure, point to references/manual-extract.md for manual recovery.
- Size discipline. Under 500 lines: keep in SKILL.md. 500–1000 lines: split references. Above 15 KB: mandatory split for GEPA compatibility.
- Agent writes need approval. Hermes can patch skills via skill_manage(action='patch'|'create'); enable agent_writes_require_approval: true in config.yaml so autonomous edits do not silently overwrite reviewed procedures.
13 · Case study: technical blog workflow Skills
MacDate's own blog pipeline maps cleanly onto a Bundle. Below is a representative YAML you can adapt—swap platform-specific publish steps for your CMS.
    name: blog-workflow
    description: Full tech blog writing workflow.
    skills:
      - seo-keyword-research
      - outline-generator
      - code-example-validator
      - bilingual-checker
      - publish-to-platform
    instruction: |
      Always research SEO keywords before writing.
      Ensure all code examples are tested and runnable.

A custom seo-keyword-research skill can emit a bilingual keyword matrix before drafting: three to five head terms plus ten to fifteen long-tail phrases cross-checked against Dev.to trending, Hacker News front page, and niche community feeds. The code-example-validator skill runs bundled scripts against the rented Mac sandbox so copy-pasted shell snippets actually execute on Apple Silicon before publication.
The instruction block enforces editorial policy without duplicating it across five skills: research first, runnable examples only, no publish until validation passes. That is Bundles doing governance work that would otherwise live in a Notion doc nobody reads.
14 · FAQ
Q: How do Skills differ from MCP?
Skills are procedural knowledge documents—they teach the agent how to approach a task. MCP (Model Context Protocol) is a tool interface—it gives the agent live access to external systems. They compose: a Skill can say "call the Jira MCP tool, then apply the escalation template below."
Q: I edited a Skill but the agent still uses the old version.
Changes do not apply mid-session. Run /reset or reinstall with --now (note: --now invalidates prompt cache on supported providers).
Q: Is GEPA-evolved skill content safe to merge?
Automated guardrails catch size, test, and semantic regressions—but human PR review remains mandatory. Read every diff; GEPA optimizes metrics, not your compliance officer's sleep.
Q: Can I reuse Hermes skills in Claude Code?
Copy SKILL.md into ~/.claude/skills/ or use kevinnft/ai-agent-skills for multi-runtime installs. Tool availability differs per runtime, so verify that every tool the skill references exists in the target before relying on it.
Q: Does Chinese content hurt token efficiency?
Chinese characters run roughly 1–1.5 tokens per character—comparable to English per semantic unit. For LLM routing precision, keep descriptions in English or bilingual; body content can match your audience locale.
Further reading: Hermes official docs, our Cursor Agent Skill guide, 30-day Hermes field review, and memory + hardware selection guide.
15 · Five-step Mac rental sandbox for Hermes Skills
Hermes Gateway runs on Linux VPS and Windows, but macOS-only skills—Xcode workflows, Keychain operations, Apple codesigning, Homebrew recipes tuned for Apple Silicon—need real macOS. The pragmatic 2026 pattern: spin up a disposable rented Mac, validate Skills / Bundles / GEPA traces, then release the node before monthly charges accrue.
- Rent an Apple Silicon node. Choose Mac mini M4 or better with Homebrew preinstalled; SSH in from your laptop. Day rates and SKUs live on bare-metal macOS pricing.
- Install Hermes and run doctor. Follow the official install guide; confirm gateway health with hermes doctor and verify toolsets match your production target.
- Install official skills and custom Taps. Run hermes skills install and hermes skills tap add; measure Level 0 vs Level 1 token footprint on a representative session.
- Author a Bundle and execute the workflow. Write YAML under ~/.hermes/skill-bundles/; trigger with /bundle-name and confirm all listed skills load and follow the instruction block.
- Capture traces and release. Export session logs for GEPA if evolving skills; save terminal output as acceptance evidence; terminate the rental when validation passes to stop billing.
Linux VPS hosting suits lightweight API-only gateways but cannot validate macOS-exclusive skills or local Keychain permission flows. Running 24/7 on a personal laptop risks thermal throttling, polluted dotfiles, and API keys on a machine you also use for email. Daily Mac rental delivers production-faithful Apple Silicon for less than the cost of one misconfigured skill firing a runaway tool loop overnight.
Most Skills validation sprints finish in one to three rental days on a Mac mini M4 with 16 GB, which is enough for Tap installs, Bundle authoring, and a GEPA iteration batch without CapEx. If the gateway proves stable enough for 24/7 Telegram duty, graduate to monthly rental using the hardware matrix in our VPS vs Mac mini selection guide. SSH setup, VNC fallback, and billing mechanics are covered in the daily Mac rental FAQ.
That is why teams treating Hermes Skills as production infrastructure—not weekend experiments—rent isolated Apple Silicon nodes from MacDate instead of contaminating daily-driver laptops or guessing on Linux-only sandboxes. You get native Keychain and codesign behavior, a clean ~/.hermes/ tree per sprint, SSH access for scripted validation, and daily billing that stops when QA ends. Compare tiers on bare-metal macOS pricing; most Bundle and GEPA trials complete before you need a second week of rent.