Hermes Deep Dive 2026-06-18

2026 Hermes Agent Skills
Advanced Guide
From SKILL.md to GEPA Evolution

You installed Hermes Agent, ran hermes doctor, and still reach for the same copy-pasted prompts every sprint. That gap between "agent is running" and "agent knows how we work" is exactly what the Skills system closes. This guide goes past first-boot setup—see our Hermes install walkthrough for that—and into the mechanics that separate hobby deployments from production workflows: Progressive Disclosure for token economics, Skill Bundles for multi-step pipelines, conditional activation for environment-aware routing, Tap publishing for team distribution, and GEPA + DSPy for skills that improve from execution traces rather than model fine-tuning.


01 · Why Hermes Agent's Skills system deserves its own deep dive

In early 2026, Nous Research shipped Hermes Agent with a tagline that stuck: "the agent that grows with you." Within two months the project crossed 160,000 GitHub stars—one of the fastest-growing open-source agent stacks on record. The headline feature is not a bigger base model; it is a portable, versionable layer of procedural memory called Skills.

Unlike one-shot system prompts that vanish when the session ends, Hermes Skills are standardized documents the agent loads on demand, shares across sessions, and—when you wire up GEPA—refines from real execution data. If you already run the gateway on a VPS or a rented Mac mini, you have the runtime. This article explains how to make that runtime actually reflect how your team ships software, writes docs, and debugs production incidents.

We skip install basics and focus on the advanced surface: token-aware loading tiers, YAML Bundles that fire entire workflows with one slash command, metadata-driven conditional visibility, community Tap distribution, and the Genetic-Pareto evolution pipeline that treats SKILL.md as mutable text rather than frozen lore.

02 · Three pain points: installing Hermes is not the same as mastering Skills

  1. Runaway token spend. Teams dump entire SOPs into the system prompt and pay thousands of tokens every session. Without Progressive Disclosure, loading the full body of fifty skills can blow the context window before the user asks a question.
  2. Imprecise skill activation. Vague description fields cause the LLM to mount the wrong skill—or miss the right one entirely. Without conditional activation, switching between free DuckDuckGo search and paid Brave/Firecrawl APIs means hand-editing config instead of letting metadata hide irrelevant skills automatically.
  3. Knowledge that never compounds. Personal prompts live in chat history, not git. There is no Tap for teammates to subscribe to, no validation hook, and no GEPA loop to turn failed traces into improved procedures. Skills stagnate while the model API bill climbs.

Each pain point maps to a feature section below. Fixing all three is what separates a demo gateway from an agent your org actually trusts on Friday deploys.

03 · Core concepts: Skills ≠ Prompts, Skills ≠ Memory

Hermes exposes three overlapping but distinct context channels. Conflating them is the most common architectural mistake we see in first deployments.

| Dimension | Plain Prompt | Memory | Skills |
|---|---|---|---|
| Persistence | Current chat only | Cross-session, permanent | Cross-session, permanent |
| Load timing | Always in context | Injected each session | On demand |
| Token cost | Every turn | Small, stable footprint | Zero until activated |
| Content type | Any intent text | User prefs / facts | Procedural steps |
| Shareability | Awkward | Private by default | Publishable as community Tap |

Mnemonic: Prompt = sticky note (valid this conversation). Memory = notebook on your desk (always nearby). Skill = SOP manual (pull from shelf when the task matches). For Cursor-specific parallels—Rules vs Skills vs MCP—see our 2026 Agent Skill complete guide.

04 · SKILL.md format deep dive (agentskills.io open standard)

Every Hermes Skill follows the agentskills.io specification so the same folder works in Hermes, Claude Code, and Cursor without rewrite. That portability is strategic: author once, validate on a rented Mac, deploy to whichever agent your team standardizes on next quarter.

    ---
    name: my-skill
    description: |
      Use when the user needs to [...]. Handles [...] and [...].
    version: 1.0.0
    license: MIT
    compatibility: Requires git, docker
    allowed-tools: Bash(git:*) Read
    metadata:
      hermes:
        tags: [devops, automation]
        category: software-development
        related_skills: [github-pr-workflow, test-driven-development]
        requires_toolsets: [terminal]
        fallback_for_toolsets: [web]
    ---
    # My Skill Title
    ## Overview
    ## When to Use
    ## Procedure
    ## Common Pitfalls
    ## Verification Checklist

Critical fields: name is required—lowercase letters and hyphens, max 64 characters. description is required—max 1024 characters—and should open with "Use when…" because Level 0 routing sees only name + description. Put Hermes-specific routing in metadata.hermes: tags, categories, toolset requirements, and fallback rules covered in section 07.
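The field limits above are easy to lint before a skill ever reaches the gateway. The following is a minimal sketch, not the official validator: `check_frontmatter` is a hypothetical helper, and the name pattern also admits digits, which is an assumption based on names such as k8s that appear elsewhere in this guide.

```python
import re

# Limits quoted in the text. Allowing digits in the name is an assumption.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_NAME_LEN = 64
MAX_DESCRIPTION_LEN = 1024


def check_frontmatter(name: str, description: str) -> list[str]:
    """Return human-readable problems; an empty list means both fields pass."""
    problems = []
    if not name:
        problems.append("name is required")
    elif len(name) > MAX_NAME_LEN:
        problems.append(f"name exceeds {MAX_NAME_LEN} characters")
    elif not NAME_PATTERN.fullmatch(name):
        problems.append("name must use lowercase letters and hyphens only")

    text = description.strip()
    if not text:
        problems.append("description is required")
    else:
        if len(text) > MAX_DESCRIPTION_LEN:
            problems.append(f"description exceeds {MAX_DESCRIPTION_LEN} characters")
        # Level 0 routing sees only name + description, so the opener matters.
        if not text.lower().startswith("use when"):
            problems.append('description should open with "Use when"')
    return problems


if __name__ == "__main__":
    print(check_frontmatter("my-skill", "Use when the user needs a changelog."))
    print(check_frontmatter("My_Skill", "Helps with code"))
```

Running a check like this in CI alongside skills-ref validate catches the routing-hostile descriptions that the spec itself does not forbid.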

Modular directory layout

Keep the main file lean; push reference material into subfolders the agent loads only when executing.

    ~/.hermes/skills/
    └── my-category/
        └── my-skill/
            ├── SKILL.md       # Main file (target ≤500 lines)
            ├── references/    # API docs, loaded on demand
            ├── templates/     # Reusable output scaffolds
            └── scripts/       # Executable helpers the agent can run

The 500-line guideline is not cosmetic. GEPA's safety rails reject Skills above 15 KB, and bloated main files defeat Progressive Disclosure—you pay Level 1 tokens for content that should live at Level 2.

05 · Progressive Disclosure: three-level loading

Progressive Disclosure is Hermes's answer to the "fifty skills ate my context" problem. The gateway never dumps every SKILL.md body into the prompt at session start.

| Tier | Content loaded | Trigger | Token cost |
|---|---|---|---|
| Level 0 | name + description only | Every session start | ~3K total across all skills |
| Level 1 | Full SKILL.md body | /skill-name or LLM routing | Depends on file length |
| Level 2 | references/, scripts/ | LLM decides during execution | Per file, on demand |

Authoring implication: invest disproportionate effort in the description—when to use, when not to use, product names, error strings users paste from Slack. Move API tables and long examples into references/. Teams running thirty-plus skills report that disciplined Level 0 descriptions cut mistaken activations by half compared to generic "helps with code" summaries.

Level 2 is where script-heavy skills shine: the agent reads the procedure in Level 1, then pulls only the one script or reference file needed for the current sub-step. That pattern mirrors how senior engineers work—they do not memorize every man page; they open the relevant page when executing.
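To make the three tiers concrete, here is a rough cost model. It is a sketch under loud assumptions: `rough_tokens` uses a crude four-characters-per-token heuristic rather than a real tokenizer, and the `Skill` class and cost functions are hypothetical names, not part of Hermes.

```python
from dataclasses import dataclass, field


def rough_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full SKILL.md body, loaded at Level 1
    references: dict[str, str] = field(default_factory=dict)  # Level 2 files


def level0_cost(skills: list[Skill]) -> int:
    """Tokens paid at every session start: name + description per skill."""
    return sum(rough_tokens(s.name + " " + s.description) for s in skills)


def level1_cost(skill: Skill) -> int:
    """Tokens paid only when this one skill is activated."""
    return rough_tokens(skill.body)


def level2_cost(skill: Skill, filename: str) -> int:
    """Tokens paid only when a single reference file is pulled mid-task."""
    return rough_tokens(skill.references[filename])


if __name__ == "__main__":
    library = [
        Skill(f"skill-{i}", "Use when " + "x" * 190, "step\n" * 1500)
        for i in range(50)
    ]
    eager = sum(level1_cost(s) for s in library)
    print("Level 0 for 50 skills:", level0_cost(library))
    print("All 50 bodies loaded eagerly:", eager)
```

The point of the comparison is the ratio, not the absolute numbers: the Level 0 index grows with description length, while eager loading grows with every line of every procedure.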

06 · Skill Bundles: one command, full workflow

Skill Bundles arrived in Hermes 2026 as a first-class workflow primitive. A Bundle is a lightweight YAML file listing multiple skills that load simultaneously when the user types /bundle-name. Think of it as a curated playlist for agent context—not a new prompt layer, but a coordinated multi-skill mount.

File location: ~/.hermes/skill-bundles/<slug>.yaml

    name: backend-dev
    description: |
      Full backend feature workflow: code review, TDD, and PR management.
    skills:
      - github-code-review
      - test-driven-development
      - github-pr-workflow
    instruction: |
      Always write failing tests first before implementation.
      Never push directly to main.

Priority rules worth memorizing:

  • If a Bundle and a single Skill share the same name, the Bundle wins.
  • Skills listed but not installed are skipped silently—no error spam.
  • Bundles do not rewrite the system prompt, so they preserve prompt-cache efficiency on providers that support caching.
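The first two priority rules can be expressed as a small resolver. This is an illustrative sketch only: `resolve_slash_command` and its data shapes are hypothetical, and the real gateway logic may differ.

```python
def resolve_slash_command(name, bundles, installed_skills):
    """Return (kind, skills_to_load, instruction) for a /name command.

    bundles: dict mapping bundle name -> {"skills": [...], "instruction": str}
    installed_skills: set of installed skill names
    """
    if name in bundles:
        # Rule 1: a Bundle shadows a Skill with the same name.
        bundle = bundles[name]
        # Rule 2: listed-but-missing skills are dropped without raising.
        loaded = [s for s in bundle["skills"] if s in installed_skills]
        return "bundle", loaded, bundle.get("instruction", "")
    if name in installed_skills:
        return "skill", [name], ""
    return "unknown", [], ""


if __name__ == "__main__":
    bundles = {
        "backend-dev": {
            "skills": [
                "github-code-review",
                "test-driven-development",
                "github-pr-workflow",
            ],
            "instruction": "Always write failing tests first.",
        }
    }
    # test-driven-development is not installed, so it is skipped silently.
    installed = {"github-code-review", "github-pr-workflow", "backend-dev"}
    print(resolve_slash_command("backend-dev", bundles, installed))
```

Silent skipping is convenient but easy to miss, so it is worth diffing the loaded list against the Bundle file when a workflow behaves oddly.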

CLI shortcut to scaffold:

    hermes bundles create backend-dev \
      --skills github-code-review,test-driven-development,github-pr-workflow \
      --instruction "Always write failing tests first"

Common Bundle recipes in the wild include an AI researcher stack (arxiv + deep-research + plan + excalidraw) and an MLOps deploy pipeline (vllm + llama-cpp + github-pr-workflow + systematic-debugging). The instruction block is your team's non-negotiables—branch protection rules, test ordering, security review gates—without duplicating them inside every constituent skill.

07 · Conditional activation: environment-aware skills

Skills can auto-hide or auto-show based on which tools and toolsets exist in the current session. Configure this under metadata.hermes so the Level 0 skill list reflects reality: free search when paid APIs are absent, terminal skills hidden when SSH is unavailable, web fallbacks when headless mode is active.

| Field | Behavior |
|---|---|
| requires_toolsets | Hide skill if listed toolsets are missing |
| requires_tools | Hide skill if listed tools are missing |
| fallback_for_toolsets | Hide when listed toolsets are present (backup path) |
| fallback_for_tools | Hide when listed tools are present (backup path) |

Canonical example: a DuckDuckGo search skill sets fallback_for_tools: [web_search]. When the user configures FIRECRAWL_KEY or BRAVE_SEARCH_KEY, the paid web_search tool activates and DuckDuckGo hides—saving tokens and avoiding duplicate search strategies. If the API key expires, the fallback skill resurfaces without a config edit. That is conditional activation doing policy work your ops team would otherwise encode in runbooks.
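The four fields reduce to a single visibility predicate. The sketch below is one plausible reading, not Hermes source code: `is_visible` is a hypothetical function, and it assumes that any missing requirement hides a skill and that any present fallback target hides a fallback skill.

```python
def is_visible(meta: dict, toolsets: set[str], tools: set[str]) -> bool:
    """Decide whether a skill appears in the Level 0 list for this session.

    meta is the metadata.hermes mapping from the skill's frontmatter.
    """
    # requires_*: hide when anything listed is missing.
    if any(t not in toolsets for t in meta.get("requires_toolsets", [])):
        return False
    if any(t not in tools for t in meta.get("requires_tools", [])):
        return False
    # fallback_for_*: hide when the thing this skill backs up is present.
    if any(t in toolsets for t in meta.get("fallback_for_toolsets", [])):
        return False
    if any(t in tools for t in meta.get("fallback_for_tools", [])):
        return False
    return True


if __name__ == "__main__":
    duckduckgo = {"fallback_for_tools": ["web_search"]}
    print(is_visible(duckduckgo, set(), set()))           # no paid search tool
    print(is_visible(duckduckgo, set(), {"web_search"}))  # paid search active
```

In the DuckDuckGo case, the skill is visible while web_search is absent and disappears once the paid tool registers, which matches the behavior described above.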

08 · Skills Hub and the open-source ecosystem

Hermes ships multiple install channels so you are not locked to a single registry:

    hermes skills install official/research/arxiv
    hermes skills install https://example.com/SKILL.md --name my-skill
    hermes skills install github:openai/skills/k8s
    hermes skills tap add github:my-org/my-skills

Community curators maintain production-grade collections. The table below highlights repos worth bookmarking—not exhaustive, but representative of what teams actually pull into CI sandboxes.

| Repository | Highlight | Stars |
|---|---|---|
| ChuckSRQ/awesome-hermes-skills | Production bundles incl. Deep Research, MLOps | 67+ |
| amanning3390/hermeshub | Community registry with prompt-injection checks | 166+ |
| kevinnft/ai-agent-skills | 191 skills, cross Hermes / Claude / Cursor | 10+ |
| NousResearch/hermes-agent | Official source of truth | 160k+ |

Validate before you trust third-party skills: skills-ref validate ./my-skill checks agentskills.io compliance. Skill assets are plain files in git—they do not bind you to Hermes forever. That portability is why many teams mirror skills into internal repos alongside application code.

09 · Publishing your own Skill Tap: team and community distribution

A Tap is a GitHub repository that acts as a subscription feed for skills. Add it once; every teammate pulls updates with hermes skills tap update. Structure:

    my-skills-tap/
    ├── skills.sh.json
    ├── mlops/vllm-deploy/SKILL.md
    ├── research/paper-summarizer/SKILL.md
    └── README.md

Team deployment commands:

    hermes skills tap add github:your-org/your-skills-tap
    hermes skills tap add github:your-org/private-skills --token $GH_TOKEN
    hermes skills tap update
    hermes skills tap list

Versioning hygiene: track ~/.hermes/skills/ in git (or a dedicated tap repo), tag releases, and document breaking changes in the Tap README. Cross-device sync becomes git pull && hermes skills reset instead of Slack file dumps. Private orgs should use deploy tokens or fine-grained PATs scoped to the tap repo only—never commit tokens into skill frontmatter.
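The "never commit tokens" rule is worth enforcing mechanically before a Tap is pushed. The following is a minimal pre-commit style sketch, not a Hermes feature: `find_token_leaks` and `scan_tap` are hypothetical helpers, and the regular expression only targets GitHub-style token prefixes with deliberately loose length bounds.

```python
import re
from pathlib import Path

# GitHub token prefixes: ghp_/gho_/ghu_/ghs_/ghr_ and fine-grained github_pat_.
TOKEN_PATTERN = re.compile(
    r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})"
)


def find_token_leaks(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_prefix) for each token-like string."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            # Never echo the full secret back into logs.
            hits.append((lineno, match.group(0)[:8] + "..."))
    return hits


def scan_tap(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every SKILL.md under root and report files with suspected leaks."""
    results = {}
    for path in Path(root).rglob("SKILL.md"):
        hits = find_token_leaks(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results


if __name__ == "__main__":
    for file, hits in scan_tap(".").items():
        print(file, hits)
```

A scan like this complements, rather than replaces, scoped deploy tokens: it catches the accidental paste, not a deliberately over-privileged credential.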

10 · Self-evolving Skills: GEPA + DSPy

GEPA (Genetic-Pareto Prompt Evolution) is an ICLR 2026 Oral result integrated in hermes-agent-self-evolution. Instead of fine-tuning model weights, GEPA analyzes execution traces, generates SKILL.md variants, and runs multi-objective Pareto selection to improve success rate, token efficiency, and latency simultaneously. Typical cost: $2–10 per evolution run using API inference—no GPU cluster required.

Five-stage pipeline:

  1. Trace collection — sessions stored in SQLite via Hermes's session DB.
  2. Reflective failure analysis — identify which procedure steps correlate with errors.
  3. Targeted mutation — generate 10–20 SKILL.md variants focused on weak sections.
  4. Pareto evaluation — score variants on success × token efficiency × speed.
  5. Human review — merge winning diff via PR after automated guardrails pass.

    git clone https://github.com/NousResearch/hermes-agent-self-evolution
    export HERMES_AGENT_PATH=~/.hermes
    python -m evolution.skills.evolve_skill \
      --skill github-code-review \
      --iterations 10 \
      --eval-source sessiondb
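Stage 4 of the pipeline, Pareto evaluation, is easier to reason about with a toy example. The sketch below is a plain non-dominated filter over three objectives, not the GEPA implementation: the `Variant` class, its field names, and the sample scores are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    success_rate: float  # higher is better
    tokens: float        # lower is better
    latency_s: float     # lower is better


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is at least as good as b everywhere and better somewhere."""
    no_worse = (
        a.success_rate >= b.success_rate
        and a.tokens <= b.tokens
        and a.latency_s <= b.latency_s
    )
    strictly_better = (
        a.success_rate > b.success_rate
        or a.tokens < b.tokens
        or a.latency_s < b.latency_s
    )
    return no_worse and strictly_better


def pareto_front(variants: list[Variant]) -> list[Variant]:
    """Keep every variant that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


if __name__ == "__main__":
    candidates = [
        Variant("baseline", 0.70, 1200, 9.0),
        Variant("v1", 0.80, 1000, 8.0),
        Variant("v2", 0.85, 1500, 7.0),
        Variant("v3", 0.75, 1100, 8.5),
    ]
    print([v.name for v in pareto_front(candidates)])
```

Here v1 beats the baseline and v3 on every axis, while v1 and v2 trade success rate against token cost, so both survive. Choosing between survivors is exactly the judgment call the human review stage exists for.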

Four safety guardrails block reckless merges: full test suite must pass at 100%; Skills stay ≤15 KB and tool descriptions ≤500 characters; prompt-cache compatibility is preserved; semantic preservation checks ensure the skill still means what your team thinks it means. Roadmap Phase 1 (SKILL.md evolution) is production-ready; Phases 2–5 extend to tool descriptions, system prompts, tool implementation code, and fully automated loops.
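Two of the four guardrails are mechanical enough to sketch as a pre-merge gate. This is an illustration under stated assumptions, not the project's own checker: `guardrail_failures` is a hypothetical function, "15 KB" is read as 15 × 1024 bytes, and the prompt-cache and semantic-preservation checks are not modeled at all.

```python
MAX_SKILL_BYTES = 15 * 1024   # "15 KB" read as 15 * 1024 bytes (assumption)
MAX_TOOL_DESC_CHARS = 500


def guardrail_failures(
    skill_text: str,
    tool_descriptions: dict[str, str],
    tests_passed: int,
    tests_total: int,
) -> list[str]:
    """Return reasons to block a merge; an empty list means the gate passes."""
    failures = []
    # Guardrail 1: the full test suite must pass at 100%.
    if tests_total == 0 or tests_passed != tests_total:
        failures.append(f"tests: {tests_passed}/{tests_total} passed")
    # Guardrail 2a: the evolved SKILL.md must stay within the size cap.
    size = len(skill_text.encode("utf-8"))
    if size > MAX_SKILL_BYTES:
        failures.append(f"skill size {size} bytes exceeds {MAX_SKILL_BYTES}")
    # Guardrail 2b: each tool description must stay within the character cap.
    for name, desc in tool_descriptions.items():
        if len(desc) > MAX_TOOL_DESC_CHARS:
            failures.append(f"tool description '{name}' exceeds {MAX_TOOL_DESC_CHARS} chars")
    return failures


if __name__ == "__main__":
    print(guardrail_failures("# My Skill\n" * 10, {"git": "Run git commands."}, 42, 42))
```

Measuring size in UTF-8 bytes rather than characters matters for non-English skills, where one character can occupy three bytes.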

Experimental mode accepts mixed trace sources—feed Claude Code or Gemini CLI logs alongside Hermes sessions:

--eval-source mixed --trace-dirs ~/.claude/traces,~/.hermes/sessions

That cross-runtime learning is powerful for teams that prototype in Cursor but deploy agents on Hermes gateway hardware. Capture traces on a disposable rented Mac, run GEPA overnight, review the PR on Monday—without touching production SKILL.md until tests green-light the diff.

11 · Plugin skills: extending Hermes boundaries

Plugins namespace skills as plugin:skill. They do not appear in the default skills_list; the user opts in explicitly—useful for experimental or high-risk capabilities you do not want auto-routed.

    skill_view("superpowers:writing-plans")

    # plugin.yaml
    name: my-hermes-plugin
    skills:
      - name: writing-plans
        path: skills/writing-plans/SKILL.md

Plugins pair well with internal tools that should never surface during casual chat—administration runbooks, production database skills, or compliance workflows that require an explicit slash invocation and audit log entry.

12 · Advanced authoring tips (engineer's checklist)

  • Description drives routing. State trigger conditions and exclusion cases. "Helps with code" activates everywhere and nowhere useful.
  • Pitfalls separate good from great. Document concrete failure modes—GitHub API rate limits, oversized diffs blowing token budgets—with root cause and fix steps the agent can follow without guessing.
  • Script with fallbacks. Reference scripts/ in Procedure; on failure, point to references/manual-extract.md for manual recovery.
  • Size discipline. Under 500 lines: keep in SKILL.md. 500–1000 lines: split references. Above 15 KB: mandatory split for GEPA compatibility.
  • Agent writes need approval. Hermes can patch skills via skill_manage(action='patch'|'create'); enable agent_writes_require_approval: true in config.yaml so autonomous edits do not silently overwrite reviewed procedures.

13 · Case study: technical blog workflow Skills

MacDate's own blog pipeline maps cleanly onto a Bundle. Below is a representative YAML you can adapt—swap platform-specific publish steps for your CMS.

    name: blog-workflow
    description: Full tech blog writing workflow.
    skills:
      - seo-keyword-research
      - outline-generator
      - code-example-validator
      - bilingual-checker
      - publish-to-platform
    instruction: |
      Always research SEO keywords before writing.
      Ensure all code examples are tested and runnable.

A custom seo-keyword-research skill can emit a bilingual keyword matrix before drafting—three to five head terms plus ten to fifteen long-tail phrases cross-checked against Dev.to trending, Hacker News front page, and niche community feeds. The code-example-validator skill runs bundled scripts against the rented Mac sandbox so copy-pasted shell snippets actually execute on Apple Silicon before publication.

The instruction block enforces editorial policy without duplicating it across five skills: research first, runnable examples only, no publish until validation passes. That is Bundles doing governance work that would otherwise live in a Notion doc nobody reads.

14 · FAQ

Q: How do Skills differ from MCP?
Skills are procedural knowledge documents—they teach the agent how to approach a task. MCP (Model Context Protocol) is a tool interface—it gives the agent live access to external systems. They compose: a Skill can say "call the Jira MCP tool, then apply the escalation template below."

Q: I edited a Skill but the agent still uses the old version.
Changes do not apply mid-session. Run /reset or reinstall with --now (note: --now invalidates prompt cache on supported providers).

Q: Is GEPA-evolved skill content safe to merge?
Automated guardrails catch size, test, and semantic regressions—but human PR review remains mandatory. Read every diff; GEPA optimizes metrics, not your compliance officer's sleep.

Q: Can I reuse Hermes skills in Claude Code?
Copy SKILL.md into ~/.claude/skills/ or use kevinnft/ai-agent-skills for multi-runtime installs. Tool availability differs per runtime, so verify that the tools a skill calls exist in the target runtime.

Q: Does Chinese content hurt token efficiency?
Chinese characters run roughly 1–1.5 tokens per character—comparable to English per semantic unit. For LLM routing precision, keep descriptions in English or bilingual; body content can match your audience locale.
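For budgeting, the 1–1.5 tokens-per-character figure above turns into a quick range estimate. This is a back-of-envelope sketch, not a tokenizer: `cjk_token_range` is a hypothetical helper that counts only characters in the basic CJK Unified Ideographs block and ignores everything else in the string.

```python
import math


def cjk_token_range(text: str) -> tuple[int, int]:
    """Estimate (low, high) token counts for the CJK characters in text.

    Uses the 1 to 1.5 tokens-per-character range quoted above; non-CJK
    characters are ignored and need a separate estimate.
    """
    count = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return count, math.ceil(count * 1.5)


if __name__ == "__main__":
    print(cjk_token_range("代码审查 skill"))
```

For anything that affects billing, measure with the tokenizer of the model you actually deploy, since ratios vary between vendors.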

Further reading: Hermes official docs, our Cursor Agent Skill guide, 30-day Hermes field review, and memory + hardware selection guide.

15 · Five-step Mac rental sandbox for Hermes Skills

Hermes Gateway runs on Linux VPS and Windows, but macOS-only skills—Xcode workflows, Keychain operations, Apple codesigning, Homebrew recipes tuned for Apple Silicon—need real macOS. The pragmatic 2026 pattern: spin up a disposable rented Mac, validate Skills / Bundles / GEPA traces, then release the node before monthly charges accrue.

  1. Rent an Apple Silicon node. Choose Mac mini M4 or better with Homebrew preinstalled; SSH in from your laptop. Day rates and SKUs live on bare-metal macOS pricing.
  2. Install Hermes and run doctor. Follow the official install guide; confirm gateway health with hermes doctor and verify toolsets match your production target.
  3. Install official skills and custom Taps. Run hermes skills install and hermes skills tap add; measure Level 0 vs Level 1 token footprint on a representative session.
  4. Author a Bundle and execute the workflow. Write YAML under ~/.hermes/skill-bundles/; trigger with /bundle-name and confirm all listed skills load and follow the instruction block.
  5. Capture traces and release. Export session logs for GEPA if evolving skills; save terminal output as acceptance evidence; terminate the rental when validation passes to stop billing.

Linux VPS hosting suits lightweight API-only gateways but cannot validate macOS-exclusive skills or local Keychain permission flows. Running 24/7 on a personal laptop risks thermal throttling, polluted dotfiles, and API keys on a machine you also use for email. Daily Mac rental delivers production-faithful Apple Silicon for less than the cost of one misconfigured skill firing a runaway tool loop overnight.

Most Skills validation sprints finish in one to three rental days on a Mac mini M4 with 16 GB, which is enough for Tap installs, Bundle authoring, and a GEPA iteration batch without CapEx. If the gateway proves stable enough for 24/7 Telegram duty, graduate to monthly rental using the hardware matrix in our VPS vs Mac mini selection guide. SSH setup, VNC fallback, and billing mechanics are covered in the daily Mac rental FAQ.

That is why teams treating Hermes Skills as production infrastructure—not weekend experiments—rent isolated Apple Silicon nodes from MacDate instead of contaminating daily-driver laptops or guessing on Linux-only sandboxes. You get native Keychain and codesign behavior, a clean ~/.hermes/ tree per sprint, SSH access for scripted validation, and daily billing that stops when QA ends. Compare tiers on bare-metal macOS pricing; most Bundle and GEPA trials complete before you need a second week of rent.