01 · Why billing data beats CLI hype

GitHub READMEs promise "autonomous coding." OpenRouter's CLI client rankings show what developers actually route through paid API keys when nobody is watching. A terminal agent that burns tokens on failed tool loops still counts toward the weekly leaderboard—and your invoice. That makes the CLI filter on openrouter.ai/rankings a sharper signal than any Product Hunt launch.

OpenRouter aggregates 300+ models from 60+ providers, processing on the order of 100 trillion tokens per month across more than 8 million users. When Kilo Code climbs to the CLI #1 slot, it means thousands of independent teams chose that client as their default gateway—not that a single vendor ran a promotion. This complements our earlier analysis of weekly model token rankings, which focused on which models absorb traffic; here we focus on which CLI clients generate it.

The distinction matters because model routing and client choice are separate decisions. You might route through DeepSeek V4 Flash for cost while using Kilo Code as the orchestration layer—or run Claude Code against Anthropic's native API with zero OpenRouter involvement. The CLI leaderboard captures the first pattern: clients that explicitly proxy through OpenRouter's unified endpoint and therefore appear in platform telemetry.

02 · Data source and methodology

OpenRouter publishes rolling 7-day statistics filterable by client application. We anchor this article to the window of June 2 through June 8, 2026 (UTC). The CLI category counts input plus output tokens attributed to each registered client ID in OpenRouter's application registry.

Key figures cited throughout:

Hermes Agent: approximately 4.94 trillion tokens in the anchor week, ranked #1 across all OpenRouter clients, not only CLI tools. This includes Telegram bot sessions, persistent memory loops, and multi-channel gateway traffic documented in our Hermes Agent deployment guide.
Kilo Code: approximately 1.22 trillion tokens—the highest pure terminal-coding client in the CLI-specific filter for the week.
Claude Code: approximately 606 billion tokens—solid #2 in CLI category despite Anthropic's parallel first-party API path.

Rankings below #3 fall off quickly in absolute volume but remain strategically important because they represent different architectural bets: patch-only editors (Aider), IDE extensions (Cline), enterprise sandboxes (Goose), and vendor-native CLIs (Codex CLI, Qwen Code). Treat the board as a directional signal refreshed weekly, not a permanent hierarchy.

03 · Three adoption pain points

1. Conflating client popularity with model quality. Kilo Code's 1.22T weekly volume reflects both its UX and the default models its users select—often cost-optimized MoE routes from the model leaderboard we covered in 2026 LLM trends from OpenRouter rankings. Switching to Kilo because it tops the CLI chart without auditing your model ID is how teams accidentally route bulk cron jobs through Sonnet-tier pricing.

2. Installing three CLIs on a production Mac. Each terminal agent wants global npm permissions, shell profile hooks, and sometimes conflicting Node versions. Kilo, Cline, and Goose side-by-side on your daily-driver MacBook pollute Keychain entries, leave orphaned API keys in ~/.config, and make forensic cleanup painful when a trial goes wrong. The safer pattern—mirroring our Agent Skill isolation workflow—is to trial on a disposable macOS node, then promote one winner.
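
One way to check for orphaned keys after a trial is a recursive search of the config directory. The sketch below assumes keys start with `sk-or-` (the prefix used in this article's example key) and sit in plaintext files; the function name `find_leftover_keys` is hypothetical, and the search does not inspect Keychain entries.

```shell
# List files under a directory that contain something shaped like an
# OpenRouter key. Assumption: keys begin with "sk-or-", matching the
# placeholder used in this article; adjust the pattern if yours differ.
find_leftover_keys() {
  dir="${1:-$HOME/.config}"
  # -r recurse, -l print file names only, -I skip binary files,
  # -E extended regex; errors from unreadable files are silenced.
  grep -rlIE 'sk-or-[A-Za-z0-9_-]+' "$dir" 2>/dev/null
}

# Example: scan the default location and count the hits.
# find_leftover_keys "$HOME/.config" | wc -l
```

An empty result only means no plaintext match was found under that directory; revoking the trial key is still the safer cleanup.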

3. Ignoring hardware constraints for sandboxed agents. Goose and OpenCode spin Docker containers for tool execution. Codex CLI may invoke local sandboxes on Apple Silicon. A MacBook Air with 8 GB RAM will thrash under parallel agent loops while a Mac Studio with 64 GB runs Ollama fallback and Docker sidecars without swap pressure. Matching CLI ambition to silicon tier prevents false negatives during evaluation.

04 · CLI Top 10 for the week of June 2–8, 2026

The table below reflects OpenRouter's CLI client filter for the anchor week. Volumes are rounded; visit the live rankings page for current numbers.

Rank	CLI Client	Weekly Tokens	Category	Notable trait
1	Kilo Code	~1.22T	Terminal IDE	Multi-model routing; VS Code–like TUI; OpenRouter-native
2	Claude Code	~606B	Anthropic CLI	Deep tool-use loops; strong on long refactors
3	Hermes Agent	~4.94T*	Multi-channel agent	Platform #1 overall; Telegram, memory, skills
4	Aider	~380B (est.)	Git-native patcher	Minimal UI; repo-map context
5	Cline	~290B (est.)	VS Code extension + CLI	Human-in-the-loop approvals
6	Goose	~210B (est.)	Block / enterprise	Recipe system; Docker sandbox
7	OpenCode	~175B (est.)	Open-source terminal	Provider-agnostic; rapid iteration
8	Codex CLI	~140B (est.)	OpenAI native	Tight GPT-5.x integration
9	Roo Code	~95B (est.)	VS Code fork	Mode switching (architect/code/debug)
10	Qwen Code	~80B (est.)	Alibaba native	Optimized for Qwen3 coder routes

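*Hermes Agent's 4.94T figure spans all client channels on OpenRouter, not only terminal sessions, so it is not directly comparable with the other rows. Hermes appears in the CLI top ten because a significant share of its traffic routes through CLI and gateway interfaces rather than browser chat alone; its #3 placement reflects the CLI-specific filter, not the 4.94T platform total.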

The gap between #1 Kilo (1.22T) and #2 Claude Code (606B) is roughly 2×: material, but not insurmountable if Anthropic ships a pricing change or bundles Claude Code with enterprise seats. The gap between Kilo and #4 Aider (~380B estimated) is wider at roughly 3.2×, suggesting the market prefers full terminal IDEs over patch-only workflows for OpenRouter-routed workloads.
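
The two gaps discussed above can be reproduced from the table's rounded figures. This is a minimal sketch: the helper name `ratio` is hypothetical, and inputs are in billions of tokens.

```shell
# Ratio of two weekly token volumes, both given in billions of tokens.
# Inputs are the rounded values from the Top 10 table.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 1220 606   # Kilo Code vs Claude Code  -> 2.01
ratio 1220 380   # Kilo Code vs Aider (est.) -> 3.21
```

Because the Aider volume is an estimate, the second ratio is only as reliable as that estimate.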

05 · Feature comparison matrix

Token volume tells you who is winning today. Features tell you who fits your workflow tomorrow.

Client	OpenRouter native	Multi-model switch	Tool / sandbox	IDE integration	Best fit
Kilo Code	Yes	Yes (hot-swap)	Shell, browser, MCP	Standalone TUI	Polyglot teams routing by task cost
Claude Code	Partial	Anthropic only*	Advanced tool use	Terminal + IDE plugins	Long autonomous refactors
Hermes Agent	Yes	Yes	Skills, cron, Telegram	Headless / gateway	Always-on ops bots
Aider	Yes	Yes	Git diff only	Editor agnostic	Minimalists; CI hooks
Cline	Yes	Yes	Browser, terminal	VS Code core	Teams wanting approval gates
Goose	Yes	Yes	Docker recipes	CLI + desktop	Regulated sandbox needs
OpenCode	Yes	Yes	Configurable	Terminal	Hackers customizing providers
Codex CLI	No (OpenAI direct)	GPT family	Cloud sandbox	Terminal	OpenAI-all-in shops
Roo Code	Yes	Yes	Multi-mode agents	VS Code fork	Structured role separation
Qwen Code	Partial	Qwen primary	Standard tools	Terminal	APAC Qwen deployments

*Claude Code can reach OpenRouter via manual base-URL configuration, but most of its traffic flows through Anthropic's first-party endpoint. Only the share that is explicitly proxied through OpenRouter appears in the CLI rankings.

Two patterns emerge from the matrix. First, OpenRouter-native polyglot clients (Kilo, Hermes, Cline) dominate token share because teams mix models weekly as the model leaderboard shifts. Second, vendor-locked CLIs (Claude Code, Codex CLI, Qwen Code) trade flexibility for tighter integration—and still place top ten because enterprise procurement often mandates a single vendor relationship.

06 · Kilo Code vs Claude Code vs Hermes Agent

Kilo Code — volume king of terminal coding

At 1.22 trillion tokens for the week of June 2–8, Kilo Code is the clearest signal that developers want a terminal-native IDE with instant model switching. Its architecture treats OpenRouter as a dial: point at DeepSeek V4 Flash for inner loops, promote to Claude Sonnet 4.6 for merge-ready diffs, drop to free-tier models for spikes. That aligns with the tiered routing philosophy from our billing-truth article—except the client itself becomes the routing layer.
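
The "dial" idea can be sketched as a small lookup from task tier to model ID. The tier names (`inner-loop`, `merge`, `spike`) and the function name are hypothetical. Only the DeepSeek ID appears in this article; the other two defaults are placeholders to replace with real IDs from OpenRouter's model list.

```shell
# Map a task tier to an OpenRouter model ID.
# Tier names are labels invented for this sketch. The merge and spike
# defaults are placeholders, overridable via MERGE_MODEL / SPIKE_MODEL.
model_for_tier() {
  case "$1" in
    inner-loop) echo "deepseek/deepseek-v4-flash" ;;
    merge)      echo "${MERGE_MODEL:-anthropic/claude-sonnet-4.6}" ;;
    spike)      echo "${SPIKE_MODEL:-your-free-tier-model-id}" ;;
    *)          echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

model_for_tier inner-loop
```

If your Kilo build accepts the `kilo config set model` command shown in this article's install example, the output could be passed to it as `kilo config set model "$(model_for_tier inner-loop)"`. Keeping the mapping in one place makes a changed default easier to spot in review.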

Kilo's weakness is operational surface area. More models means more failure modes: context overflow on one provider, tool-schema mismatch on another, rate-limit cascades when a cron job hot-swaps mid-run. Teams without budget caps have reported invoice surprises when a default model ID changed between Kilo releases.
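
A budget cap can also be enforced on the client side before a scheduled job starts. The sketch below is one possible guard, not a Kilo or OpenRouter feature: `over_budget` is a hypothetical helper, the 0.8 default threshold is arbitrary, and it assumes you supply the usage and cap figures yourself in the same unit.

```shell
# Return success (0) when usage has reached the given fraction of the cap.
# usage and cap share one unit (for example, credits reported for an API
# key); a cap of 0 or less is treated as "no cap" and never trips.
over_budget() {
  usage="$1"; cap="$2"; fraction="${3:-0.8}"
  awk -v u="$usage" -v c="$cap" -v f="$fraction" \
    'BEGIN { exit !(c > 0 && u >= c * f) }'
}

# Example: skip a run once 80% of a 50-credit cap has been spent.
if over_budget 41.5 50; then
  echo "budget threshold reached; skipping run"
fi
```

A check like this runs before each job, so a changed default model shows up as a skipped run rather than an invoice surprise.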

Claude Code — quality floor at 606B tokens

606 billion weekly tokens is not #1, but it is remarkable for a client tightly coupled to Anthropic's stack. Claude Code earns its rank on long-horizon tasks: multi-file refactors, test generation across modules, and tool chains that tolerate higher latency in exchange for fewer rollback events. Platform leads often keep Claude Code as the escalation path while routing bulk work through Kilo plus cheap MoE models.

The trade-off is cost. Even when proxied through OpenRouter, Claude-tier output pricing dwarfs DeepSeek routes. A team running Claude Code as default for CI bots will show healthy CLI rankings and unhealthy finance reviews.

Hermes Agent — 4.94T and platform #1

Hermes is the outlier. Its 4.94 trillion token week, measured across all channels rather than the CLI filter alone, reflects persistent agents that never sleep: Telegram ops bots, scheduled skill invocations, memory-consolidation passes, and gateway traffic from OpenClaw-compatible setups. Hermes behaves less like a coding IDE and more like infrastructure that happens to call LLMs continuously.

If your use case is "developer edits Swift for four hours," Hermes is the wrong comparison. If your use case is "always-on SRE bot that reads logs, opens PRs, and pings on-call," Hermes's weekly volume explains why it tops the entire OpenRouter client board while sitting #3 in the CLI-specific filter. Deployment patterns are covered in our 30-day Hermes review on rented Mac mini M4.

07 · Why CLI rankings matter in June 2026

Three structural shifts make CLI client telemetry unavoidable for platform decisions.

First, programming traffic crossed 50% of categorized OpenRouter usage in 2026, up from roughly 11% in early 2025. Terminal agents are no longer a niche; they are the majority use case. Second, the client field is tightening: the gap between #1 and #10 on the CLI board is smaller in relative terms than on the model board, meaning tooling choices flip faster as UX improves. Third, enterprise procurement now asks for client audit trails: which CLI touched production repos, which API keys it stored, and whether it sandboxed shell execution.

Investors and competitors watch the same public rankings. When Kilo overtakes Claude Code in weekly CLI tokens, it signals a generational preference for model-agnostic terminals over vendor bundles—regardless of whether Anthropic's models still win on quality benchmarks.

08 · Mac rental hardware tiers for CLI trials

CLI agents stress CPU, RAM, and disk differently than browser chat. Match rental tier to your evaluation scope.

Tier	Hardware	RAM	Ideal CLI workload	Limitations
Light	MacBook Air M2 / M3	16 GB	Aider, Kilo single-repo, OpenCode smoke tests	No Docker; avoid parallel agents
Medium	MacBook Pro M3	16–32 GB	Kilo + Cline, Claude Code refactors, 2 concurrent loops	Local Ollama marginal at 16 GB
Heavy	MacBook Pro M4 Pro / M4 Max	32 GB+	Goose Docker recipes, sandboxed Codex-style runs	Studio-class local models still impractical
Lab	Mac Studio M4 Ultra	64 GB+	Hermes + local Ollama fallback, parallel gateway tests	Overkill for patch-only trials
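
To see where a given machine falls in the table above, installed RAM can be mapped to a tier. The cut-offs below are this sketch's reading of the table, not an official rule: Light and Medium overlap at 16 GB, so that band is reported as both, and 32 GB is treated as Heavy. The function name is hypothetical.

```shell
# Map installed RAM (whole GB) to the rental tiers in the table above.
tier_for_ram_gb() {
  gb="$1"
  if   [ "$gb" -ge 64 ]; then echo "Lab"
  elif [ "$gb" -ge 32 ]; then echo "Heavy"
  elif [ "$gb" -ge 16 ]; then echo "Light/Medium"
  else                        echo "below Light"
  fi
}

# On macOS, hw.memsize reports physical memory in bytes.
if [ "$(uname)" = "Darwin" ]; then
  bytes="$(sysctl -n hw.memsize)"
  tier_for_ram_gb "$((bytes / 1024 / 1024 / 1024))"
fi
```

RAM is only one axis; chip generation and free disk for Docker volumes still need a manual check.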

Pricing and availability are listed on the bare-metal macOS pricing page. For SSH setup, VNC fallback, and day-rent billing mechanics, see the daily Mac rental FAQ. Most CLI shootouts complete in one to three rental days on a Mac mini M4 with 16 GB, which turns CapEx curiosity into OpEx evidence before you buy a Studio for Hermes plus Ollama.

09 · Scenario selection matrix

Scenario	Primary CLI	Fallback	Mac tier	Why rankings agree
Polyglot model routing (weekly model churn)	Kilo Code	OpenCode	Medium MBP M3 32 GB	CLI #1 at 1.22T; hot-swap without restart
12-hour autonomous refactor	Claude Code	Kilo + Sonnet route	Medium	606B weekly tokens; quality escalation path
Always-on ops / Telegram bot	Hermes Agent	OpenClaw gateway	Lab Studio 64 GB+	4.94T platform #1; persistent memory
Minimal git-only patches	Aider	Cline	Light Air M3 16 GB	Lower token volume; fast CI integration
Regulated shell sandbox	Goose	Codex CLI	Heavy M4 Pro 32 GB+	Docker isolation; enterprise audit
Human approval on every diff	Cline	Roo Code	Medium	IDE-native review gates

Revisit this matrix every Monday alongside the model leaderboard. CLI clients lag model shifts by one to two weeks—Kilo's volume spike in early June followed DeepSeek V4 Flash's model-board dominance in late May. Synchronizing both boards prevents mismatched defaults.

10 · Five-step CLI trial on a rented Mac

1. Rent an isolated macOS node. Book Mac mini M4 or MacBook Pro M3 through MacDate; SSH in per the daily rental FAQ. Create a local user with no production Apple ID or signing certificates.
2. Install two CLI shortlist candidates. Example: Kilo Code plus Aider or Cline. Store OPENROUTER_API_KEY in a project-scoped .env; never export it to global shell profiles on this box.
3. Run a fixed benchmark task. Choose a 12k-token coding task with at least three tool calls (read file, edit, run tests). Execute identically on each CLI. Log prompt tokens, completion tokens, wall time, and tool failures.
4. Cross-check OpenRouter CLI rankings. Open the rankings page CLI filter; confirm whether your chosen client's weekly volume trend matches your own usage intensity. Export OpenRouter usage CSV for dollar comparison.
5. Archive and release. Save cli-benchmark-YYYYMMDD.csv, revoke the test API key, uninstall CLIs, and wipe the rental per MacDate's return checklist.
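
The benchmark and archive steps above can be sketched as a small logging helper that appends one row per run to the dated CSV. The column names follow the metrics listed in the benchmark step; the function name `log_benchmark` and its argument order are choices made for this sketch, and the values have to be collected from each CLI's own output.

```shell
# Append one benchmark row to a dated CSV in the current directory,
# writing the header line the first time the file is created.
# Arguments: client prompt_tokens completion_tokens wall_seconds tool_failures
log_benchmark() {
  csv="cli-benchmark-$(date +%Y%m%d).csv"
  [ -f "$csv" ] || echo "client,prompt_tokens,completion_tokens,wall_seconds,tool_failures" > "$csv"
  printf '%s,%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5" >> "$csv"
}

# Example rows; the numbers are placeholders, not measurements.
# log_benchmark kilo  11840 2310 94 1
# log_benchmark aider 12020 1975 71 0
```

Copy the file off the rental before wiping it, since the CSV is the only artifact the trial is meant to keep.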

    # Rented Mac CLI trial: Kilo Code install (example)
    curl -fsSL https://get.kilo.ai | bash
    export OPENROUTER_API_KEY="sk-or-..."
    kilo config set provider openrouter
    kilo config set model deepseek/deepseek-v4-flash

    # Fixed benchmark: same repo path for each CLI
    kilo "Add unit tests for AuthService.swift and run xcodebuild test"

    # Pull OpenRouter key usage for weekly comparison
    curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      https://openrouter.ai/api/v1/auth/key | jq '.data.usage'

    # Snapshot CLI rankings (manual): open https://openrouter.ai/rankings?category=cli

You could run the same installs on a personal MacBook, but that path collides with production Xcode signing identities, pollutes Keychain with experimental API keys, and leaves Hermes or Goose Docker volumes consuming disk between trials. A rented Mac is a forensic clean room: if a CLI routes a cron loop through Opus-tier pricing by mistake, the blast radius ends when you revoke the test key and release the instance. If Kilo's 1.22T weekly dominance convinces you to adopt it, but Claude Code's 606B figure reflects quality you still need for merges, you can run both on separate rental days without uninstall gymnastics.

For Apple-platform teams, the rented-Mac loop also keeps pace with toolchain churn: Swift 6 strict concurrency, Xcode 26 betas, and OpenClaw gateway updates ship faster than most CLI release notes. Testing agents on the same macOS build you ship against—not a Linux VPS with mismatched file paths—surfaces real failures before they hit CI. When your benchmark CSV shows a clear winner, promote one CLI to your team standard; until then, treat the OpenRouter board as the prior and your rental harness as the posterior.

OpenRouter CLI Tools Top 10
June 2026 Leaderboard