
2026 OpenClaw v2026.5.20 guide: Discord realtime voice followUsers, xAI device-code OAuth, and Policy plugin doctor lint (day-rent macOS isolation)

When you upgrade a self-hosted OpenClaw Gateway to v2026.5.20, you may hit three symptoms at once: Discord voice still feels broken even though text channels work, xAI login fails on a headless VPS because there is no localhost browser callback, and doctor suddenly reports Policy plugin lint you have never seen before. These symptoms share a release theme; they are not three unrelated incidents. This guide covers three pain clusters, a voice/auth/policy decision matrix, seven ordered steps, a triage table, three datapoints, and a 1–3 day rental cadence, and points to the install/deploy guide, the Telegram/Discord pairing and allowlist troubleshooting guide, and the SSH/VNC FAQ.

01. Three pain clusters: voice follow, headless xAI OAuth, Policy lint

1) Discord voice “follows the wrong room” or never joins: v2026.5.20 ships voice.followUsers, so the bot joins, moves between, and leaves voice channels by following configured Discord user IDs instead of waiting for /vc join or sitting in a fixed autoJoin channel. Operators who paste usernames instead of numeric snowflake IDs, or who forget guild/channel allowlists, see the bot sit idle while DMs still work. The release notes also emphasize allowed-channel checks, multi-user handoff, bounded reconciliation, and preservation of DAVE recovery: voice no longer means “join anywhere speech is heard.” Read this alongside the Discord pairing and allowlist guide, because spoken ingress now passes through the same member and channel policy gates as text.

2) xAI auth breaks on VPS/SSH because OAuth expects a browser: Before 5.20, teams running Gateway on a Linux VPS often copied tokens from a laptop or ran brittle SSH port forwards to complete provider OAuth. The release adds device-code OAuth via openclaw models auth login --provider xai --device-code, which prints a verification URL and code that you complete on any device while the credentials land on the headless host. The failure mode shifts from “callback never hits localhost” to “code expired because nobody watched the terminal” or “wrong provider profile order after login.” Treat device-code login as a change-window action with a human standing by, not something you fire from unattended CI.

3) Policy plugin doctor lint appears after upgrade: 5.20 bundles a Policy plugin that adds policy-backed channel conformance checks, doctor lint findings, and opt-in workspace repair. First-time readers tend to read WARN lines as “Gateway broken.” In practice, lint usually flags DM policies, allowlist gaps, plaintext secrets in openclaw.json, or channel configs that contradict your written security standard. Doctor also warns when sandbox tool policy hides MCP tools and when credential files are symlinked (symlinked credential files fail closed again in this release). Run doctor before you widen Discord voice to production guilds: Policy findings are cheaper to fix on a rental Mac than during a live standup.
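A quick way to stop WARN lines from reading as “Gateway broken” is to bucket them before anyone reacts. The Python sketch below assumes, hypothetically, that findings arrive as plain lines starting with WARN and that keywords such as plaintext, allowlist, sandbox, or MCP identify the category; real doctor output may use different wording or section headers, so treat CATEGORIES and triage() as illustrative names to adapt.

```python
import re
from collections import defaultdict

# Hypothetical keyword buckets; order matters because the first match wins.
CATEGORIES = {
    "secrets": ("plaintext", "secret", "credential", "symlink"),
    "sandbox/mcp": ("sandbox", "mcp"),
    "allowlists": ("allowlist", "dm policy", "grouppolicy", "channel"),
}

WARN_LINE = re.compile(r"\s*WARN\b")  # assumed prefix for doctor findings


def triage(doctor_output: str) -> dict[str, list[str]]:
    """Group WARN lines into categories; unmatched findings go to 'other'."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for line in doctor_output.splitlines():
        if not WARN_LINE.match(line):
            continue  # skip OK/INFO lines and section headers
        lowered = line.lower()
        for name, keywords in CATEGORIES.items():
            if any(keyword in lowered for keyword in keywords):
                buckets[name].append(line.strip())
                break
        else:
            buckets["other"].append(line.strip())
    return dict(buckets)
```

Feed it the archived doctor output and work the secrets bucket first, since those findings also affect which credentials voice turns actually use.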

These three surfaces interact: Discord realtime voice injects bounded IDENTITY.md, USER.md, and SOUL.md into provider instructions by default, so persona drift in voice is often a bootstrap file problem, not a model route problem. xAI device-code gets you a working inference path for voice turns routed to Grok-class models. Policy lint tells you whether the Discord channel you just opened for follow-users is actually conformant with your pairing and allowlist story.

02. Decision matrix: followUsers, bootstrapContextFiles, device-code, Policy plugin

Use the table during the change window as an acceptance sheet, not as marketing copy. Archive the evidence listed in the right-hand column before you wipe secrets from the rental host.

Surface (5.20) | Pass signal | Rental evidence
--- | --- | ---
voice.followUsers | Bot joins when a listed user enters an allowed voice channel, moves when that user changes channels, and leaves on disconnect | Redacted voice-state log plus JSON from openclaw channels status --channel discord
voice.followUsersEnabled | Defaults to true when a list is configured; set false to pause following without deleting IDs | Screenshot of the voice panel toggles in Control UI
voice.realtime.bootstrapContextFiles | Defaults inject IDENTITY/USER/SOUL; a subset narrows injection and [] disables it; AGENTS.md stays in normal agent context | Before/after transcript snippet showing persona grounding
xAI device-code OAuth | openclaw models auth login --provider xai --device-code completes; model list shows xAI entries | Redacted auth profile list; no API keys in config JSON
Policy plugin / doctor lint | Conformance WARNs resolved or explicitly waived with a ticket; opt-in repair applied if chosen | openclaw doctor stdout archived before and after fixes
Voice allowlist ingress | Speech from non-allowlisted members rejected before transcription | Probe with an allowlisted and a blocked test account
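The “Voice allowlist ingress” row is easiest to reason about as a single gate evaluated before any audio is transcribed. This is a minimal sketch of that idea, not OpenClaw's implementation: VoicePolicy, SpeechEvent, and admit_speech are hypothetical names, and a real policy may also weigh roles, DM policy, or pairing state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicePolicy:
    """Hypothetical allowlists shared by text and voice ingress."""
    allowed_guilds: frozenset[str]
    allowed_channels: frozenset[str]
    allowed_members: frozenset[str]


@dataclass(frozen=True)
class SpeechEvent:
    guild_id: str
    channel_id: str
    member_id: str


def admit_speech(policy: VoicePolicy, event: SpeechEvent) -> bool:
    """Return True only when guild, channel, and member are all allowlisted.

    Anything else is dropped before transcription, so a blocked member's
    audio never reaches the realtime provider.
    """
    return (
        event.guild_id in policy.allowed_guilds
        and event.channel_id in policy.allowed_channels
        and event.member_id in policy.allowed_members
    )
```

The probe in the table is the empirical version of this check: one allowlisted and one blocked account should produce exactly the two outcomes this function returns.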

Choose join mode deliberately. Use followUsers when the bot should stay with operators who move between huddle rooms. Use autoJoin for fixed-room assistants that must be present even when no tracked user is in voice. Use /vc join for one-off sessions where automatic presence would surprise guests. Mixing all three without documenting precedence creates “ghost bot in empty channel” tickets.
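Writing the precedence down as code is one way to make it reviewable. The sketch below encodes one possible order (an active /vc join session, then followUsers, then autoJoin, otherwise leave voice); that order is an assumption for illustration, not documented OpenClaw behavior, and VoiceState and target_channel are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceState:
    manual_channel: Optional[str]         # channel from an active /vc join session
    followed_user_channel: Optional[str]  # allowed channel a followed user is in
    follow_enabled: bool                  # mirrors voice.followUsersEnabled
    auto_join_channel: Optional[str]      # fixed autoJoin channel, if configured


def target_channel(state: VoiceState) -> Optional[str]:
    """Return the channel the bot should occupy, or None to leave voice.

    Assumed precedence: /vc join > followUsers > autoJoin.
    """
    if state.manual_channel:
        return state.manual_channel
    if state.follow_enabled and state.followed_user_channel:
        return state.followed_user_channel
    return state.auto_join_channel
```

Whatever order your team settles on, putting it in the runbook in this form makes “ghost bot in empty channel” reports easy to check against intent.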

bootstrapContextFiles: what changes in realtime voice

Realtime voice turns are fast and direct; they do not automatically replay your full workspace context. By default, 5.20 injects small profile files (IDENTITY.md for who the agent is, USER.md for who it serves, and SOUL.md for tone and boundaries) into realtime provider instructions so spoken replies match your routed agent persona. Set voice.realtime.bootstrapContextFiles to a subset if you want leaner prompts, or to [] to disable injection entirely. This does not replace openclaw_agent_consult for tool-backed workspace work, memory lookup, or current facts; it only grounds identity for low-latency voice. If operators complain that the bot “sounds generic in voice but fine in text,” check the bootstrap files before rewriting prompts.
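To make the default / subset / [] semantics concrete, here is a small Python model of how a bootstrapContextFiles value could map to injected instructions. It is a sketch under stated assumptions: the function names are hypothetical, MAX_CHARS_PER_FILE stands in for the release's unspecified “bounded” injection, and names outside IDENTITY.md, USER.md, and SOUL.md (such as AGENTS.md) are simply ignored, mirroring the text above.

```python
from pathlib import Path
from typing import Optional, Sequence

DEFAULT_BOOTSTRAP = ("IDENTITY.md", "USER.md", "SOUL.md")
MAX_CHARS_PER_FILE = 2000  # assumed cap; the release only says injection is bounded


def resolve_bootstrap(setting: Optional[Sequence[str]]) -> list[str]:
    """None -> defaults, [] -> nothing, list -> that subset of the profile files."""
    if setting is None:
        return list(DEFAULT_BOOTSTRAP)
    return [name for name in setting if name in DEFAULT_BOOTSTRAP]


def load_profiles(workspace: Path) -> dict[str, str]:
    """Read whichever profile files exist in the workspace."""
    return {
        name: (workspace / name).read_text(encoding="utf-8")
        for name in DEFAULT_BOOTSTRAP
        if (workspace / name).is_file()
    }


def build_voice_instructions(
    profiles: dict[str, str], setting: Optional[Sequence[str]]
) -> str:
    """Concatenate the selected, truncated profile files in a stable order."""
    parts = [
        f"## {name}\n{profiles[name][:MAX_CHARS_PER_FILE]}"
        for name in resolve_bootstrap(setting)
        if name in profiles
    ]
    return "\n\n".join(parts)
```

Keeping file loading separate from instruction building makes it easy to diff the before/after transcript evidence against exactly what each setting would inject.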

Policy plugin: lint vs repair

The bundled Policy plugin adds doctor-visible lint for channel conformance: structured nudges when DM policy, guild allowlists, or channel maps disagree with your declared security posture. Opt-in workspace repair can apply safe fixes when you explicitly accept them; do not treat repair as a silent auto-migration during production hours. Align Policy output with your written standard: if lint reports plaintext provider keys, migrate them to SecretRef before enabling voice in customer guilds. Doctor in 5.20 also strips stale thinkingFormat compat keys on doctor --fix and warns when sandbox policy hides configured MCP tools. Both appear in the same report as Policy lint and can confuse triage unless you read the section headers.
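Before enabling voice in customer guilds, you can pre-check for the plaintext-key findings that doctor reports. This hypothetical scanner walks a parsed config and flags string values stored under key names that look secret-bearing (apiKey, token, secret, password). The key-name heuristic and the assumption that a SecretRef is stored as a non-string object are illustrative; scan_config parses strict JSON, so a config containing comments would need a JSON5 parser instead. Doctor remains the authoritative check.

```python
import json
import re
from typing import Any, Iterator

# Heuristic: key names that usually hold credentials (assumption; extend as needed).
SECRET_KEY = re.compile(r"(api[_-]?key|token|secret|password)$", re.IGNORECASE)


def find_plaintext_secrets(node: Any, path: str = "$") -> Iterator[str]:
    """Yield JSON paths where a secret-looking key holds a literal string.

    Non-string values, such as an object standing in for a SecretRef, are
    recursed into rather than flagged.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if isinstance(value, str) and value and SECRET_KEY.search(key):
                yield child
            else:
                yield from find_plaintext_secrets(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_plaintext_secrets(item, f"{path}[{index}]")


def scan_config(path: str) -> list[str]:
    """Parse a strict-JSON config file and list plaintext-secret paths."""
    with open(path, encoding="utf-8") as fh:
        return list(find_plaintext_secrets(json.load(fh)))
```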

03. Seven implementation steps: upgrade → voice → xAI → doctor → evidence

  1. Freeze baseline: Record openclaw --version, Discord voice block in config, xAI auth profiles, and openclaw channels status --json. Note loaded openclaw.json path and Gateway start args.
  2. Backup and upgrade: Snapshot config and workspace profile files (IDENTITY/USER/SOUL). Run openclaw update to v2026.5.20; save stdout, exit code, and npm integrity if you pin tarballs.
  3. Configure Discord voice followUsers: Add snowflake IDs (raw or discord:<id> form). Confirm guild/channel allowlists from the pairing guide. Restart Gateway after voice config changes.
  4. Tune bootstrapContextFiles: Start with defaults; run a short spoken probe. If persona is too heavy for latency, trim to a subset; if you need strict neutrality, set [] and accept generic voice tone.
  5. Authorize xAI on headless hosts: From an SSH session on the VPS, run the device-code login, then complete verification on a phone or laptop before the code expires. Verify the model list and a single completion before routing customer traffic.
  6. Policy plugin and doctor pass: Run openclaw doctor; resolve lint or document waivers. Apply opt-in repair only on disposable hosts first. Re-run doctor after doctor --fix removes stale schema keys.
  7. Cross-check channels and archive evidence: Confirm that Discord reports configured/enabled and that the voice probe succeeds. Redact logs, then delete demo tokens and temporary OAuth profiles on the rental machine.

# Upgrade and baseline (keep the full JSON; truncating it with head -c breaks parsing)
openclaw --version
openclaw channels status --json | tee /tmp/oc520-channels-before.json
{ openclaw update; echo "openclaw update exit code: $?"; } 2>&1 | tee /tmp/oc520-update.txt
openclaw --version

# Headless xAI OAuth (complete the URL/code on any browser)
openclaw models auth login --provider xai --device-code

# Policy conformance and channel health
openclaw doctor 2>&1 | tee /tmp/oc520-doctor.txt
openclaw channels status --probe --channel discord
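Step 7's cross-check is easier with a mechanical diff of the before and after channel snapshots. This sketch flattens two JSON documents and prints only the paths that changed; it makes no assumption about the channels status schema, but it does require both captures to be complete JSON (a byte-truncated capture will not parse). The script name and the after-snapshot file name in the usage line are illustrative.

```python
import json
import sys
from typing import Any

MISSING = "<missing>"


def flatten(node: Any, prefix: str = "$") -> dict[str, Any]:
    """Flatten nested JSON into {"$.a.b[0]": leaf} pairs; empty containers are leaves."""
    if isinstance(node, dict) and node:
        items = [(f"{prefix}.{key}", value) for key, value in node.items()]
    elif isinstance(node, list) and node:
        items = [(f"{prefix}[{index}]", value) for index, value in enumerate(node)]
    else:
        return {prefix: node}
    flat: dict[str, Any] = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def diff_snapshots(before: Any, after: Any) -> dict[str, tuple[Any, Any]]:
    """Return {path: (old, new)} for every leaf that was added, removed, or changed."""
    old, new = flatten(before), flatten(after)
    return {
        path: (old.get(path, MISSING), new.get(path, MISSING))
        for path in sorted(old.keys() | new.keys())
        if old.get(path, MISSING) != new.get(path, MISSING)
    }


def main(before_path: str, after_path: str) -> None:
    with open(before_path, encoding="utf-8") as fb, open(after_path, encoding="utf-8") as fa:
        changes = diff_snapshots(json.load(fb), json.load(fa))
    for path, (was, now) in changes.items():
        print(f"{path}: {was!r} -> {now!r}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage: python3 diff_channels.py /tmp/oc520-channels-before.json /tmp/oc520-channels-after.json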

Keep at least 15 GB of free disk before running the upgrade, voice provider warmup, and doctor repair in parallel; the 5.20 line also bumps the bundled Codex harness and Baileys rc12, which briefly expand install artifacts. For SSH bandwidth and rental cost expectations, see the SSH/VNC FAQ. Fresh installs should follow the multiplatform install guide so that Node 24 (or ≥22.19) matches release CI.
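A pre-flight check for the 15 GB guideline takes a few lines with the standard library's shutil.disk_usage. Point it at the volume that holds your OpenClaw install and workspace; the default of / below is an assumption, and GB here means 10^9 bytes.

```python
import shutil

REQUIRED_FREE_GB = 15  # threshold from the guidance above
GB = 10**9             # decimal gigabytes


def free_gb(path: str = "/") -> float:
    """Free space on the volume containing `path`, in GB."""
    return shutil.disk_usage(path).free / GB


def has_headroom(path: str = "/", required_gb: float = REQUIRED_FREE_GB) -> bool:
    """True when the volume has at least `required_gb` GB free."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    status = "ok" if has_headroom() else "too low"
    print(f"free: {free_gb():.1f} GB ({status} for the 5.20 upgrade)")
```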

Example followUsers config fragment

Place voice settings under your Discord channel block. IDs must be numeric snowflakes from Developer Mode, not display names:

{
  "channels": {
    "discord": {
      "voice": {
        "enabled": true,
        "followUsers": ["123456789012345678", "discord:987654321098765432"],
        "followUsersEnabled": true,
        "realtime": {
          "bootstrapContextFiles": ["IDENTITY.md", "USER.md", "SOUL.md"]
        }
      }
    }
  }
}

After editing, restart Gateway and have a followed user join an allowlisted voice channel while you tail logs. If the bot never moves, check followUsersEnabled, allowlist membership, and bot Connect/Speak permissions before touching model routes.
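Because the most common failure is a username where a snowflake belongs, it can help to lint the followUsers list before restarting Gateway. This sketch reads the same channels.discord.voice path as the fragment above, accepts raw IDs or the discord:<id> form, and treats anything that is not 17 to 20 digits as suspect; that digit range reflects current Discord snowflakes and is an assumption, as are the function names.

```python
import re

SNOWFLAKE = re.compile(r"\d{17,20}")  # assumed length range for current Discord IDs


def normalize_follow_id(entry: str) -> str:
    """Accept a raw snowflake or discord:<id>; raise ValueError otherwise."""
    raw = entry.strip().removeprefix("discord:")
    if not SNOWFLAKE.fullmatch(raw):
        raise ValueError(f"not a Discord snowflake: {entry!r}")
    return raw


def check_voice_config(config: dict) -> list[str]:
    """Return human-readable problems in channels.discord.voice follow settings."""
    voice = config.get("channels", {}).get("discord", {}).get("voice", {})
    follow = voice.get("followUsers", [])
    problems: list[str] = []
    if follow and voice.get("followUsersEnabled") is False:
        problems.append("followUsersEnabled is false; listed users will not be followed")
    seen: set[str] = set()
    for entry in follow:
        try:
            raw = normalize_follow_id(entry)
        except ValueError as exc:
            problems.append(str(exc))
            continue
        if raw in seen:
            problems.append(f"duplicate ID after normalization: {raw}")
        seen.add(raw)
    return problems
```

An empty result does not prove the bot will move; allowlists and Connect/Speak permissions still need the live check described above.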

Device-code OAuth workflow on a VPS

SSH into the Gateway host, run the device-code login, and keep the session open until authorization completes. Open the printed URL on any trusted device, enter the code, approve the scopes, and wait for the CLI to confirm that credentials are stored. Do not paste codes into chat logs. If auth succeeds but completions fail, inspect the auth profile order and check whether an old plaintext xAI key in config overrides the new OAuth profile; doctor's plaintext-secret WARN matters in this release because mixed credential storage causes silent precedence bugs.
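The “code expired because nobody watched the terminal” failure follows from how device-code flows poll. The sketch below models the generic OAuth 2.0 Device Authorization Grant polling rules from RFC 8628 (keep polling on authorization_pending, add 5 seconds on slow_down, stop on any other error or when expires_in runs out). Whether OpenClaw's xAI login follows exactly these rules is an assumption; poll_for_token and DeviceCodeError are hypothetical names, and the clock and sleep hooks exist only so the logic can be tested without waiting.

```python
import time
from typing import Callable


class DeviceCodeError(Exception):
    """Raised when the device code is denied or expires before approval."""


def poll_for_token(
    request_token: Callable[[], dict],
    expires_in: float,
    interval: float = 5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a token endpoint following the RFC 8628 section 3.5 rules.

    request_token() should return the parsed JSON body of one token request.
    """
    deadline = clock() + expires_in
    while clock() < deadline:
        sleep(interval)                     # never poll faster than the interval
        response = request_token()
        error = response.get("error")
        if error is None:
            return response                 # success: body carries access_token
        if error == "authorization_pending":
            continue                        # user has not approved yet
        if error == "slow_down":
            interval += 5                   # RFC 8628: back off by 5 seconds
            continue
        raise DeviceCodeError(error)        # access_denied, expired_token, ...
    raise DeviceCodeError("expired_token")  # nobody approved within expires_in
```

The practical takeaway matches the runbook: the human approving on a phone has only the expires_in window, so start the login when that person is ready.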

04. Triage table: symptom → first action → wrong move

Symptom | First action | Wrong move
--- | --- | ---
Bot ignores operator in voice | Verify the user ID in followUsers; confirm the channel is allowlisted; check Connect permission | Set groupPolicy: open on text to “fix” voice
Voice persona differs from text agent | Inspect bootstrapContextFiles and the profile MD files in the workspace | Duplicate the entire AGENTS.md into voice config (unsupported)
xAI device-code expires | Re-run login with an operator ready; check clock skew on the VPS | Embed long-lived API keys in git-tracked JSON
Doctor floods Policy WARNs | Triage by category (secrets, allowlists, sandbox/MCP); fix or waive with a ticket | Disable doctor or delete the Policy plugin blindly
Random user speech transcribed in voice | Tighten Discord member/channel allowlists; retest with a blocked account | Assume voice is public because the bot joined the channel
Bot stuck after DAVE disconnect | Retry the follow-user move; check the 5.20 DAVE recovery notes; restart Gateway if wedged | Remove allowlists to force a join

When text Discord works but voice fails, split the problem. Gateway registration and the Discord app's OAuth are shared, but voice adds realtime provider config, the follow state machine, and ingress allowlists. Capture openclaw channels status and a short voice log slice before changing anything, then change one variable at a time: teams that simultaneously re-auth xAI, rewrite SOUL.md, and widen allowlists rarely know which fix worked.

05. Datapoints, myths, and 1–3 day rental schedule

  • Datapoint 1: GitHub release v2026.5.20 was published 2026-05-21 with documented Discord followUsers, realtime bootstrap profile injection, a bundled Policy plugin, and xAI device-code OAuth. Use that date as your change-audit anchor.
  • Datapoint 2: npm tarball integrity for openclaw@2026.5.20 is published on the release page (sha512-cgshS76CxS3Vp9NGtJR2UGtVZxVR5/4rvok8DKGGL19DugAftNabsXfYajyAEiJ3dC8QTXNqF62MdQNzUnQe8Q==); pin it on rental hosts when reproducing customer incidents.
  • Datapoint 3: Operator reports from M4 rental hardware put validating followUsers, completing device-code OAuth, and clearing Policy lint at a focused 3–5 hour window when Discord allowlists were already documented; expect longer when pairing debt remains from pre-5.12 estates.
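Pinning the Datapoint 2 value amounts to comparing an npm-style Subresource Integrity string ("sha512-" plus the base64-encoded SHA-512 digest) against the tarball you actually installed. A minimal sketch, assuming you fetched the tarball locally (for example with npm pack openclaw@2026.5.20, which writes a .tgz into the current directory); the helper names are hypothetical.

```python
import base64
import hashlib
from typing import Iterable

# Value quoted in Datapoint 2 for openclaw@2026.5.20.
EXPECTED = (
    "sha512-cgshS76CxS3Vp9NGtJR2UGtVZxVR5/4rvok8DKGGL19DugAftNabsXfYajyAEiJ3dC8QTXNqF62MdQNzUnQe8Q=="
)


def sri_from_chunks(chunks: Iterable[bytes]) -> str:
    """Build an npm-style SRI string: 'sha512-' + base64(SHA-512 digest)."""
    digest = hashlib.sha512()
    for chunk in chunks:
        digest.update(chunk)
    return "sha512-" + base64.b64encode(digest.digest()).decode("ascii")


def tarball_sri(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a tarball in 1 MiB chunks so large files never load fully into memory."""
    with open(path, "rb") as fh:
        return sri_from_chunks(iter(lambda: fh.read(chunk_size), b""))


def verify_tarball(path: str, expected: str = EXPECTED) -> bool:
    """True when the local tarball matches the pinned integrity string."""
    return tarball_sri(path) == expected
```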

Myth A: “Voice followUsers replaces allowlists.” It does not; ingress remains policy-gated. Myth B: “Device-code OAuth removes the need for SecretRef.” Doctor's plaintext-key WARNs still apply. Myth C: “Policy lint means rollback.” Most findings are conformance debt surfacing, not runtime crashes.

Day 1 (freeze + upgrade + doctor): Morning: capture channels status and auth profiles. Afternoon: upgrade to 5.20, archive doctor output. Evening: configure followUsers for one operator ID in a test guild.

Day 2 (voice persona + xAI): Tune bootstrapContextFiles with spoken probes; run device-code login on the same host shape as production VPS; confirm xAI completion on a routed agent.

Day 3 (Policy repair + handoff): Resolve or waive remaining lint; run allowlisted vs blocked voice probes; wipe rental secrets; publish runbook section listing voice IDs, bootstrap choice, and auth profile order.

Export four artifacts after success: redacted channels status JSON, doctor summary, voice probe transcript snippet, and auth profile list without secrets. Stakeholders care that voice follows the on-call engineer; auditors care that allowlists and Policy lint were addressed deliberately.
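Redaction is the step most often done by hand and most often missed. The sketch below applies a few regex rules before artifacts leave the rental host: secret-looking JSON fields, Bearer tokens, strings with an xai- prefix, and long numeric IDs masked to their last four digits. The patterns, including the assumed xai- key prefix and 17 to 20 digit ID shape, are illustrative and deliberately conservative; review the output manually before publishing.

```python
import re

# Illustrative rules; the xai- prefix and the 17-20 digit ID shape are assumptions.
RULES = [
    # "apiKey": "...", "token": "...", etc. inside JSON artifacts
    (re.compile(r'("(?:api[_-]?key|token|secret|password)"\s*:\s*")[^"]+(")', re.IGNORECASE),
     r"\1[REDACTED]\2"),
    # Authorization: Bearer <token>
    (re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # Bare provider keys with an assumed xai- prefix
    (re.compile(r"\bxai-[A-Za-z0-9]{16,}\b"), "[REDACTED-XAI-KEY]"),
    # Discord snowflakes: keep the last four digits for correlation
    (re.compile(r"\b\d{13,16}(\d{4})\b"), r"<id:****\1>"),
]


def redact(text: str) -> str:
    """Apply every rule in order and return the scrubbed text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Masking IDs to their last four digits keeps voice-state logs correlatable across the four artifacts without publishing full account identifiers.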

06. Headless Linux VPS vs day-rent Mac for voice and auth rehearsal

A Linux VPS is inexpensive for 24/7 Gateway uptime and is the natural home for device-code OAuth—you complete the browser step on a phone while credentials persist on the server. That path breaks down when validation also requires Discord voice state debugging, Control UI review of voice toggles, Keychain-stored deploy keys, and side-by-side reading of profile markdown files while listening to realtime audio. Splitting SSH logs, a phone browser, and a personal laptop Discord client hides wall-clock cost that often exceeds a short Apple Silicon rental.

You can complete much of the CLI checklist on a VPS alone, and production may stay on Linux after acceptance. The rental value is rehearsal fidelity: native macOS lets you run Gateway, Control UI, and Discord desktop in one session, capture followUsers evidence without copying logs across three machines, and rehearse Policy repair on a disposable workspace that mirrors your team’s Mac-centric dev habits. Containers help for spike tests but poorly reproduce voice latency tuning and operator UX for follow-mode handoffs.

When you need a handoff-ready runbook in 1–3 days without buying hardware, and without parking production OAuth codes on an engineer’s personal Discord account, a day-rent Mac is usually smoother than chaining VPS SSH, phone OAuth, and ad hoc audio checks. Compare packages on the M4 compute pricing page; connectivity and cost trade-offs are covered in the SSH/VNC FAQ.