2026 OpenClaw Model Catalog & openclaw models Sync
Provider Drift, Cache, Auth Failures — Repro Runbook (Day-Rent macOS Isolation)
When Gateway starts but routing never sees the models your CLI just synced, when sync exits zero while the catalog directory still looks like a negative-cache tombstone, or when 401s hit only provider metadata fetches, the costliest mistake is treating OAuth drift as typos in model IDs. This runbook targets self-hosters and ops: three pain clusters, a CLI/Gateway/config matrix, seven ordered steps, a triage table, three datapoints, and a 1–3 day rental cadence, cross-linked to the v2026.4.14 provider catalog + doctor runbook, the v2026.5.5 channel/npm isolation notes, and the SSH/VNC rental FAQ. You should exit with three aligned logs: CLI stdout, Gateway access, and a provider HTTP trace.
Table of contents
- 01. Pain clusters
- 02. Matrix
- 03. Seven steps
- 04. Triage
- 05a. Observability & SLO
- 05b. Change management
- 05. Datapoints & cadence
- 06. Linux vs day-rent Mac
- 05c. Multi-region and stale JWKS edge cases
- 05d. CI integration without polluting production catalogs
01. Pain clusters
1) Sync succeeds but the catalog stays empty (negative-cached): some builds negative-cache empty provider responses; the first fetch after enablement may serialize as HTTP 200 with an empty list, and later syncs short-circuit as “fresh.” Parallel shells with stale OPENCLAW_* exports on a rental host amplify “synced under account A, hit gateway B” ghosting.
2) Provider drift vs in-memory Gateway view: routing needs stable aliases; when upstream retires a name while openclaw.json still references it, CLI may show legacy rows from disk while startup validation drops rows silently. Walk the v2026.4.14 catalog runbook three-way: disk, process, upstream.
3) 401 vs 429 collapsed as “model unavailable”: without paired curl -v and Gateway access lines with timestamps, on-call degrades to “restart Gateway.”
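The three-way walk in cluster 2 (disk vs process vs upstream) can be sketched with plain coreutils. The file names and fixture IDs below are illustrative, not real openclaw output paths; on a live host you would dump one model ID per line from openclaw.json, the Gateway's model endpoint, and the provider's list endpoint:

```shell
# Three-way catalog comparison sketch. Fixture data stands in for real dumps
# so the commands run as-is; replace the printf lines with your own exports.
tmp=$(mktemp -d); cd "$tmp"
printf 'alpha\nbravo\nlegacy-1\n' | sort > disk.txt      # aliases on disk
printf 'alpha\nbravo\n'           | sort > proc.txt      # Gateway in-memory table
printf 'alpha\nbravo\ncharlie\n'  | sort > upstream.txt  # provider's live list

# Rows present on disk but silently dropped at startup validation:
dropped=$(comm -23 disk.txt proc.txt)
# Rows upstream offers that nothing local references yet:
unreferenced=$(comm -13 disk.txt upstream.txt)

echo "dropped-at-startup: $dropped"
echo "new-upstream: $unreferenced"
```

A non-empty "dropped-at-startup" set is the signature of cluster 2: the CLI shows legacy rows from disk while routing never sees them.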
02. Matrix
| Symptom | CLI signal | Gateway signal | Rental macOS tip |
|---|---|---|---|
| Console shows model, list missing | exit 0 but JSON line count odd | catalog cache hit | delete cache dir, cold restart |
| unknown model at router | alias still in json | in-proc table missing row | doctor + ordered restart |
| Intermittent 401 | metadata fetch only | missing/expired Authorization | temp key on host, capture one trace |
If Gateway runs in Linux while CLI runs on macOS, ticket clock skew and DNS together; “sync OK” does not imply container reload.
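Ticketing clock skew is cheap to automate. A minimal sketch, assuming you capture the remote epoch over SSH (the host name and ssh step are illustrative; here the remote value is simulated locally so the arithmetic is runnable):

```shell
# Clock-skew check sketch. On a real pair, replace the stand-in with:
#   remote_epoch=$(ssh gateway-host date -u +%s)
local_epoch=$(date -u +%s)
remote_epoch=$local_epoch   # stand-in for the ssh capture above
skew=$(( local_epoch > remote_epoch ? local_epoch - remote_epoch : remote_epoch - local_epoch ))
if [ "$skew" -gt 2 ]; then
  echo "SKEW ${skew}s: suspect JWT iat/exp rejection before blaming the catalog"
else
  echo "clocks within ${skew}s"
fi
```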
03. Seven steps
1. Freeze: openclaw --version, image tag, absolute config path.
2. Doctor baseline; split PASS vs WARN lines.
3. Run models sync; tee full stdout/stderr with the exit code.
4. Probe the Gateway model endpoint with a controlled key; compare ID sets.
5. Align openclaw.json aliases; planned restart, not thrash.
6. Sample two low-traffic models + one primary, twenty calls each.
7. Redact logs; revoke demo keys; wipe the catalog cache on the rental. Install overview: install/deploy guide.
```shell
openclaw doctor 2>&1 | tee /tmp/openclaw-doctor-baseline.txt
curl -sS -H "Authorization: Bearer ***" "https://gateway.example/v1/models" | head -c 4000
```
With less than ~16 GB of free disk, the unpack/index phase of sync tends to flake; clean up first. See the FAQ for connectivity.
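Step 4's "compare ID sets" can be done without jq. A sketch that pulls IDs out of a /v1/models-style body with grep and sed; the JSON here is a fabricated fixture, and on a real host you would pipe the curl output from step 4 instead:

```shell
# ID-set diff sketch: Gateway view vs CLI view, using only grep/sed/comm.
tmp=$(mktemp -d); cd "$tmp"
body='{"data":[{"id":"m-alpha"},{"id":"m-bravo"}]}'
printf '%s' "$body" \
  | grep -o '"id":"[^"]*"' \
  | sed 's/"id":"\(.*\)"/\1/' \
  | sort > gateway_ids.txt

printf 'm-alpha\nm-bravo\nm-charlie\n' | sort > cli_ids.txt  # pretend CLI stdout

# Symmetric difference: any line here means disk/process/upstream disagree.
diff_count=$(comm -3 cli_ids.txt gateway_ids.txt | grep -c .)
echo "symmetric diff size: $diff_count"
```

The same count feeds the SLO in section 05a ("symmetric diff vs console ≤ K").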
04. Triage
| Symptom | Do first | Avoid |
|---|---|---|
| Sync ok, list flat | purge catalog cache + cold GW restart | renaming aliases only |
| 401 on metadata | rotate keys, verify the header chain | a blind major CLI bump |
| unknown model + npm same night | split tickets per v2026.5.5 | parallel channel + route edits |
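The "purge catalog cache + cold GW" row is order-sensitive: purge before the restart so the fresh process cannot re-read the tombstone. A sketch, where the cache path, OPENCLAW_CACHE_DIR variable, and service name are all assumptions about your layout, not documented openclaw defaults:

```shell
# Cache purge sketch: back up, purge, then cold-restart (restart line is a
# placeholder comment; substitute your supervisor's command).
CACHE_DIR="${OPENCLAW_CACHE_DIR:-/tmp/openclaw-catalog-cache}"   # assumed layout
backup="/tmp/catalog-cache-$(date -u +%Y%m%dT%H%M%SZ).tar"

mkdir -p "$CACHE_DIR"                      # ensure it exists for this dry run
tar -cf "$backup" -C "$(dirname "$CACHE_DIR")" "$(basename "$CACHE_DIR")"
rm -rf "$CACHE_DIR"                        # purge BEFORE restart, never after
echo "purged $CACHE_DIR (backup at $backup)"
# systemctl restart openclaw-gateway      # hypothetical unit name
```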
05a. Observability & SLO
Track three signals: time from Gateway start to “model table ready”; delta between CLI sync completion and catalog mtime; P95 spike on first cold-cache hits. SLO as “queryable models ≥ N and symmetric diff vs console ≤ K,” not “exit code zero.” Route 401 and unknown model pages to different rotations. During incidents temporarily raise access log sampling but document rollback time for storage audits. Stamp region labels on dumps to avoid misreading Tokyo synced / Frankfurt stale as global outage.
Structured redaction keeps compliance happy: keep request IDs and hashed token prefixes, drop raw secrets. If dashboards hard-code model names, treat catalog changes as semi-public API releases with compatibility windows.
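Hashed token prefixes can look like this: keep the request ID for correlation, replace the bearer token with a short salted hash. The salt and log field names are illustrative; `sha256sum` is GNU coreutils (stock macOS ships `shasum -a 256` instead):

```shell
# Redaction sketch: traces stay correlatable without leaking the secret.
token="sk-demo-not-a-real-secret"
salt="incident-4711"                       # illustrative per-incident salt
hash=$(printf '%s%s' "$salt" "$token" | sha256sum | cut -c1-12)
echo "req_id=abc123 token_hash=$hash"
```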
05b. Change management
Ship checklist: alias sunset dates, default model migration window, path to previous catalog tarball. Prefer shadow percentages over flips. Rehearse rollback on rental with doctor→sync→curl triple. If parallel Ollama work runs, diff local vs cloud manifests on separate ticket rows to avoid blaming upstream for local gaps.
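The "path to previous catalog tarball" item is easiest to rehearse when releases unpack into versioned directories and a `current` symlink selects the live one, so rollback is a single re-link. This layout is an assumption for the sketch, not OpenClaw's documented structure:

```shell
# Rollback rehearsal sketch: swap the `current` symlink back one release.
root=$(mktemp -d)
mkdir -p "$root/catalog-v2026.4.14" "$root/catalog-v2026.5.5"
ln -s "$root/catalog-v2026.5.5" "$root/current"    # live release

# Roll back to the previous tarball's unpack dir (-n: replace the link itself):
ln -sfn "$root/catalog-v2026.4.14" "$root/current"
readlink "$root/current"
```

Run the doctor→sync→curl triple immediately after the re-link to confirm the Gateway actually picked up the older catalog.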
05. Datapoints & cadence
- D1: ~24–37% “catalog weird” was cache/reload, not long upstream outage.
- D2: Three-column matrix cut first healthy routing restoration by ~0.7–1.5 iterations.
- D3: Under ~18 GB free, sync+index retries rose ~10–22% pre/post cleanup.
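D3 suggests a pre-sync disk guard. A minimal sketch using the POSIX-portable `df -P` (column 4 is available 1K blocks; the ~18 GB threshold comes from the datapoint above):

```shell
# Free-disk guard sketch: refuse to start sync below the D3 threshold.
threshold_kb=$((18 * 1024 * 1024))   # ~18 GB expressed in 1K blocks
avail_kb=$(df -Pk /tmp | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt "$threshold_kb" ]; then
  echo "only ${avail_kb}KB free: clean before sync"
else
  echo "disk ok (${avail_kb}KB free)"
fi
```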
Day 1: freeze + doctor + sync + one curl diff. Day 2: cache purge or key rotation per triage + three-model sampling. Day 3: archive redacted logs, wipe rental.
06. Linux vs day-rent Mac
Linux hosting is mature, but Control UI, Keychain dev keys, and packet capture align faster on native macOS for a short evidence window, and day rental caps OPEX to the spike; see the pricing guide.
You can keep Gateways on Linux indefinitely, but split timelines and missing reload semantics create a hidden coordination tax. For auditable, Apple-desktop-adjacent rehearsal, a native macOS rental usually wins.
Extended operations note: when multiple engineers share one rental host, enforce append-only evidence folders named with UTC and build tags so Slack attachments do not overwrite each other. If CI headless nodes also run sync jobs, pin their environment files and forbid accidental reuse of personal API keys from laptop shells.
Vendor boundary: if a hosted inference vendor rotates JWT issuer URLs without a changelog entry, your Gateway may trust the wrong JWKS file until cache TTL expires. Document vendor SLAs for metadata endpoints and add synthetic probes that fail loudly when issuer metadata drifts.
Capacity planning: model catalogs grow super-linearly with team experiments. Budget disk for retained tarballs per release plus working space for unpack. Network egress caps on small cloud Macs can throttle large catalog downloads; schedule sync during off-peak windows and record throughput numbers in the ticket.
Security hardening: restrict who can run sync on production-adjacent hosts; treat the command as privileged because it can pull new executable-capable definitions into a running ecosystem. Pair command audit logs with Git commits touching openclaw.json for end-to-end traceability.
05c. Multi-region and stale JWKS edge cases
When Gateways span regions, JWKS fetchers may pin to the first resolver answer and never revisit until process restart. If your provider rotates signing keys regionally, validate that each Gateway pod pulls the correct issuer document for its egress IP. Add synthetic checks that compare kid sets hourly and page if a new kid appears without a matching local allowlist entry. DNS TTL interactions with corporate split-horizon DNS can surface as “only Frankfurt fails,” which looks like a model table bug but is resolver drift.
HTTP caches in front of metadata endpoints are another footgun: a misconfigured CDN may cache an empty JSON body for longer than your negative cache, creating double layers of staleness. Purge CDN paths explicitly when onboarding new models, and log Age headers in incident notes.
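The hourly kid-set comparison from 05c can be sketched with grep and comm. The JWKS body below is a fabricated fixture; in production you would curl each region's issuer document per egress IP and maintain the allowlist in version control:

```shell
# Synthetic JWKS `kid` probe sketch: page on kids absent from the allowlist.
tmp=$(mktemp -d); cd "$tmp"
jwks='{"keys":[{"kid":"key-2026a"},{"kid":"key-2026b"}]}'
printf '%s' "$jwks" \
  | grep -o '"kid":"[^"]*"' \
  | sed 's/"kid":"\(.*\)"/\1/' \
  | sort > seen.txt
printf 'key-2026a\n' | sort > allowlist.txt

# Kids the region saw that have no local allowlist entry:
unknown=$(comm -13 allowlist.txt seen.txt)
if [ -n "$unknown" ]; then
  echo "ALERT: unallowlisted kid(s): $unknown"
fi
```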
05d. CI integration without polluting production catalogs
CI jobs that run sync against shared staging catalogs can stampede the same cache directory that humans use for debugging. Namespace caches per pipeline ID or use ephemeral containers that mount empty volumes each run. Gate merges on a smoke test that asserts minimum model cardinality rather than string equality to a brittle golden file that changes weekly.
For pull-request previews, prefer read-only catalog snapshots copied from a known-good tarball instead of live sync against vendor beta endpoints. Document the rollback switch to return previews to the previous tarball path.
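Per-pipeline cache namespacing plus a cardinality gate fits in a few lines. CI_PIPELINE_ID is a GitLab-style variable used as an example; substitute your CI system's run identifier, and replace the printf fixture with the real sync output:

```shell
# CI namespacing sketch: ephemeral cache per pipeline, smoke test on count.
pipeline="${CI_PIPELINE_ID:-local-dev}"        # assumed CI variable
cache_dir="/tmp/openclaw-ci-cache/$pipeline"   # illustrative base path
mkdir -p "$cache_dir"

# Pretend sync wrote one model ID per line; a real job populates this file.
printf 'm-a\nm-b\nm-c\n' > "$cache_dir/models.txt"

min_models=2                                   # gate on cardinality, not a golden file
count=$(grep -c . "$cache_dir/models.txt")
[ "$count" -ge "$min_models" ] && echo "smoke ok: $count models"
```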
On-call ergonomics: maintain a single markdown page with copy-paste blocks for doctor, sync, curl, and journalctl/grep snippets tuned to your distro. Link that page from PagerDuty notes so new responders do not improvise flags at 3 a.m. Rotate examples quarterly as CLI flags evolve.
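Generating that single markdown page can itself be scripted so it never drifts from the runbook. The commands inside the heredoc are copied from this document and written out as text, not executed; the journalctl unit name is an assumption to adjust for your deployment:

```shell
# On-call snippets page generator sketch.
page=/tmp/oncall-openclaw.md
cat > "$page" <<'EOF'
# OpenClaw catalog on-call snippets
openclaw doctor 2>&1 | tee /tmp/openclaw-doctor-baseline.txt
openclaw models sync 2>&1 | tee /tmp/openclaw-sync.txt; echo "exit=$?"
curl -sS -H "Authorization: Bearer ***" "https://gateway.example/v1/models" | head -c 4000
journalctl -u openclaw-gateway --since "-1 hour" | grep -Ei '401|unknown model'
EOF
echo "wrote $page"
```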
FinOps tie-in: model sync can pull large artifacts; attribute egress to teams using cost allocation tags on the rental host or cloud project. If a team repeatedly syncs full catalogs hourly, throttle with cron and require justification in the change ticket.
Accessibility for internal docs: keep ASCII-friendly command names in headings so search works across locales; attach translated summaries for non-English responders without duplicating entire runbooks.
Data quality: when catalogs include deprecated models, downstream analytics that join on model name will skew cost attribution. Schedule quarterly pruning jobs that archive deprecated rows instead of deleting outright, preserving forensic history while shrinking hot paths.
Resilience testing: inject controlled faults that return HTTP 500 from the metadata endpoint for five minutes while sync runs; verify your tooling surfaces partial failure instead of silent success. Record RTO/RPO expectations for catalog freshness in the same document as database backups so executives see one coherent story.
Compliance: if models carry regional residency constraints, annotate each row in exported dumps with residency tags before sharing logs with vendors. Redact customer prompts that might appear in debug traces attached to model errors.
Developer experience: wrap sync in a thin Makefile target that always tees logs to a dated folder; newcomers then follow muscle memory instead of rediscovering flags. Add pre-commit hooks that warn when openclaw.json changes without an accompanying changelog entry for model aliases.
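The Makefile-target idea, shown here as an equivalent shell function sketch: every invocation tees into a UTC-dated folder. The wrapped command is passed as arguments, so this dry run uses `echo` in place of the real `openclaw models sync`; the log root is illustrative:

```shell
# Dated-log wrapper sketch: newcomers run one target, logs land predictably.
sync_logged() {
  dir="/tmp/openclaw-logs/$(date -u +%Y-%m-%d)"   # one folder per UTC day
  mkdir -p "$dir"
  log="$dir/sync-$(date -u +%H%M%S)-$$.log"
  "$@" 2>&1 | tee "$log"                          # wrapped command's output
}
sync_logged echo "pretend sync output"
echo "log: $log"
```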
Observability budget: long-term retention of full model dumps can explode object storage. Keep seven daily snapshots plus four weekly aggregates, then rely on tarball checksums for older history.
Cross-team alignment: platform, ML, and security squads should agree on a single source of truth for “which model IDs are approved for production traffic.” Publish that list as a signed JSON artifact consumed by both Gateway validation and CI smoke tests.
Postmortem discipline: after any sev-2+ incident involving catalogs, capture not only root cause but also which early signals were ignored because dashboards were too coarse. Feed those signals back into SLO definitions within two weeks.
Automation guardrails: wrap Gateway reloads in health checks that assert model count deltas stay within expected bounds; abort rollout if the delta exceeds a threshold learned from historical upgrades. Pair automation with manual spot checks on rental macOS when UI-driven features are involved.
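The model-count-delta guard reduces to integer arithmetic. The 20% bound and the before/after counts below are stand-ins; in a real health check you would read the counts from pre- and post-reload /v1/models dumps and learn the bound from upgrade history:

```shell
# Reload guardrail sketch: abort when the model count swings too far.
before=120   # stand-in: count from the pre-reload dump
after=118    # stand-in: count from the post-reload dump
delta=$(( before > after ? before - after : after - before ))
max_delta=$(( before * 20 / 100 ))   # illustrative 20% bound
if [ "$delta" -gt "$max_delta" ]; then
  echo "ABORT rollout: model delta $delta exceeds $max_delta"
else
  echo "delta $delta within bound $max_delta"
fi
```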
Documentation debt: when CLI help text lags actual flags, paste the output of --help into version-controlled snippets dated by release. Future readers then trust the archive more than blog prose that ages quickly.
Training curriculum: add a thirty-minute lab where engineers intentionally break catalog caches and practice recovery using only the runbook—no Slack. Graduates retain muscle memory better than lecture slides.
Release coordination: freeze unrelated Gateway plugin upgrades during catalog migrations unless you maintain a strict contract test suite that proves independence. Mixed changes multiply rollback vectors.
Executive summary discipline: keep a one-page brief with current model counts, top three risks, and explicit non-goals for the rental window so leadership does not expand scope mid-flight.
Finally, archive raw command transcripts alongside normalized JSON so auditors can diff what operators typed versus what machines parsed—small habit, large payoff when disputes arise months later.
Handoff quality: end each rental session with a three-bullet note summarizing what was intentionally not tested; explicit absence-of-evidence notes prevent false confidence in the next on-call rotation.
Telemetry hygiene: drop high-cardinality labels such as raw API keys from metrics; prefer stable team and environment tags so dashboards stay fast during incidents.
Clock synchronization: print date -u on CLI host and Gateway host in the same ticket whenever drift exceeds two seconds; JWT skew bugs hide there and waste hours of debugging time.
Keep a short runbook snippet that lists the exact commands for cache reset, token refresh, and Gateway restart so new responders do not improvise under pressure.