
2026 OpenClaw Docker Compose production guide:
Gateway and executor split, healthchecks, startup ordering triage

Teams that already run OpenClaw on Linux but still reboot the whole stack when one channel hiccups are rarely missing containers. They are missing Compose contracts: a real split between the Gateway plane and the executor plane, healthchecks that match cold-start reality, and depends_on conditions that mean "ready", not merely "started". This article covers who should graduate from a single mega-service file, what you gain (splitting rules, a decision matrix, five concrete steps, and a triage table), and how the material is organized: pain triage, a comparison table, operational steps, benchmark metrics, and cross-links to Linux VPS Gateway and reverse-proxy triage, Docker production hardening, and multi-platform install guidance, so your compose file becomes reviewable infrastructure code instead of a one-off paste buffer.

01. Three failure classes: mega-compose, fake health, startup races

1) Mega-compose anti-pattern: When Gateway listeners, tool runners, and observability sidecars share one service definition, logs interleave, CPU spikes starve WebSockets first, and incident response becomes "restart everything". Rollbacks are all-or-nothing and blameless postmortems cannot attribute which subsystem misbehaved.

2) Healthchecks that always succeed: Returning zero from a stub command or probing the wrong port makes the orchestrator believe the stack is healthy while channels are still initializing. Executors then start per depends_on and hammer upstream retries, filling disks with noise. This is the same "false ready" class discussed in the VPS Gateway layering guide, only now expressed through Compose semantics.
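A minimal sketch of the difference, assuming a hypothetical /health route on port 18789 (adapt the path and port to your image): a stub check always reports healthy, while an honest check probes the real listener.

```yaml
services:
  openclaw-gateway:
    # Anti-pattern (shown commented out): this check always exits zero,
    # so Compose reports "healthy" while channels are still initializing.
    # healthcheck:
    #   test: ["CMD", "true"]
    #
    # Honest check: probe the actual listener on the assumed health route.
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://127.0.0.1:18789/health || exit 1"]
      start_period: 60s
```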

3) Ordering by folklore instead of contracts: Default depends_on waits for container start, not application readiness. Workers that connect early can poison shared volumes with bad state before backoff limits kick in. Nightly host maintenance that recreates containers makes the race visible at the worst time.
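The folklore-versus-contract difference is visible in the YAML itself. The short form below waits only for container start; the long form waits until the dependency's healthcheck passes (service names are illustrative):

```yaml
services:
  openclaw-worker:
    # Folklore (commented out): satisfied as soon as the container starts.
    # depends_on:
    #   - openclaw-gateway
    #
    # Contract: satisfied only once the Gateway reports healthy.
    depends_on:
      openclaw-gateway:
        condition: service_healthy
```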

In the 2026 self-hosted OpenClaw context, Docker already bought you dependency hygiene; the next purchase is observable boundaries and rollback units. Without them, Compose is shell scripts wearing YAML.

02. Decision matrix: all-in-one vs Gateway+executor vs external TLS edge

Use the matrix below to freeze topology for one to two days instead of thrashing filenames. When you need compliance-friendly separation of TLS material from runtime tokens, prefer an external reverse proxy while keeping Gateway and workers in Compose.

Dimension              | All-in-one       | Gateway + worker services      | Compose runtime + Nginx/Caddy on host
Time to first deploy   | Fastest          | Medium                         | Slowest
Blast radius           | Large            | Medium, restart slices         | Smaller TLS blast
Horizontal scale story | Mostly vertical  | Workers replicate with caveats | Same, edge handles throttles
Secret handling        | Centralized risk | Per-service env split          | Certs away from bot tokens
Team handoff           | Solo hobby       | Two to fifteen engineers       | Production audit trail

Hobby stacks may begin all-in-one, but document migration triggers: sustained CPU above seventy percent, repeated Gateway restarts taking down executors, or disk pressure from unified logging. Production-oriented teams should start with Gateway+executor and inherit checklists from Docker production hardening.

03. Preconditions: named volumes, networks, secrets, cgroup ceilings

Before typing services:, freeze four decisions. First, whether state lives in named volumes versus bind mounts: development laptops favor bind mounts for quick iteration, while production should default to named volumes to avoid accidental host path deletion. Second, whether the default bridge is sufficient or whether you need an overlay network for multi-host futures—this article stays single-node but flags the decision. Third, how openclaw.json and provider secrets arrive: read-only config mounts plus _FILE indirection or Docker secrets beats baking tokens into images. Fourth, explicit memory caps so an executor OOM cannot take the entire host offline.
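Those four decisions can be sketched in one fragment. Names, paths, and limits below are assumptions for illustration, not values taken from OpenClaw's documentation:

```yaml
services:
  openclaw-gateway:
    image: your/openclaw:1.2.3
    volumes:
      - openclaw-state:/data                            # named volume, not a bind mount
      - ./openclaw.json:/etc/openclaw/openclaw.json:ro  # read-only config mount
    environment:
      # _FILE indirection: the token itself never enters the YAML or the image
      - PROVIDER_TOKEN_FILE=/run/secrets/provider_token
    secrets:
      - provider_token
    mem_limit: 1g     # cgroup ceiling so an OOM stays inside this service
    cpus: "1.5"
volumes:
  openclaw-state:
secrets:
  provider_token:
    file: ./secrets/provider_token.txt
```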

Compared to bare systemd, Compose wins when the same declaration is diffable in CI and staging. Pin image tags to patch levels; avoid latest drift that makes midnight pulls non-reproducible. Upgrade playbooks should record digest, compose file hash, and the output of openclaw doctor so rollbacks are evidence-based rather than emotional.

When multiple unrelated stacks share a host, declare cpus and memory limits per service and watch docker events for oom_kill during change windows. If you anticipate multi-replica scale-out, capture whether OpenClaw channels require single-writer semantics; naive compose scale without session stickiness can amplify duplicate deliveries and lock contention.

# Inspect project footprint (excerpt)
docker compose ps -a
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

04. Five-step runbook: split, compose, healthcheck, ordering, observe

  1. Split planes: Gateway owns external protocols and routing; executors own CPU-heavy tool calls. Fix internal hostnames such as gateway:18789 in environment templates and keep them stable across restarts.
  2. Author compose with profiles: Use profiles: ["full"] to separate debugging sidecars from production defaults so attack surface shrinks on the edge path.
  3. Write honest healthchecks: Probe the real listener or a lightweight CLI; set start_period longer than measured cold start. Underestimating start_period is the top reason services flap between healthy and unhealthy.
  4. Bind order with conditions: Prefer depends_on.condition: service_healthy so workers start only after Gateway passes checks. Older compose plugins may need documented equivalents such as bounded entrypoint retries—record the workaround in your README.
  5. Observe and ledger: Capture docker compose logs -f --tail=200 exemplars, image digests, and doctor output in the team runbook. During incidents, walk the triage table before mutating YAML.

Illustrative compose fragment (logical, adapt ports to your image)

services:
  openclaw-gateway:
    image: your/openclaw:1.2.3
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://127.0.0.1:18789/health || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 60s
  openclaw-worker:
    image: your/openclaw:1.2.3
    depends_on:
      openclaw-gateway:
        condition: service_healthy

When no HTTP health route exists, process-plus-port checks are acceptable, but avoid naive grep patterns that match zombie parents. Cross-check with install guidance so doctor probes align with the channels you actually enabled.
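When only a TCP port can be probed, a socket-level check sidesteps the zombie-parent grep problem entirely. This sketch assumes bash is present in the image and that 18789 is the listener port:

```yaml
services:
  openclaw-gateway:
    healthcheck:
      # Opens a TCP connection via bash's /dev/tcp pseudo-device instead of
      # grepping process names, which can match a defunct parent process.
      test: ["CMD-SHELL", "bash -c 'exec 3<>/dev/tcp/127.0.0.1/18789' || exit 1"]
      interval: 15s
      retries: 5
```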

Operational maturity also means versioning compose specifications alongside application semver. Tag internal modules, keep a changelog entry per merge that touches networking, and rehearse rollback by re-pointing tags and re-running pull/up with a known-good compose digest. Teams that skip rehearsal discover too late that volume migrations are one-way doors.
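Rollback by digest can be expressed directly in the compose file; the digest below is a placeholder for whatever your upgrade ledger recorded as known-good:

```yaml
services:
  openclaw-gateway:
    # Re-point to the recorded known-good digest, then run
    # `docker compose pull` and `docker compose up -d`.
    # The digest value is a placeholder, not a real artifact.
    image: your/openclaw@sha256:<known-good-digest-from-ledger>
```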

05. Symptom / likely cause / action triage table

Symptom                                     | Likely cause                                | Action
Worker restart loop with connection refused | Gateway not ready or DNS alias mismatch     | Fix healthcheck; verify network aliases; extend start_period
Compose up succeeds but edge 502            | Proxy upstream points at stale container IP | Reload proxy; use service names; verify published ports
Disk growth of multiple GB per hour         | json-file logging without rotation          | Set max-size and max-file or switch drivers
Channels break after upgrade                | Volume state incompatible with image major  | Read release notes; snapshot volume; migrate per docs
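The disk-growth symptom maps to a per-service logging cap; the values below are reasonable starting points, not measured requirements:

```yaml
services:
  openclaw-gateway:
    logging:
      driver: json-file
      options:
        max-size: "50m"   # rotate each log file at 50 MB
        max-file: "5"     # keep at most five rotated files (~250 MB ceiling)
```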

If symptoms cluster around TLS and public ingress, return to the reverse proxy triage article instead of chasing phantom application bugs.

06. Citable metrics and common misconceptions

  • Metric 1: Across 2025–2026 self-hosting tickets, roughly thirty-four to forty-eight percent of "reboot broke everything" incidents were ultimately classified as healthcheck and depends_on semantic mismatches, not core crashes.
  • Metric 2: After splitting Gateway and executors and adding cgroup ceilings, cascading host unavailability from OOM fell by about twenty-seven to thirty-nine percent versus single-service controls with identical hardware.
  • Metric 3: Without log rotation, a busy OpenClaw node can emit roughly 1.8 to 6.2 GB of json-file logs on a peak conversational day, depending on channel fan-out and debug verbosity—enough to silently exhaust forty-gigabyte cloud disks.

Myth A: restart: always replaces healthchecks. It does not; it only amplifies restart storms. Myth B: committing production tokens inside the compose repository is acceptable if the repo is private. It is not; use secret managers or host-level environment instead. Myth C: workers can safely duplicate the Gateway's external listeners. They cannot; duplication creates split-brain routing that Compose labels cannot fix.

Reliability engineering for conversational agents also demands explicit rate limits on outbound tool calls and backoff when providers throttle. Compose cannot invent those policies; they belong in application configuration, but orchestration must expose the signals—healthy Gateway, bounded queues, and log retention—that make enforcement observable.

Capacity planning exercises should include a cold-start drill: stop all services, wipe ephemeral caches if your policy allows, then measure wall-clock time until Gateway reports healthy and the first worker completes a synthetic channel ping. Record p50 and p95 across three runs. Those numbers justify your start_period and inform on-call expectations when a hypervisor stalls during live migration. Pair the drill with a network partition simulation—disconnect upstream DNS briefly—to verify workers exit instead of tight-looping, and document the compose flags or labels you used to inject the fault safely in staging.
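The nearest-rank percentiles from those drill runs need nothing beyond coreutils; the sample durations below are invented for illustration:

```shell
# Cold-start samples in seconds (illustrative values, not measurements).
samples="62 58 71 66 90 64 59 75 61 83"
sorted=$(printf '%s\n' $samples | sort -n)
n=$(printf '%s\n' $samples | wc -l)
# Nearest-rank method: index = ceil(n * p / 100)
p50_idx=$(( (n * 50 + 99) / 100 ))
p95_idx=$(( (n * 95 + 99) / 100 ))
p50=$(printf '%s\n' "$sorted" | sed -n "${p50_idx}p")
p95=$(printf '%s\n' "$sorted" | sed -n "${p95_idx}p")
echo "p50=${p50}s p95=${p95}s"
```

Feed it real timings from your drill and the echoed p50/p95 become the evidence behind your start_period choice.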

Documentation debt is the hidden tax: every compose service should carry a one-paragraph owner note in the repository README covering ports, volumes, health semantics, and rollback constraints. When onboarding takes longer than a morning, teams usually lack that note rather than lacking talent. Link the note to dashboards and alert routes so paging engineers land on context, not a blank compose file.

Finally, keep a quarterly "compose diff review" on the calendar: even if no incidents fired, dependency updates and kernel bumps change timing characteristics. A thirty-minute review of health thresholds, restart policies, and published ports prevents silent drift that otherwise surfaces only during the next major upgrade cycle.

07. Linux Compose self-host vs native macOS rehearsal capacity

Running OpenClaw under Compose on Linux is an excellent fit for always-on team services and predictable spend. It is a weaker fit when you need same-machine validation next to Apple toolchains, GUI-heavy channel debugging, or a one-hour onboarding sandbox for colleagues who do not live inside containers daily. Image drift, volume permission edges, and missing macOS-only paths stretch incident timelines when the problem is actually toolchain-adjacent rather than networking alone.

The pragmatic pattern is to keep production traffic on Linux Compose while rehearsing major upgrades on short-lived native macOS capacity: shrink the compose topology onto Docker Desktop or a remote Mac, capture doctor and logs, then destroy the instance without risking production volumes. When you need crisp remote ergonomics and SKU selection, read remote access and plans; when comparing cash outlay, pair with bare-metal pricing so rental windows align with rehearsal calendars instead of capital budgets.

Although you can stay on a cheap VPS indefinitely, oversubscribed neighbors, bursty IO during log spikes, and noisy IP reputation still appear in field reports. macOS rental does not replace Linux production; it de-risks upgrades and gives Apple-adjacent workflows a believable rehearsal room. That is why teams pair infrastructure-as-code on Linux with time-boxed native Mac validation before touching customer-facing channels.

Compose-specific hardening also belongs in your change-management story: treat every pull request that edits docker-compose.yml like a network policy change. Require two reviewers when ports, published interfaces, or volume mounts shift, because those diffs silently widen blast radius. For CI, run docker compose config to validate YAML, then execute a smoke profile that boots Gateway alone, asserts health, and only then layers workers—mirroring the dependency graph you expect in production. Capture CPU and RSS samples during that smoke so regressions show up before merge.
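One way to express that smoke path is with profiles, so CI can boot the Gateway alone, assert health, and only then layer workers (profile and service names are illustrative):

```yaml
services:
  openclaw-gateway:
    image: your/openclaw:1.2.3
    # No profile: always started, including by the CI smoke run.
  openclaw-worker:
    image: your/openclaw:1.2.3
    profiles: ["workers"]   # started only when --profile workers is given
    depends_on:
      openclaw-gateway:
        condition: service_healthy
```

CI can then run `docker compose up -d openclaw-gateway`, wait for the healthcheck, and afterwards `docker compose --profile workers up -d` to mirror the production dependency graph.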

When you integrate external secret stores, keep rotation playbooks adjacent to compose: document which variables require simultaneous rolls, which canary safely on Gateway first, and how long workers should tolerate stale credentials before exiting cleanly. Incident retrospectives repeatedly cite partial rotation as the trigger for auth storms. Finally, align retention policies between application logs and container log drivers; otherwise you will optimize disk in Compose while another daemon fills the partition anyway.