Why does docker compose up work fine but docker stack deploy fails with the same file?

The Docker Compose CLI (v2) and the Swarm stack deploy engine are separate parsers with different schema support. The Compose CLI implements the full Compose Specification including depends_on long syntax with condition keys. The Swarm stack deploy parser is older and only accepts depends_on as a YAML sequence (list of strings). The same file is valid for one parser and invalid for the other — this is a known, long-standing divergence that Docker has not reconciled.

If I remove the condition from depends_on, how do I prevent my app from starting before the database is ready?

You must handle readiness at the application or entrypoint layer. The two standard approaches are: (1) Use wait-for-it.sh or dockerize in your container entrypoint to block startup until the dependency's TCP port is accepting connections. (2) Build retry logic directly into your application's database connection initialization with exponential backoff. Swarm's restart_policy with on-failure and a delay acts as a coarse fallback but is not a substitute for proper readiness handling.
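Approach (2) might look like the following minimal Python sketch. The function name `connect_with_backoff` and its parameters are illustrative, not part of any library, and the default of retrying on `OSError` is an assumption: many database drivers raise their own exception types, which you would pass in via `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 8,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts:
                # Give up and let the process exit; Swarm's restart_policy
                # then acts as the coarse fallback.
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little jitter keeps many replicas from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("max_attempts must be at least 1")
```

In an application this would wrap the driver's own connect call, for example `connect_with_backoff(lambda: open_db_connection())`, where `open_db_connection` is a hypothetical stand-in for whatever your driver provides.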

Is there any way to use service_healthy condition semantics in Docker Swarm?

Not natively through the compose file. Docker Swarm does respect healthcheck definitions on services and uses health status for its own internal load balancer routing (unhealthy tasks are removed from the VIP). However, it does not expose a mechanism to delay dependent service startup until a dependency reports healthy — that ordering guarantee is simply not implemented in the Swarm orchestrator. The correct pattern is entrypoint-level wait scripts combined with aggressive restart policies.

Fixing 'Compose File Is Invalid' depends_on Long Syntax Error in Docker Stack Deploy

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5 mins

TL;DR

What broke: docker stack deploy hard-rejects depends_on long syntax (object form with condition: keys). This syntax comes from the Compose Specification (and the legacy 2.1 file format); the version 3 schema that Swarm's stack parser implements never included it.
How to fix it: Downgrade every depends_on block to short syntax (plain list). Move health-gate logic to an entrypoint wait script (wait-for-it.sh, dockerize) or a sidecar.

The Incident (What Does the Error Mean?)

You ran:

docker stack deploy -c docker-compose.yml mystack

And got back:

failed to deploy: 1 error(s) decoding:

* services.api.depends_on must be a list

  -- OR --

compose file is invalid because:
services.worker.depends_on contains an invalid type

Immediate consequence: The entire stack deployment is aborted. Zero services start. If this is a rollout pipeline, the deploy job exits non-zero and your CD system may leave the previous broken state in place or trigger an unwanted rollback.

The root cause is a hard parser incompatibility. The long syntax for depends_on, which expresses conditions such as condition: service_healthy, exists in the Compose Specification (and the legacy 2.1 file format) but was never added to the version 3 schema used by the Swarm stack deploy code path. Swarm's YAML parser expects depends_on to be a YAML sequence (list of strings). When it encounters a YAML mapping (object), it throws a type decode error and exits.

The Attack Vector / Blast Radius

This is not a security vulnerability — it is a total availability failure in your deployment pipeline. The blast radius:

CI/CD pipelines: Every deploy job that references this compose file will fail. If you gate production releases on docker stack deploy, your release is dead on arrival.
Dependency ordering silently lost: When you strip condition: service_healthy to unblock the deploy, you lose the health gate. Services that depended on a healthy DB or cache will start immediately and crash-loop until the dependency is ready. In Swarm, this means restart storms and potential data corruption if your app writes on startup without a ready DB.
Silent divergence between local and production: docker compose up (Compose CLI) accepts long syntax fine. Developers test locally with health conditions enforced, then the same file explodes in Swarm. This creates a false confidence gap — your local green test means nothing for the Swarm deploy.
Rollout blocking: In a zero-downtime rolling update scenario, a failed stack deploy leaves the stack in its previous state. If that previous state was already broken, the deploy cannot repair it until the compose file itself is fixed.

How to Fix It (The Solution)

Basic Fix — Strip to Short Syntax

Remove the long-form object and replace with a plain YAML list. Swarm will no longer block on the type mismatch.

 services:
   api:
     image: myapp/api:latest
     depends_on:
-      db:
-        condition: service_healthy
-      redis:
-        condition: service_started
+      - db
+      - redis

   db:
     image: postgres:15
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U postgres"]
       interval: 10s
       timeout: 5s
       retries: 5

Warning: This unblocks the deploy but provides no startup ordering. docker stack deploy ignores depends_on in Swarm mode, so the list form is accepted but never enforced. Your application MUST handle connection retries internally or via an entrypoint wrapper.
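For files with many services, the rewrite in the diff above can be sketched in Python. This is a minimal illustration that operates on an already-parsed compose document (a plain dict); loading and dumping the YAML itself would need a YAML library and is left out. The function name `downgrade_depends_on` is hypothetical.

```python
from copy import deepcopy


def downgrade_depends_on(compose: dict) -> tuple:
    """Return (new_compose, dropped).

    Every mapping-form depends_on becomes a plain list of service names.
    dropped maps service -> {dependency: condition} so the health gates
    that were removed can be reviewed and re-implemented elsewhere.
    """
    result = deepcopy(compose)  # leave the caller's document untouched
    dropped = {}
    for name, svc in (result.get("services") or {}).items():
        deps = (svc or {}).get("depends_on")
        if isinstance(deps, dict):
            dropped[name] = {
                dep: (opts or {}).get("condition") for dep, opts in deps.items()
            }
            svc["depends_on"] = list(deps)  # keeps the original key order
    return result, dropped


compose = {
    "services": {
        "api": {
            "image": "myapp/api:latest",
            "depends_on": {
                "db": {"condition": "service_healthy"},
                "redis": {"condition": "service_started"},
            },
        },
        "db": {"image": "postgres:15"},
    }
}

fixed, dropped = downgrade_depends_on(compose)
print(fixed["services"]["api"]["depends_on"])  # ['db', 'redis']
print(dropped)
```

Returning the dropped conditions alongside the rewritten document is a deliberate choice: each entry marks a service that now needs a wait script or retry logic.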

Enterprise Best Practice — Entrypoint Wait Script + Swarm Constraints

Do not rely on compose-level orchestration for health gating in Swarm. Enforce readiness at the container entrypoint layer.

 services:
   api:
     image: myapp/api:latest
+    entrypoint: ["./scripts/wait-for-it.sh", "db:5432", "--timeout=60", "--", "./start.sh"]
     depends_on:
-      db:
-        condition: service_healthy
-      redis:
-        condition: service_started
+      - db
+      - redis
     deploy:
       restart_policy:
         condition: on-failure
+        delay: 5s
+        max_attempts: 10
+        window: 30s
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+        order: start-first

   db:
     image: postgres:15
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U postgres"]
       interval: 10s
       timeout: 5s
       retries: 5
+    deploy:
+      placement:
+        constraints:
+          - node.role == manager

Key points:

wait-for-it.sh (or dockerize -wait tcp://db:5432) blocks the app process until the TCP port is open. This is the closest Swarm-compatible approximation of condition: service_healthy, although an open port is a weaker signal than a passing healthcheck.
restart_policy.max_attempts caps crash-loop damage if the dependency never becomes healthy.
update_config.failure_action: rollback auto-reverts a bad rolling update.
Keep healthcheck on the dependency service — Swarm uses it for its own internal routing decisions even if compose-level condition gating is unavailable.
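For images that ship Python but no shell tooling, the TCP wait described in the first key point can be approximated with the standard library. This is a sketch of the same idea, not a port of wait-for-it.sh; the function name `wait_for_port` and its defaults are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection.

    Returns False if the deadline passes first. An open port only shows that
    something is listening, not that the service is ready for queries.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves the name and attempts a TCP connect.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An entrypoint script could call `wait_for_port("db", 5432)` and exit non-zero on False, leaving the restart to restart_policy.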


Prevention in CI/CD

This class of error should never reach a deploy job. Gate it at lint time.

1. docker compose config (built-in, free)

Add this as the first step in your deploy stage. It validates schema but does NOT catch Swarm-incompatible syntax — use it as a baseline only.

docker compose -f docker-compose.yml config --quiet

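2. Checkov with a custom check

Checkov ships no built-in check for Swarm-incompatible depends_on, and its dockerfile framework scans Dockerfiles rather than Compose files. Run the generic YAML framework and load a custom check from an external directory:

pip install checkov
checkov -f docker-compose.yml --framework yaml --external-checks-dir checkov/custom

A custom check along these lines flags the long syntax: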

# checkov/custom/swarm_depends_on.py
# Sketch only: the BaseYamlCheck constructor arguments and the scan hook name
# may differ in your Checkov release. Verify both before relying on this.
from checkov.common.models.enums import CheckResult
from checkov.yaml_doc.base_yaml_check import BaseYamlCheck

class SwarmDependsOnLongSyntax(BaseYamlCheck):
    def __init__(self):
        self.id = "CKV_DOCKER_SWARM_1"
        self.name = "Ensure depends_on uses short syntax for Swarm compatibility"
        super().__init__(self.name, self.id, ["docker-compose.yml"])

    def scan_resource_conf(self, conf):
        # Fail as soon as any service declares depends_on as a mapping (long syntax).
        for svc in (conf.get("services") or {}).values():
            dep = (svc or {}).get("depends_on", [])
            if isinstance(dep, dict):
                return CheckResult.FAILED
        return CheckResult.PASSED

3. OPA/Conftest policy (shift-left enforcement)

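# policy/swarm_depends_on.rego
# Conftest evaluates only the "main" namespace unless --namespace or
# --all-namespaces is passed, so the package is named main here.
package main

# Opt in to Rego v1 keywords (contains, if) so the rule parses on current OPA.
import rego.v1

deny contains msg if {
  svc := input.services[name]
  is_object(svc.depends_on)
  msg := sprintf(
    "Service '%v': depends_on long syntax is incompatible with docker stack deploy. Use a list.",
    [name]
  )
}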

Run in CI:

conftest test docker-compose.yml --policy policy/

4. GitHub Actions gate (drop-in)

- name: Validate Swarm Compose Compatibility
  run: |
    conftest test docker-compose.yml --policy ./policy
    # Fail fast before any docker stack deploy attempt

This stops the depends_on long syntax from ever reaching your Swarm cluster. Fix it in the PR, not in the outage.