
Fixing 'Compose File Is Invalid' depends_on Long Syntax Error in Docker Stack Deploy

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5 mins


TL;DR

  • What broke: docker stack deploy hard-rejects depends_on long syntax (mapping form with condition: keys). That form comes from Compose file format 2.1 and the later Compose Specification; the version 3 schema that Swarm's stack parser validates against only accepts a list.
  • How to fix it: Downgrade every depends_on block to short syntax (plain list). Move health-gate logic to an entrypoint wait script (wait-for-it.sh, dockerize) or a sidecar.

The Incident (What Does the Error Mean?)

You ran:

docker stack deploy -c docker-compose.yml mystack

And got back:

failed to deploy: 1 error(s) decoding:

* services.api.depends_on must be a list

  -- OR --

compose file is invalid because:
services.worker.depends_on contains an invalid type

Immediate consequence: The entire stack deployment is aborted. Zero services start. If this is a rollout pipeline, the deploy job exits non-zero and your CD system may leave the previous broken state in place or trigger an unwanted rollback.

The root cause is a hard parser incompatibility. The long syntax for depends_on, which expresses gates such as condition: service_healthy, comes from Compose file format 2.1 and the later Compose Specification; the version 3 schema used by the Swarm stack deploy code path never accepted it. Swarm's parser expects depends_on to be a YAML sequence (list of strings). When it encounters a YAML mapping (object), it reports a type decode error and exits.
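The type check described above can be illustrated with a few lines of Python. This is only a sketch of the behaviour, not Docker's actual decoder; the function name decode_depends_on is hypothetical, and the error text is copied from the message shown earlier.

```python
def decode_depends_on(service_name, value):
    """Mimic a strict decoder that only accepts a list of service names."""
    if value is None:
        return []
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    # A mapping (long syntax) lands here, like any other non-list type.
    raise TypeError(f"services.{service_name}.depends_on must be a list")


short_form = ["db", "redis"]
long_form = {"db": {"condition": "service_healthy"}}

print(decode_depends_on("api", short_form))  # the list passes through unchanged
try:
    decode_depends_on("api", long_form)
except TypeError as err:
    print(err)  # the mapping is rejected before anything is deployed
```

The point of the sketch is that the rejection happens at decode time, which is why no service in the stack starts.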


The Attack Vector / Blast Radius

This is not a security vulnerability — it is a total availability failure in your deployment pipeline. The blast radius:

  • CI/CD pipelines: Every deploy job that references this compose file will fail. If you gate production releases on docker stack deploy, your release is dead on arrival.
  • Dependency ordering silently lost: When you strip condition: service_healthy to unblock the deploy, you lose the health gate. Services that depended on a healthy DB or cache will start immediately and crash-loop until the dependency is ready. In Swarm, this means restart storms and potential data corruption if your app writes on startup without a ready DB.
  • Silent divergence between local and production: docker compose up (Compose CLI) accepts long syntax fine. Developers test locally with health conditions enforced, then the same file explodes in Swarm. This creates a false confidence gap — your local green test means nothing for the Swarm deploy.
  • Rollout blocking: In a zero-downtime rolling update scenario, a failed stack deploy leaves the stack in its previous state. If the previous state was also broken, you have no recovery path through the standard toolchain.
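The crash-loop risk in the second bullet is easier to contain when the application retries its dependency connection itself instead of exiting on the first failure. A minimal sketch, assuming a hypothetical connect callable that raises OSError (or a subclass such as ConnectionError) while the dependency is still unavailable:

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait between tries."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise  # give up and let the Swarm restart policy take over
            sleep(delay)
            delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_connect():
        # Hypothetical stand-in for a DB client: fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("db not ready")
        return "connected"

    print(connect_with_retry(flaky_connect, base_delay=0.01))
```

The sleep parameter is injectable only so the helper can be exercised without real waiting; in production the default time.sleep is what you want.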

How to Fix It (The Solution)

Basic Fix — Strip to Short Syntax

Remove the long-form object and replace with a plain YAML list. Swarm will no longer block on the type mismatch.

 services:
   api:
     image: myapp/api:latest
     depends_on:
-      db:
-        condition: service_healthy
-      redis:
-        condition: service_started
+      - db
+      - redis

   db:
     image: postgres:15
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U postgres"]
       interval: 10s
       timeout: 5s
       retries: 5

Warning: This unblocks the deploy but provides no startup ordering. docker stack deploy ignores depends_on when it creates services, so the list is documentation only. Your application MUST handle connection retries internally or via an entrypoint wrapper.
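If many services need the same edit, the downgrade can be applied mechanically to an already-parsed compose document. The sketch below is an illustration, not part of any Docker tooling: the function name to_short_depends_on is hypothetical, and it works on a plain dict so that the choice of YAML loader stays up to you. It also reports every condition it drops, which gives you a list of the health gates that now need a replacement.

```python
import copy


def to_short_depends_on(compose):
    """Return (converted_compose, dropped) without mutating the input.

    dropped maps "service -> dependency" to the condition that was removed.
    """
    result = copy.deepcopy(compose)
    dropped = {}
    for name, service in (result.get("services") or {}).items():
        deps = (service or {}).get("depends_on")
        if isinstance(deps, dict):
            for dep, options in deps.items():
                condition = (options or {}).get("condition", "service_started")
                if condition != "service_started":
                    dropped[f"{name} -> {dep}"] = condition
            # Keep only the dependency names, in their original order.
            service["depends_on"] = list(deps)
    return result, dropped


compose = {
    "services": {
        "api": {
            "depends_on": {
                "db": {"condition": "service_healthy"},
                "redis": {"condition": "service_started"},
            }
        },
        "db": {},
    }
}
converted, dropped = to_short_depends_on(compose)
print(converted["services"]["api"]["depends_on"])  # ['db', 'redis']
print(dropped)  # {'api -> db': 'service_healthy'}
```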


Enterprise Best Practice — Entrypoint Wait Script + Swarm Constraints

Do not rely on compose-level orchestration for health gating in Swarm. Enforce readiness at the container entrypoint layer.

 services:
   api:
     image: myapp/api:latest
+    entrypoint: ["./scripts/wait-for-it.sh", "db:5432", "--timeout=60", "--", "./start.sh"]
     depends_on:
-      db:
-        condition: service_healthy
-      redis:
-        condition: service_started
+      - db
+      - redis
     deploy:
       restart_policy:
         condition: on-failure
+        delay: 5s
+        max_attempts: 10
+        window: 30s
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+        order: start-first

   db:
     image: postgres:15
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U postgres"]
       interval: 10s
       timeout: 5s
       retries: 5
+    deploy:
+      placement:
+        constraints:
+          - node.role == manager

Key points:

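  • wait-for-it.sh (or dockerize -wait tcp://db:5432) blocks the app process until the TCP port is open. This is the closest Swarm-compatible approximation of condition: service_healthy, not an exact equivalent: an open port does not prove the service is ready to answer queries, so keep connection retries in the app as well.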
  • restart_policy.max_attempts caps crash-loop damage if the dependency never becomes healthy.
  • update_config.failure_action: rollback auto-reverts a bad rolling update.
  • Keep healthcheck on the dependency service. Swarm still uses it to decide whether a task is healthy and to replace tasks that turn unhealthy, even though compose-level condition gating is unavailable.
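For images where shipping a shell script is awkward, the TCP wait can be reproduced in a few lines of Python. This is a minimal sketch of the same idea as wait-for-it.sh, not a port of it; the function name wait_for_port and its parameters are assumptions made for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Block until a TCP connection to host:port succeeds or timeout elapses.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A wrapper entrypoint could call wait_for_port("db", 5432) and exit non-zero on False so that the restart policy handles the retry. As with the shell script, this only proves the port is open, not that the database is ready for queries.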



Prevention in CI/CD

This class of error should never reach a deploy job. Gate it at lint time.

1. docker compose config (built-in, free)

Add this as the first step in your deploy stage. It validates schema but does NOT catch Swarm-incompatible syntax — use it as a baseline only.

docker compose -f docker-compose.yml config --quiet

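2. Checkov with a custom policy for Swarm-incompatible depends_on

pip install checkov
checkov -f docker-compose.yml --framework yaml --external-checks-dir checkov/custom

The dockerfile framework scans Dockerfiles rather than compose files, so run the yaml framework and load a custom policy for this pattern: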

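# checkov/custom/swarm_depends_on.py
# NOTE: the base-class constructor arguments and the scan hook name are kept from
# the original snippet; verify both against your installed Checkov version.
from checkov.common.models.enums import CheckResult
from checkov.yaml_doc.base_yaml_check import BaseYamlCheck

class SwarmDependsOnLongSyntax(BaseYamlCheck):
    def __init__(self):
        self.id = "CKV_DOCKER_SWARM_1"
        self.name = "Ensure depends_on uses short syntax for Swarm compatibility"
        super().__init__(self.name, self.id, ["docker-compose.yml"])

    def scan_resource_conf(self, conf):
        # Fail as soon as any service declares depends_on as a mapping (long syntax).
        # The "or {}" guards cover an empty services key or an empty service body.
        for svc in (conf.get("services") or {}).values():
            dep = (svc or {}).get("depends_on", [])
            if isinstance(dep, dict):
                return CheckResult.FAILED
        return CheckResult.PASSED

# Checkov custom checks are registered when the class is instantiated.
check = SwarmDependsOnLongSyntax()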

3. OPA/Conftest policy (shift-left enforcement)

# policy/swarm_depends_on.rego
package docker.swarm

deny[msg] {
  svc := input.services[name]
  is_object(svc.depends_on)
  msg := sprintf(
    "Service '%v': depends_on long syntax is incompatible with docker stack deploy. Use a list.",
    [name]
  )
}

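Run in CI. Conftest evaluates only the main package by default, so pass the policy's namespace explicitly or the rule never fires:

conftest test docker-compose.yml --policy policy/ --namespace docker.swarm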

4. GitHub Actions gate (drop-in)

- name: Validate Swarm Compose Compatibility
  run: |
    conftest test docker-compose.yml --policy ./policy --namespace docker.swarm
    # Fail fast before any docker stack deploy attempt

This stops the depends_on long syntax from ever reaching your Swarm cluster. Fix it in the PR, not in the outage.
