Fixing 'Compose File Is Invalid' depends_on Long Syntax Error in Docker Stack Deploy
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5 mins
TL;DR
- What broke: docker stack deploy hard-rejects depends_on long syntax (object form with condition: keys). That form comes from Compose file format 2.1 and the Compose Specification; the version 3 schema that Swarm's stack parser reads never implemented it.
- How to fix it: Downgrade every depends_on block to short syntax (plain list). Move health-gate logic to an entrypoint wait script (wait-for-it.sh, dockerize) or a sidecar.
- Fast path: Use our Client-Side Sandbox above to auto-refactor this: paste your compose file and get corrected YAML instantly without sending secrets anywhere.
The Incident (What Does the Error Mean?)
You ran:
docker stack deploy -c docker-compose.yml mystack
And got back:
failed to deploy: 1 error(s) decoding:
* services.api.depends_on must be a list
-- OR --
compose file is invalid because:
services.worker.depends_on contains an invalid type
Immediate consequence: The entire stack deployment is aborted. Zero services start. If this is a rollout pipeline, the deploy job exits non-zero and your CD system may leave the previous broken state in place or trigger an unwanted rollback.
The root cause is a hard parser incompatibility. The long syntax for depends_on, introduced in Compose file format 2.1 and carried into the Compose Specification to express conditions such as condition: service_healthy, was never supported in the Swarm stack deploy code path. Swarm's compose loader expects depends_on to be a YAML sequence (list of strings). When it encounters a YAML mapping (object), it reports a type decode error and exits.
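To make the type mismatch concrete, the sketch below models the two depends_on shapes as already-parsed Python data and applies the kind of check the error message implies. It is illustrative only: the function name depends_on_shape and the sample services are assumptions, and this is not Docker's actual decoder.

```python
# Illustrative only: models the shape check implied by the error message.
# Not Docker's decoder; the function and sample data are hypothetical.

def depends_on_shape(service: dict) -> str:
    """Classify a service's depends_on as 'absent', 'short', 'long', or 'invalid'."""
    dep = service.get("depends_on")
    if dep is None:
        return "absent"
    if isinstance(dep, list):
        return "short"    # YAML sequence: accepted by docker stack deploy
    if isinstance(dep, dict):
        return "long"     # YAML mapping: triggers the decode error
    return "invalid"


# What a YAML parser would hand back for each form.
short_form = {"image": "myapp/api:latest", "depends_on": ["db", "redis"]}
long_form = {
    "image": "myapp/api:latest",
    "depends_on": {
        "db": {"condition": "service_healthy"},
        "redis": {"condition": "service_started"},
    },
}

for label, svc in (("short", short_form), ("long", long_form)):
    print(label, "->", depends_on_shape(svc))
```

The only difference between a file that deploys and one that is rejected is whether depends_on parses to a list or to a dict.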
The Attack Vector / Blast Radius
This is not a security vulnerability — it is a total availability failure in your deployment pipeline. The blast radius:
- CI/CD pipelines: Every deploy job that references this compose file will fail. If you gate production releases on docker stack deploy, your release is dead on arrival.
- Dependency ordering silently lost: When you strip condition: service_healthy to unblock the deploy, you lose the health gate. Services that depended on a healthy DB or cache will start immediately and crash-loop until the dependency is ready. In Swarm, this means restart storms and potential data corruption if your app writes on startup without a ready DB.
- Silent divergence between local and production: docker compose up (Compose CLI) accepts long syntax fine. Developers test locally with health conditions enforced, then the same file explodes in Swarm. This creates a false confidence gap: your local green test means nothing for the Swarm deploy.
- Rollout blocking: In a zero-downtime rolling update scenario, a failed stack deploy leaves the stack in its previous state. If the previous state was also broken, you have no recovery path through the standard toolchain.
How to Fix It (The Solution)
Basic Fix — Strip to Short Syntax
Remove the long-form object and replace with a plain YAML list. Swarm will no longer block on the type mismatch.
 services:
   api:
     image: myapp/api:latest
     depends_on:
-      db:
-        condition: service_healthy
-      redis:
-        condition: service_started
+      - db
+      - redis
   db:
     image: postgres:15
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U postgres"]
       interval: 10s
       timeout: 5s
       retries: 5
Warning: This unblocks the deploy but provides no startup ordering. docker stack deploy ignores depends_on entirely, so the short list only documents intent. Your application MUST handle connection retries internally or via an entrypoint wrapper.
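For the "handle connection retries internally" option, a minimal sketch of a retry helper with exponential backoff might look like the following. The name retry_with_backoff, its defaults, and the retry_on parameter are assumptions for illustration; pass whatever connect function and error classes your own driver uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    connect: Callable[[], T],
    attempts: int = 10,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    retry_on: tuple = (OSError,),  # assumption: add your DB driver's connection error class
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or attempts run out.

    The delay doubles after each failure and is capped at max_delay.
    The final failure is re-raised so the container exits non-zero.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            if attempt == attempts:
                raise
            print(f"dependency not ready ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")


# Hypothetical usage at application startup:
#   import socket
#   conn = retry_with_backoff(lambda: socket.create_connection(("db", 5432), timeout=3))
```

Letting the last error propagate keeps the behaviour compatible with a Swarm restart_policy: the task fails visibly instead of hanging forever.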
Enterprise Best Practice — Entrypoint Wait Script + Swarm Constraints
Do not rely on compose-level orchestration for health gating in Swarm. Enforce readiness at the container entrypoint layer.
 services:
   api:
     image: myapp/api:latest
+    entrypoint: ["./scripts/wait-for-it.sh", "db:5432", "--timeout=60", "--", "./start.sh"]
     depends_on:
-      db:
-        condition: service_healthy
-      redis:
-        condition: service_started
+      - db
+      - redis
     deploy:
       restart_policy:
         condition: on-failure
+        delay: 5s
+        max_attempts: 10
+        window: 30s
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+        order: start-first
   db:
     image: postgres:15
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U postgres"]
       interval: 10s
       timeout: 5s
       retries: 5
+    deploy:
+      placement:
+        constraints:
+          - node.role == manager
Key points:
- wait-for-it.sh (or dockerize -wait tcp://db:5432) blocks the app process until the TCP port is open. This is the closest Swarm-compatible stand-in for condition: service_healthy, though an open port is a weaker signal than a passing healthcheck.
- restart_policy.max_attempts caps crash-loop damage if the dependency never becomes healthy.
- update_config.failure_action: rollback auto-reverts a bad rolling update.
- Keep healthcheck on the dependency service. Swarm still uses it to decide whether a task is healthy and should receive traffic, even though compose-level condition gating is unavailable.
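If the image cannot carry wait-for-it.sh or dockerize, the same TCP gate can be sketched in a few lines of Python. This is a rough equivalent under stated assumptions, not a port of either tool: the script name wait_for_tcp.py, the argument order, and the exit codes are all hypothetical.

```python
import os
import socket
import sys
import time


def wait_for_tcp(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(interval, remaining))


def main(argv: list) -> int:
    # Hypothetical usage: wait_for_tcp.py db:5432 60 ./start.sh
    if len(argv) < 3:
        print("usage: wait_for_tcp.py HOST:PORT TIMEOUT_SECONDS COMMAND [ARGS...]", file=sys.stderr)
        return 2
    host, _, port = argv[0].rpartition(":")
    if not wait_for_tcp(host, int(port), timeout=float(argv[1])):
        print(f"timed out waiting for {argv[0]}", file=sys.stderr)
        return 1
    # Replace this process with the real command so it receives signals directly.
    os.execvp(argv[2], argv[2:])


# In a real entrypoint script, finish with: sys.exit(main(sys.argv[1:]))
```

Exiting non-zero on timeout hands control back to Swarm's restart_policy, so max_attempts still bounds the number of retries.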
Prevention in CI/CD
This class of error should never reach a deploy job. Gate it at lint time.
1. docker compose config (built-in, free)
Add this as the first step in your deploy stage. It validates schema but does NOT catch Swarm-incompatible syntax — use it as a baseline only.
docker compose -f docker-compose.yml config --quiet
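2. Checkov with a custom policy
pip install checkov
checkov -f docker-compose.yml --framework yaml --external-checks-dir checkov/custom
Checkov ships no built-in check for Swarm-incompatible depends_on, and its dockerfile framework scans Dockerfiles rather than compose files. The generic yaml framework is the closest fit (confirm it is available in your Checkov version), so the check itself has to come from a custom policy: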
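# checkov/custom/swarm_depends_on.py
# NOTE: the constructor arguments and scan hook name are kept from the original
# snippet; check them against BaseYamlCheck in your installed Checkov version.
from checkov.common.models.enums import CheckResult
from checkov.yaml_doc.base_yaml_check import BaseYamlCheck


class SwarmDependsOnLongSyntax(BaseYamlCheck):
    def __init__(self):
        self.id = "CKV_DOCKER_SWARM_1"
        self.name = "Ensure depends_on uses short syntax for Swarm compatibility"
        super().__init__(self.name, self.id, ["docker-compose.yml"])

    def scan_resource_conf(self, conf):
        # Fail if any service declares depends_on as a mapping (long syntax).
        for svc in (conf.get("services") or {}).values():
            dep = svc.get("depends_on", [])
            if isinstance(dep, dict):
                return CheckResult.FAILED
        return CheckResult.PASSED


# Checkov registers a custom check only when the module instantiates it.
check = SwarmDependsOnLongSyntax()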
3. OPA/Conftest policy (shift-left enforcement)
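# policy/swarm_depends_on.rego
# Uses rego.v1 syntax, which needs a recent OPA/Conftest release.
package docker.swarm

import rego.v1

# Flag any service whose depends_on is a mapping (long syntax).
deny contains msg if {
    some name
    svc := input.services[name]
    is_object(svc.depends_on)
    msg := sprintf(
        "Service '%v': depends_on long syntax is incompatible with docker stack deploy. Use a list.",
        [name]
    )
}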
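Run in CI. Conftest evaluates only the main namespace by default, so point it at the policy's package explicitly:
conftest test docker-compose.yml --policy policy/ --namespace docker.swarm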
4. GitHub Actions gate (drop-in)
- name: Validate Swarm Compose Compatibility
  run: |
    # Fail fast before any docker stack deploy attempt
    conftest test docker-compose.yml --policy ./policy --namespace docker.swarm
This stops the depends_on long syntax from ever reaching your Swarm cluster. Fix it in the PR, not in the outage.