
Fixing KubeVela Application Rollout Failed: Trait Conflict Resolution Guide

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins


TL;DR

  • What broke: A KubeVela Application component has two or more traits that each try to manage the same field (replica count, rollout strategy, or scaling), so the rollout controller fails reconciliation and reports "application rollout failed: trait conflict".
  • How to fix it: Remove the conflicting trait (typically a scaler or hpa trait co-existing with a rollout trait), and consolidate scaling intent into the rollout trait spec.

The Incident (What Does the Error Mean?)

Raw controller log output:

ERROR controller-runtime/controller "Application" failed to reconcile
reason: trait conflict detected on component "web-frontend"
traits [rollout, scaler] both attempt to manage spec.replicas
application rollout failed: trait conflict

The KubeVela ApplicationController renders traits in sequence. When two traits emit patches targeting the same field path (most commonly spec.replicas), the merge cannot reconcile them. The controller does not pick a winner; it aborts the entire rollout and marks the Application as unhealthy. The previous workload version is left running, but no new rollout proceeds. In canary or blue-green scenarios this means the traffic split never advances, and the rollout stays stalled until the conflict is removed.
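
The collision can be pictured with a small model. The Python sketch below is not KubeVela's rendering code; it assumes each trait can be reduced to a set of field-path patches, and the function name find_conflicts and the patch values are hypothetical, chosen to mirror the log above.

```python
from collections import defaultdict


def find_conflicts(trait_patches):
    """Return {field_path: [trait, ...]} for paths written by more than one trait.

    trait_patches maps a trait name to a dict of {field_path: value}.
    This is a simplified model, not KubeVela's real rendering pipeline.
    """
    writers = defaultdict(list)
    for trait, patches in trait_patches.items():
        for path in patches:
            writers[path].append(trait)
    return {path: traits for path, traits in writers.items() if len(traits) > 1}


# Hypothetical patches mirroring the controller log.
patches = {
    "rollout": {"spec.replicas": 2},
    "scaler": {"spec.replicas": 5},
}
print(find_conflicts(patches))
```

In this model any path with more than one writer is reported, regardless of whether the values agree, which matches the controller's behaviour of refusing to pick a winner.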


The Attack Vector / Blast Radius

This is a correctness and availability failure, not a security exploit, but the blast radius is significant:

  • Stalled rollouts: Any pending image update, config change, or scaling event is blocked. Your CD pipeline shows green (the Application object was accepted by the API server) but the workload is never updated.
  • Silent drift: Because the previous ReplicaSet keeps running, operators assume the deploy succeeded. Config drift accumulates undetected until a health check or incident surfaces it.
  • HPA + rollout deadlock (most dangerous variant): If a HorizontalPodAutoscaler trait and a rollout trait both own spec.replicas, the HPA can fight the rollout controller's replica target during canary weight progression. The result is replica thrashing: pods are created and destroyed in a loop, burning node resources and, if the cluster is near capacity, potentially triggering OOMKills on the node.
  • Multi-component blast: If the conflicting component is a shared dependency (e.g., a backend API), every upstream consumer is affected by the stalled rollout.
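
The silent-drift case above can be caught by comparing what the Application declares with what is actually running. The following is a minimal sketch, assuming both objects have been fetched with kubectl get ... -o json and parsed into dictionaries, and that the Deployment belongs to the component in question; the helper names and the v2.0.0 tag are hypothetical.

```python
def desired_images(application):
    """Map component name -> image declared in the Application spec."""
    return {
        c["name"]: c["properties"]["image"]
        for c in application["spec"]["components"]
        if "image" in c.get("properties", {})
    }


def running_images(deployment):
    """Return the set of container images in a Deployment's pod template."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return {c["image"] for c in containers}


def has_drifted(application, component, deployment):
    """True when the component's declared image is not in the Deployment."""
    return desired_images(application)[component] not in running_images(deployment)


# Hypothetical objects: the Application asks for v2.1.0, the workload still runs v2.0.0.
app = {"spec": {"components": [
    {"name": "web-frontend", "properties": {"image": "myrepo/web:v2.1.0"}},
]}}
stale = {"spec": {"template": {"spec": {"containers": [
    {"name": "web", "image": "myrepo/web:v2.0.0"},
]}}}}
print(has_drifted(app, "web-frontend", stale))
```

A check like this could run as a post-deploy pipeline step so that a green pipeline reflects the workload, not just the accepted Application object.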

How to Fix It (The Solution)

Root Cause Identification

Run this to render the component's resources, including trait output, before applying:

vela dry-run -f app.yaml -n <namespace>
# Then inspect the status conditions of the deployed Application:
kubectl get application <app-name> -n <namespace> -o json | jq '.status.conditions'

Look for reason: TraitConflict in the conditions array.
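
If you want to script that lookup, a small filter over the conditions JSON is enough. This is a sketch under assumptions: the input is the JSON array from .status.conditions, and the sample condition types and message below are hypothetical placeholders, not captured controller output.

```python
import json


def trait_conflicts(conditions_json):
    """Return the conditions whose reason is TraitConflict."""
    if not conditions_json.strip():
        return []  # No conditions were reported at all.
    conditions = json.loads(conditions_json) or []
    return [c for c in conditions if c.get("reason") == "TraitConflict"]


# Hypothetical sample of a conditions array; real entries will differ.
sample = json.dumps([
    {"type": "Parsed", "status": "True", "reason": "Available"},
    {"type": "Rendered", "status": "False", "reason": "TraitConflict",
     "message": "trait conflict detected on component \"web-frontend\""},
])
for cond in trait_conflicts(sample):
    print(cond["type"], cond["message"])
```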


Basic Fix — Remove the Conflicting Trait

The most common conflict: scaler trait + rollout trait on the same component.

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: web-frontend
spec:
  components:
    - name: web-frontend
      type: webservice
      properties:
        image: myrepo/web:v2.1.0
        port: 8080
      traits:
-       - type: scaler
-         properties:
-           replicas: 5
        - type: rollout
          properties:
            targetSize: 5
            rolloutBatches:
              - replicas: 2
              - replicas: 3

Why: The rollout trait owns spec.replicas during progression. The scaler trait's static replica patch collides with it at render time. Remove scaler; set your target replica count inside rollout.properties.targetSize.
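
When moving the replica count into the rollout trait, it is worth checking that the batches still add up. The sketch below assumes, as in the example above (2 + 3 = 5), that rolloutBatches are meant to sum to targetSize and that each replicas value is a plain integer; the function name check_rollout_batches is hypothetical.

```python
def check_rollout_batches(properties):
    """Return a list of problems found in a rollout trait's properties."""
    problems = []
    target = properties.get("targetSize")
    batches = properties.get("rolloutBatches", [])
    total = sum(b.get("replicas", 0) for b in batches)
    if target is None:
        problems.append("targetSize is not set")
    elif total != target:
        problems.append(f"batches sum to {total}, but targetSize is {target}")
    return problems


# Properties taken from the example above.
rollout = {"targetSize": 5, "rolloutBatches": [{"replicas": 2}, {"replicas": 3}]}
print(check_rollout_batches(rollout))
```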


Enterprise Best Practice — HPA + Rollout Coexistence

If you need autoscaling and controlled rollouts, you cannot run the hpa and rollout traits simultaneously on the same component. The pattern shown here removes the hpa trait and instead applies the HorizontalPodAutoscaler from a workflow step that runs after the rollout step completes.

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: web-frontend
spec:
  components:
    - name: web-frontend
      type: webservice
      properties:
        image: myrepo/web:v2.1.0
      traits:
-       - type: hpa
-         properties:
-           min: 2
-           max: 10
-           cpuUtil: 60
        - type: rollout
          properties:
            targetSize: 5
            rolloutBatches:
              - replicas: 2
              - replicas: 3
+ # Re-apply HPA via a post-rollout workflow step, not as a concurrent trait
  workflow:
    steps:
      - name: rollout-web
        type: canary-deploy
        properties:
          component: web-frontend
+     - name: restore-hpa
+       type: apply-object
+       properties:
+         value:
+           apiVersion: autoscaling/v2
+           kind: HorizontalPodAutoscaler
+           metadata:
+             name: web-frontend-hpa
+             namespace: production
+           spec:
+             scaleTargetRef:
+               apiVersion: apps/v1
+               kind: Deployment
+               name: web-frontend
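+             minReplicas: 2
+             maxReplicas: 10
+             # Carry over the 60% CPU target from the removed hpa trait (cpuUtil: 60)
+             metrics:
+               - type: Resource
+                 resource:
+                   name: cpu
+                   target:
+                     type: Utilization
+                     averageUtilization: 60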

This pattern sequences the operations: rollout completes → HPA is applied. No concurrent ownership conflict.




Prevention in CI/CD

1. OPA/Gatekeeper Policy — Block Conflicting Trait Combinations

Embed this Rego in a Gatekeeper ConstraintTemplate (under spec.targets[].rego) and deploy it to your cluster. Gatekeeper evaluates rules named violation, not deny:

package kubevela.traitconflict

# Reject components that combine 'rollout' with 'scaler'.
violation[{"msg": msg}] {
  component := input.review.object.spec.components[_]
  # Collect the trait types into a set so membership is a simple lookup.
  traits := {t.type | t := component.traits[_]}
  traits["rollout"]
  traits["scaler"]
  msg := sprintf("Component '%v' has both 'rollout' and 'scaler' traits. Remove 'scaler'; replica count must be owned exclusively by 'rollout'.", [component.name])
}

# Reject components that combine 'rollout' with 'hpa'.
violation[{"msg": msg}] {
  component := input.review.object.spec.components[_]
  traits := {t.type | t := component.traits[_]}
  traits["rollout"]
  traits["hpa"]
  msg := sprintf("Component '%v' has both 'rollout' and 'hpa' traits. HPA must be applied post-rollout via workflow step.", [component.name])
}

2. Pre-commit / PR Gate with vela dry-run

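# .github/workflows/kubevela-validate.yml
- name: KubeVela Dry Run
  run: |
    # Test the command directly: GitHub Actions runs bash with -e, so a
    # separate `$?` check after a failing command would never execute.
    if ! vela dry-run -f app.yaml --server-side; then
      echo "vela dry-run failed (possible trait conflict). Blocking merge."
      exit 1
    fi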

vela dry-run --server-side runs the full trait rendering pipeline against the live API server without committing — it will surface TraitConflict errors at PR time, not at 2 AM.

3. Enforce Trait Ownership Documentation

Maintain a trait ownership matrix in your platform team's runbook. For each field path (spec.replicas, spec.template, etc.), document exactly one trait allowed to own it per component. Treat violations as a P1 review blocker.
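
An ownership matrix is easier to enforce when it is also machine-readable. The following is a minimal sketch rather than a KubeVela feature: both tables are hypothetical and would need to be maintained by the platform team, with OWNERS recording the single allowed owner per component and field path, and WRITES recording which paths each trait type is believed to set.

```python
# Hypothetical ownership matrix: component -> field path -> the one owning trait.
OWNERS = {"web-frontend": {"spec.replicas": "rollout"}}

# Hypothetical map of the field paths each trait type writes.
WRITES = {
    "rollout": {"spec.replicas"},
    "scaler": {"spec.replicas"},
    "hpa": {"spec.replicas"},
}


def ownership_violations(application):
    """Return (component, trait, path) tuples for traits writing a path they do not own."""
    out = []
    for comp in application["spec"]["components"]:
        owners = OWNERS.get(comp["name"], {})
        for trait in comp.get("traits", []):
            for path in sorted(WRITES.get(trait["type"], ())):
                owner = owners.get(path)
                if owner is not None and owner != trait["type"]:
                    out.append((comp["name"], trait["type"], path))
    return out


# The conflicting Application from the basic fix, before the scaler trait is removed.
app = {"spec": {"components": [{
    "name": "web-frontend",
    "traits": [{"type": "scaler"}, {"type": "rollout"}],
}]}}
print(ownership_violations(app))
```

Components that are absent from the matrix are not checked in this sketch; a stricter variant could treat a missing entry as a violation.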
