Fixing KubeVela Application Rollout Failed: Trait Conflict Resolution Guide
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins
TL;DR
- What broke: A KubeVela Application component has two or more traits that own conflicting control over the same resource axis (replica count, rollout strategy, or scaling), causing the rollout controller to fail reconciliation and surface "application rollout failed: trait conflict".
- How to fix it: Remove the conflicting trait (typically a scaler or hpa trait co-existing with a rollout trait), and consolidate scaling intent into the rollout trait spec.
- Fast path: Use our Client-Side Sandbox below to auto-refactor this: paste your Application YAML and get a clean, conflict-free output without sending secrets anywhere.
The Incident (What Does the Error Mean?)
Raw controller log output:
ERROR controller-runtime/controller "Application" failed to reconcile
reason: trait conflict detected on component "web-frontend"
traits [rollout, scaler] both attempt to manage spec.replicas
application rollout failed: trait conflict
The KubeVela ApplicationController performs trait rendering in sequence. When two traits emit patches targeting the same field path — most commonly spec.replicas — the merge strategy produces an irreconcilable conflict. The controller does not pick a winner; it aborts the entire rollout and marks the Application as unhealthy. The previous workload version is left running, but no new rollout proceeds. In canary or blue-green scenarios this means traffic split never advances and the rollout is permanently stalled.
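The collision described above can be modeled in a few lines. This is a minimal sketch, not KubeVela's actual merge logic: it assumes each trait renders to a flat mapping of field path to value, and the `find_conflicts` helper and the sample patches are hypothetical names used only for illustration.

```python
from collections import defaultdict


def find_conflicts(trait_patches):
    """Return {field_path: [trait names]} for every path claimed by more than one trait."""
    owners = defaultdict(list)
    # Walk the traits in order, mirroring sequential rendering.
    for trait_name, patch in trait_patches:
        for field_path in patch:
            owners[field_path].append(trait_name)
    return {path: names for path, names in owners.items() if len(names) > 1}


# Hypothetical rendered patches for the "web-frontend" component.
patches = [
    ("rollout", {"spec.replicas": 5, "spec.strategy": "canary"}),
    ("scaler", {"spec.replicas": 5}),
]

print(find_conflicts(patches))  # {'spec.replicas': ['rollout', 'scaler']}
```

Note that the sketch flags the path even though both traits ask for the same value: the problem is shared ownership of the field, not the number itself.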
The Attack Vector / Blast Radius
This is a correctness and availability failure, not a security exploit, but the blast radius is significant:
- Stalled rollouts: Any pending image update, config change, or scaling event is blocked. Your CD pipeline shows green (the Application object was accepted by the API server) but the workload is never updated.
- Silent drift: Because the previous ReplicaSet keeps running, operators assume the deploy succeeded. Config drift accumulates undetected until a health check or incident surfaces it.
- HPA + rollout deadlock (most dangerous variant): If a HorizontalPodAutoscaler trait and a rollout trait both own spec.replicas, the HPA will continuously fight the rollout controller's replica target during canary weight progression. This causes replica thrashing: pods are created and destroyed in a loop, burning node resources and triggering cascading OOMKills on the node if the cluster is near capacity.
- Multi-component blast: If the conflicting component is a shared dependency (e.g., a backend API), every upstream consumer is affected by the stalled rollout.
How to Fix It (The Solution)
Root Cause Identification
Run this to see rendered trait patches before they conflict:
vela traits --app <app-name> -n <namespace>
# Then inspect the Application status:
kubectl get application <app-name> -n <namespace> -o jsonpath='{.status.conditions}' | jq .
Look for reason: TraitConflict in the conditions array.
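If you want to script that check, the conditions array printed by the kubectl command can be filtered programmatically. The sketch below assumes the JSON is passed in as a string and that the reason value is TraitConflict as stated above; the `conflict_conditions` helper and the condition types in the sample payload are illustrative assumptions, not captured cluster output.

```python
import json


def conflict_conditions(conditions_json, reason="TraitConflict"):
    """Return the conditions whose reason matches (TraitConflict by default)."""
    conditions = json.loads(conditions_json) or []
    return [c for c in conditions if c.get("reason") == reason]


# Hypothetical payload shaped like the output of the kubectl jsonpath query.
sample = """[
  {"type": "Rendered", "status": "False", "reason": "TraitConflict",
   "message": "traits [rollout, scaler] both attempt to manage spec.replicas"},
  {"type": "Parsed", "status": "True", "reason": "Available"}
]"""

for condition in conflict_conditions(sample):
    print(condition["message"])
```

An empty result means no condition carried that reason, which is not the same as a healthy Application; check the remaining conditions as well.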
Basic Fix — Remove the Conflicting Trait
The most common conflict: scaler trait + rollout trait on the same component.
  apiVersion: core.oam.dev/v1beta1
  kind: Application
  metadata:
    name: web-frontend
  spec:
    components:
      - name: web-frontend
        type: webservice
        properties:
          image: myrepo/web:v2.1.0
          port: 8080
        traits:
-         - type: scaler
-           properties:
-             replicas: 5
          - type: rollout
            properties:
              targetSize: 5
              rolloutBatches:
                - replicas: 2
                - replicas: 3
Why: The rollout trait owns spec.replicas during progression. The scaler trait's static replica patch collides with it at render time. Remove scaler; set your target replica count inside rollout.properties.targetSize.
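For teams that need to apply this fix across many manifests, the same edit can be expressed as a small transformation. This is a sketch under assumptions: the Application has already been loaded into a Python dict (for example with a YAML parser), and `consolidate_scaling` is a hypothetical helper, not part of any KubeVela tooling.

```python
import copy


def consolidate_scaling(app):
    """Return a copy of the Application with each conflicting scaler folded into rollout."""
    fixed = copy.deepcopy(app)
    for component in fixed["spec"]["components"]:
        traits = component.get("traits", [])
        rollout = next((t for t in traits if t["type"] == "rollout"), None)
        scaler = next((t for t in traits if t["type"] == "scaler"), None)
        if rollout is None or scaler is None:
            continue  # no rollout/scaler conflict on this component
        replicas = scaler.get("properties", {}).get("replicas")
        props = rollout.setdefault("properties", {})
        # Keep an explicit targetSize if present; otherwise inherit the scaler's count.
        if replicas is not None:
            props.setdefault("targetSize", replicas)
        component["traits"] = [t for t in traits if t["type"] != "scaler"]
    return fixed


app = {
    "spec": {
        "components": [{
            "name": "web-frontend",
            "traits": [
                {"type": "scaler", "properties": {"replicas": 5}},
                {"type": "rollout", "properties": {
                    "rolloutBatches": [{"replicas": 2}, {"replicas": 3}],
                }},
            ],
        }]
    }
}

fixed = consolidate_scaling(app)
print([t["type"] for t in fixed["spec"]["components"][0]["traits"]])  # ['rollout']
```

The function works on a deep copy, so the input manifest is left untouched and the two versions can be diffed before anything is applied.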
Enterprise Best Practice — HPA + Rollout Coexistence
If you need autoscaling AND controlled rollouts, you cannot run hpa and rollout traits simultaneously on the same component. The correct pattern is to disable HPA during rollout using a workflow step, then re-enable it post-rollout.
  apiVersion: core.oam.dev/v1beta1
  kind: Application
  metadata:
    name: web-frontend
  spec:
    components:
      - name: web-frontend
        type: webservice
        properties:
          image: myrepo/web:v2.1.0
        traits:
-         - type: hpa
-           properties:
-             min: 2
-             max: 10
-             cpuUtil: 60
          - type: rollout
            properties:
              targetSize: 5
              rolloutBatches:
                - replicas: 2
                - replicas: 3
+   # Re-apply HPA via a post-rollout workflow step, not as a concurrent trait
    workflow:
      steps:
        - name: rollout-web
          type: canary-deploy
          properties:
            component: web-frontend
+       - name: restore-hpa
+         type: apply-object
+         properties:
+           value:
+             apiVersion: autoscaling/v2
+             kind: HorizontalPodAutoscaler
+             metadata:
+               name: web-frontend-hpa
+               namespace: production
+             spec:
+               scaleTargetRef:
+                 apiVersion: apps/v1
+                 kind: Deployment
+                 name: web-frontend
+               minReplicas: 2
+               maxReplicas: 10
+               # Carries over the removed trait's cpuUtil: 60 target
+               metrics:
+                 - type: Resource
+                   resource:
+                     name: cpu
+                     target:
+                       type: Utilization
+                       averageUtilization: 60
This pattern sequences the operations: rollout completes → HPA is applied. No concurrent ownership conflict.
Prevention in CI/CD
1. OPA/Gatekeeper Policy — Block Conflicting Trait Combinations
Embed this Rego in the rego field of a Gatekeeper ConstraintTemplate and deploy it to your cluster:
package kubevela.traitconflict

# Gatekeeper evaluates rules named violation that return an object with a msg field.
violation[{"msg": msg}] {
    component := input.review.object.spec.components[_]
    # Collect trait types into a set so membership is a plain lookup.
    traits := {t.type | t := component.traits[_]}
    traits["rollout"]
    traits["scaler"]
    msg := sprintf("Component '%v' has both 'rollout' and 'scaler' traits. Remove 'scaler'; replica count must be owned exclusively by 'rollout'.", [component.name])
}

violation[{"msg": msg}] {
    component := input.review.object.spec.components[_]
    traits := {t.type | t := component.traits[_]}
    traits["rollout"]
    traits["hpa"]
    msg := sprintf("Component '%v' has both 'rollout' and 'hpa' traits. HPA must be applied post-rollout via workflow step.", [component.name])
}
2. Pre-commit / PR Gate with vela dry-run
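# .github/workflows/kubevela-validate.yml
- name: KubeVela Dry Run
  run: |
    # The default bash shell on GitHub Actions exits on the first failing command,
    # so test the command directly instead of checking $? afterwards.
    if ! vela dry-run -f app.yaml --server-side; then
      echo "vela dry-run failed (possible trait conflict). Blocking merge."
      exit 1
    fi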
vela dry-run --server-side runs the full trait rendering pipeline against the live API server without committing — it will surface TraitConflict errors at PR time, not at 2 AM.
3. Enforce Trait Ownership Documentation
Maintain a trait ownership matrix in your platform team's runbook. For each field path (spec.replicas, spec.template, etc.), document exactly one trait allowed to own it per component. Treat violations as a P1 review blocker.
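Such a matrix is easier to enforce when it is machine-readable. The sketch below is one possible encoding, not a KubeVela feature: `OWNERSHIP`, `TRAIT_FIELDS`, and `ownership_violations` are hypothetical names, and the list of field paths each trait writes to is an assumption you would need to fill in from your own trait definitions.

```python
# Hypothetical ownership matrix: component -> field path -> the one trait allowed to own it.
OWNERSHIP = {
    "web-frontend": {"spec.replicas": "rollout"},
}

# Assumed mapping of trait type -> field paths that trait writes to.
TRAIT_FIELDS = {
    "rollout": {"spec.replicas"},
    "scaler": {"spec.replicas"},
    "hpa": {"spec.replicas"},
}


def ownership_violations(component, ownership=OWNERSHIP, trait_fields=TRAIT_FIELDS):
    """Return (field_path, offending_trait, documented_owner) tuples for one component."""
    allowed = ownership.get(component["name"], {})
    violations = []
    for trait in component.get("traits", []):
        trait_type = trait["type"]
        for path in sorted(trait_fields.get(trait_type, ())):
            owner = allowed.get(path)
            if owner is not None and owner != trait_type:
                violations.append((path, trait_type, owner))
    return violations


component = {"name": "web-frontend", "traits": [{"type": "rollout"}, {"type": "hpa"}]}
print(ownership_violations(component))  # [('spec.replicas', 'hpa', 'rollout')]
```

Components that are missing from the matrix produce no violations in this sketch; a stricter variant could treat an undocumented component as a failure in its own right.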