How to Fix Kubernetes 'Volume Mount Failed: Read-Only File System' Error
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10 mins
TL;DR
- What broke: A container attempted a write to a volume (or its own root filesystem) that the kubelet mounted as read-only. The cause is one of: an explicit readOnly: true flag on a volumeMount, securityContext.readOnlyRootFilesystem: true on the container, a ReadOnlyMany PVC access mode mismatch, or a node-level filesystem that remounted itself ro after an I/O error.
- How to fix it: Audit the volumeMounts[].readOnly flag, the container-level securityContext, and the PVC accessModes. If the node filesystem went ro, you have a hardware/disk corruption event: cordon the node immediately.
- Shortcut: Use our Client-Side Sandbox below to auto-refactor your failing Pod spec without leaking it to a third-party AI.
The Incident (What Does the Error Mean?)
Raw error surface — you will see one or more of these:
# kubelet event
Warning Failed 5s kubelet Error: failed to start container "app":
Error response from daemon: OCI runtime create failed:
container_linux.go:380: starting container process caused:
process_linux.go:545: container init caused:
rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/.../volumes/..."
to rootfs at "/data" caused:
mount through procfd:
mount /var/lib/kubelet/pods/.../volumes/...:/data (via /proc/self/fd/6),
flags: 0x5001: read-only file system: unknown
# or inside the container at runtime
OSError: [Errno 30] Read-only file system: '/data/output.log'
# or dmesg on the node
[1234567.890] EXT4-fs error (device nvme0n1p1): ...
[1234567.891] EXT4-fs (nvme0n1p1): Remounting filesystem read-only
Immediate consequence: The container either never starts (CrashLoopBackOff) or starts and immediately throws EROFS on the first write, killing the workload process. Persistent queues, log pipelines, and stateful apps are fully blocked.
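The in-container trace above is an ordinary OSError whose errno is EROFS (30 on Linux). As a minimal sketch, assuming a Python workload, the snippet below shows one way to tell that case apart from other write failures so the app can log a clear message instead of a bare traceback. The names is_read_only_error and try_write are hypothetical helpers, not part of any library.

```python
import errno
import os
import tempfile


def is_read_only_error(exc: OSError) -> bool:
    """True when the OS reported EROFS ("Read-only file system")."""
    return exc.errno == errno.EROFS


def try_write(path: str, data: str) -> str:
    """Append data to path and classify the outcome.

    Returns "ok", "read-only" (EROFS), or "other-error" for any other OSError.
    """
    try:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(data)
        return "ok"
    except OSError as exc:
        return "read-only" if is_read_only_error(exc) else "other-error"


if __name__ == "__main__":
    # Demo against a scratch directory; inside a pod you would pass the
    # real target, e.g. /data/output.log from the trace above.
    with tempfile.TemporaryDirectory() as scratch:
        print(try_write(os.path.join(scratch, "output.log"), "hello\n"))
```

Checking errno rather than matching the message text keeps the check independent of locale and Python version.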
The Attack Vector / Blast Radius
There are four distinct root causes — misidentifying them wastes hours:
| Root Cause | Blast Radius |
|---|---|
| volumeMounts[].readOnly: true set by mistake | Single container, single volume. Fast fix. |
| securityContext.readOnlyRootFilesystem: true | Entire container rootfs is read-only. Any write (temp files, PID files, log rotation) fails. |
| PVC accessModes: [ReadOnlyMany] when ReadWriteOnce is needed | All pods mounting that PVC are blocked from writing. |
| Node-level ext4/xfs remount-ro (disk I/O error) | Node is dying. All pods on that node are affected. Data loss risk is real. This is your 3am page. |
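The node-level scenario is the dangerous one. A degraded NVMe or EBS volume can cause the kernel to remount the filesystem read-only to prevent further corruption (for ext4, this is the errors=remount-ro behavior). If the affected filesystem holds the kubelet's state directory, the kubelet itself cannot write state. Cordon and drain the node before anything else.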
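The manifest-level causes in the table can be checked mechanically before anyone logs in to a node. The following is a rough sketch, assuming the pod has been exported with kubectl get pod -o json and loaded into a dict; find_read_only_sources is a hypothetical helper name. It cannot see the node-level cause, and a PVC's accessModes live on the claim object rather than the pod, so those still need separate checks.

```python
def find_read_only_sources(pod: dict) -> list:
    """List manifest settings that make a mount or the rootfs read-only."""
    findings = []
    spec = pod.get("spec") or {}
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])

    for container in containers:
        name = container.get("name", "<unnamed>")

        # Container-level hardening flag: the whole rootfs becomes read-only.
        security_context = container.get("securityContext") or {}
        if security_context.get("readOnlyRootFilesystem") is True:
            findings.append(f"{name}: readOnlyRootFilesystem is true")

        # Per-mount flag: only this mountPath is read-only.
        for mount in container.get("volumeMounts") or []:
            if mount.get("readOnly") is True:
                findings.append(
                    f"{name}: volumeMount {mount.get('name')} "
                    f"at {mount.get('mountPath')} is readOnly"
                )

    # A PVC volume source can also force read-only for every mount of it.
    for volume in spec.get("volumes") or []:
        pvc = volume.get("persistentVolumeClaim") or {}
        if pvc.get("readOnly") is True:
            findings.append(f"volume {volume.get('name')}: PVC source is readOnly")

    return findings
```

An empty result with the error still present points toward the PVC access mode or the node itself.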
How to Fix It (The Solution)
Diagnosis Checklist — Run These First
# 1. Get the exact event
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Events:"
# 2. Check the node the pod landed on
kubectl get pod <pod-name> -n <namespace> -o wide
# 3. Check node conditions: look for DiskPressure or other unhealthy conditions (taints appear under "Taints:" in the same output)
kubectl describe node <node-name> | grep -A 10 "Conditions:"
# 4. SSH to node and check dmesg for remount-ro events
sudo dmesg | grep -i "read-only\|remounting\|EXT4-fs error\|XFS.*error"
# 5. Verify actual mount flags on the node
cat /proc/mounts | grep <volume-path>
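Step 5 is easy to misread by eye, because an option such as errors=remount-ro contains the substring "ro" even on a healthy read-write mount. The sketch below parses /proc/mounts text and compares whole options instead of substrings. It assumes the standard six-column format; mount_flags and is_read_only are hypothetical helper names, and the sample text is made up for illustration.

```python
from typing import Optional


def mount_flags(proc_mounts_text: str, mount_point: str) -> Optional[list]:
    """Return the option list for mount_point, or None if it is not mounted.

    When a mount point appears more than once, the last entry is used,
    since later mounts sit on top of earlier ones.
    """
    flags = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fstype, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            flags = fields[3].split(",")
    return flags


def is_read_only(proc_mounts_text: str, mount_point: str) -> bool:
    """True only if the mount exists and carries the exact option "ro"."""
    flags = mount_flags(proc_mounts_text, mount_point)
    return flags is not None and "ro" in flags


SAMPLE = """\
/dev/nvme0n1p1 / ext4 ro,relatime 0 0
/dev/nvme1n1 /data ext4 rw,relatime,errors=remount-ro 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
"""

if __name__ == "__main__":
    for point in ("/", "/data", "/run"):
        print(point, is_read_only(SAMPLE, point))
```

On a real node, pass the contents of /proc/mounts in place of SAMPLE.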
Fix 1 — Remove Erroneous readOnly: true on volumeMount
  apiVersion: v1
  kind: Pod
  metadata:
    name: app
  spec:
    containers:
      - name: app
        image: myapp:1.0
        volumeMounts:
          - name: data-vol
            mountPath: /data
-           readOnly: true
+           readOnly: false   # or remove the line entirely; default is false
    volumes:
      - name: data-vol
        persistentVolumeClaim:
          claimName: app-pvc
Fix 2 — readOnlyRootFilesystem: true with Writable EmptyDir Overlays (Enterprise Best Practice)
readOnlyRootFilesystem: true is a correct security hardening control — do not simply disable it. Instead, mount emptyDir volumes over the specific paths your app needs to write.
  apiVersion: v1
  kind: Pod
  metadata:
    name: app
  spec:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
    containers:
      - name: app
        image: myapp:1.0
        securityContext:
-         readOnlyRootFilesystem: false   # DO NOT do this: it removes hardening
+         readOnlyRootFilesystem: true    # Keep this. Add targeted writable mounts instead.
        volumeMounts:
          - name: data-pvc
            mountPath: /data
+         - name: tmp-dir
+           mountPath: /tmp            # app writes temp files here
+         - name: run-dir
+           mountPath: /var/run        # PID files, sockets
+         - name: log-dir
+           mountPath: /var/log/app    # log rotation target
    volumes:
      - name: data-pvc
        persistentVolumeClaim:
          claimName: app-pvc
+     - name: tmp-dir
+       emptyDir: {}
+     - name: run-dir
+       emptyDir: {}
+     - name: log-dir
+       emptyDir:
+         sizeLimit: 500Mi
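Finding which paths still need an overlay is mostly a prefix check: a write succeeds only if its path sits under some writable mount. A small sketch of that check follows, assuming you already know (from the app's docs or from its EROFS errors) which paths it writes to. uncovered_write_paths is a hypothetical helper, and the container dict mirrors one entry of spec.containers.

```python
import posixpath


def is_covered(path: str, mount_paths: list) -> bool:
    """True if path equals a mount path or lies beneath one."""
    path = posixpath.normpath(path)
    for mount_path in mount_paths:
        mount_path = posixpath.normpath(mount_path)
        # The trailing slash stops /tmp from matching /tmpfoo.
        if path == mount_path or path.startswith(mount_path.rstrip("/") + "/"):
            return True
    return False


def uncovered_write_paths(container: dict, write_paths: list) -> list:
    """Return the write paths that would hit the read-only rootfs."""
    security_context = container.get("securityContext") or {}
    if security_context.get("readOnlyRootFilesystem") is not True:
        return []  # rootfs is writable, nothing to overlay

    writable_mounts = [
        mount["mountPath"]
        for mount in container.get("volumeMounts") or []
        if not mount.get("readOnly", False)
    ]
    return [p for p in write_paths if not is_covered(p, writable_mounts)]


if __name__ == "__main__":
    container = {
        "name": "app",
        "securityContext": {"readOnlyRootFilesystem": True},
        "volumeMounts": [
            {"name": "data-pvc", "mountPath": "/data"},
            {"name": "tmp-dir", "mountPath": "/tmp"},
        ],
    }
    print(uncovered_write_paths(container, ["/tmp/cache", "/var/run/app.pid"]))
```

Each path the function returns is a candidate for another emptyDir mount like the ones in the diff above.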
Fix 3 — PVC Access Mode Mismatch
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: app-pvc
  spec:
    accessModes:
-     - ReadOnlyMany    # Wrong if the app needs to write
+     - ReadWriteOnce   # For single-node write workloads
      # Use ReadWriteMany only if your CSI driver supports it (EFS, NFS, Longhorn)
      # accessModes cannot be edited on an existing PVC: delete and recreate the claim
    resources:
      requests:
        storage: 10Gi
    storageClassName: gp3
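The same mismatch can be caught in a pre-merge script before the claim is ever created. This is a sketch under stated assumptions: the manifest is already parsed into a dict, only PersistentVolumeClaim and StatefulSet kinds are handled, and claims_without_write_mode is a hypothetical name. Whether ReadOnlyMany actually yields a read-only mount depends on the storage driver, so treat a hit as a prompt to review rather than proof of a fault.

```python
# Access modes that permit writing, per the Kubernetes PVC API.
WRITE_MODES = {"ReadWriteOnce", "ReadWriteOncePod", "ReadWriteMany"}


def claims_without_write_mode(manifest: dict) -> list:
    """Return names of claims whose accessModes include no writable mode."""
    kind = manifest.get("kind")
    if kind == "PersistentVolumeClaim":
        claims = [manifest]
    elif kind == "StatefulSet":
        claims = (manifest.get("spec") or {}).get("volumeClaimTemplates") or []
    else:
        claims = []

    flagged = []
    for claim in claims:
        modes = set((claim.get("spec") or {}).get("accessModes") or [])
        # No overlap with WRITE_MODES also covers an empty or missing list.
        if not modes & WRITE_MODES:
            flagged.append((claim.get("metadata") or {}).get("name", "<unnamed>"))
    return flagged


if __name__ == "__main__":
    pvc = {
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "app-pvc"},
        "spec": {"accessModes": ["ReadOnlyMany"]},
    }
    print(claims_without_write_mode(pvc))
```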
Fix 4 — Node-Level Remount-ro (Emergency Response)
# IMMEDIATE: Cordon the node — stop scheduling new pods
kubectl cordon <node-name>
# Drain with grace period — evict existing pods
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --grace-period=60
# On the node itself — attempt remount rw ONLY if you have confirmed no corruption
# and this is a transient I/O blip (e.g., EBS multi-attach timeout)
sudo mount -o remount,rw /
# Verify
cat /proc/mounts | grep " / " | grep rw
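# If EBS: detach and reattach the volume from the AWS console, then run fsck
# Run fsck only while the filesystem is unmounted; checking a mounted filesystem can cause further damage
sudo fsck.ext4 -y /dev/nvme0n1p1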
# If corruption is confirmed: replace the node, do NOT remount rw
# Terminate the instance and let the ASG/node group provision a clean replacement
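During triage it helps to know exactly which devices the kernel remounted, especially when collecting dmesg output from several nodes. The sketch below extracts device names from the ext4 message format shown in the error section earlier. It only matches that one message form (other filesystems log differently), and remounted_devices is a hypothetical helper name.

```python
import re

# Matches e.g. "EXT4-fs (nvme0n1p1): Remounting filesystem read-only"
REMOUNT_RE = re.compile(r"EXT4-fs \(([^)]+)\): Remounting filesystem read-only")


def remounted_devices(dmesg_text: str) -> list:
    """Return devices the kernel remounted read-only, in first-seen order."""
    devices = []
    for match in REMOUNT_RE.finditer(dmesg_text):
        device = match.group(1)
        if device not in devices:
            devices.append(device)
    return devices


if __name__ == "__main__":
    sample = (
        "[1234567.890] EXT4-fs error (device nvme0n1p1): ...\n"
        "[1234567.891] EXT4-fs (nvme0n1p1): Remounting filesystem read-only\n"
    )
    print(remounted_devices(sample))
```

A non-empty result is the signal to follow the cordon and drain steps above rather than editing the Pod spec.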
Prevention in CI/CD
1. OPA/Gatekeeper Policy — Enforce emptyDir overlays when readOnlyRootFilesystem is true
# opa/policies/readonly-rootfs-requires-tmp-mount.rego
package kubernetes.admission

# Gatekeeper ConstraintTemplates expect a rule named "violation" that yields
# objects with a "msg" field. This rule inspects bare Pods only.
violation[{"msg": msg}] {
  container := input.review.object.spec.containers[_]
  container.securityContext.readOnlyRootFilesystem == true
  not has_emptydir_for_tmp(input.review.object.spec, container)
  msg := sprintf("Container '%v' has readOnlyRootFilesystem but no emptyDir for /tmp. App will crash on first write.", [container.name])
}

# True when the container mounts an emptyDir volume at /tmp.
has_emptydir_for_tmp(spec, container) {
  mount := container.volumeMounts[_]
  mount.mountPath == "/tmp"
  vol := spec.volumes[_]
  vol.name == mount.name
  vol.emptyDir
}
2. Checkov — Scan for readOnly misconfiguration in IaC
# Install
pip install checkov
# Scan your Helm-rendered manifests or raw YAML
checkov -d ./k8s-manifests --framework kubernetes \
  --check CKV_K8S_22 # read-only root filesystem check (CKV_K8S_28 is the NET_RAW capability check)
# In CI (GitHub Actions)
- name: Checkov Kubernetes Scan
  uses: bridgecrewio/checkov-action@master
  with:
    directory: k8s-manifests/
    framework: kubernetes
    soft_fail: false
3. Kyverno Policy — Block PVCs with ReadOnlyMany for stateful workloads
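apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-readonly-pvc-for-stateful
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-pvc-access-mode
      match:
        any:
          - resources:
              kinds:
                - StatefulSet
      validate:
        message: "StatefulSet volumeClaimTemplates must not use ReadOnlyMany."
        # accessModes is a list of strings, so this uses a deny condition over
        # the flattened list rather than a scalar negation pattern.
        deny:
          conditions:
            any:
              - key: "{{ request.object.spec.volumeClaimTemplates[].spec.accessModes[] || `[]` }}"
                operator: AnyIn
                value:
                  - ReadOnlyMany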
4. Node Health Monitoring: Alert as soon as remount-ro happens
# Prometheus alerting rule
groups:
  - name: node-disk-health
    rules:
      - alert: NodeFilesystemReadOnly
        expr: node_filesystem_readonly{fstype!~"tmpfs|overlay"} == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} filesystem {{ $labels.mountpoint }} is READ-ONLY"
          description: "Kernel remounted filesystem ro due to I/O errors. Cordon immediately."
Install node_exporter on all nodes. node_filesystem_readonly reports 1 once a filesystem is mounted read-only, so this alert fires about a minute after the remount and can reach you before failing pods are noticed.
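To sanity-check the rule without a running Prometheus, the same filter can be applied to a scrape of node_exporter's text output. This is an illustrative sketch only: it handles the simple one-sample-per-line exposition format, firing_mountpoints is a hypothetical name, and the sample metrics are invented. Because PromQL regex matchers are fully anchored, fstype!~"tmpfs|overlay" is equivalent to excluding exactly those two values.

```python
import re

SAMPLE_RE = re.compile(r"^node_filesystem_readonly\{(?P<labels>.*)\}\s+(?P<value>\S+)")
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def firing_mountpoints(metrics_text: str, ignored_fstypes=("tmpfs", "overlay")) -> list:
    """Return mountpoints where node_filesystem_readonly == 1,
    skipping the filesystem types the alert rule excludes."""
    firing = []
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("fstype") in ignored_fstypes:
            continue
        if float(match.group("value")) == 1.0:
            firing.append(labels.get("mountpoint", "<unknown>"))
    return firing


METRICS = """\
# HELP node_filesystem_readonly Filesystem read-only status.
# TYPE node_filesystem_readonly gauge
node_filesystem_readonly{device="/dev/nvme0n1p1",fstype="ext4",mountpoint="/"} 1
node_filesystem_readonly{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 1
node_filesystem_readonly{device="/dev/nvme1n1",fstype="xfs",mountpoint="/data"} 0
"""

if __name__ == "__main__":
    print(firing_mountpoints(METRICS))
```

Filesystems that are read-only by design would also match, so extend the ignored fstype list in both the script and the rule if your nodes carry any.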