Why does kubectl logs work without -f but hang or 403 with -f?

Without -f, kubectl performs a standard HTTP GET to retrieve a static log snapshot, which goes through normal API server authorization. With -f, kubectl requests an HTTP protocol upgrade to a streaming WebSocket connection that is proxied through the API server to the kubelet's /containerLogs/ endpoint. This upgrade path requires an additional RBAC rule on the 'nodes/proxy' subresource. If that rule is missing from your ClusterRole, the static GET succeeds but the streaming upgrade is denied with 403.

Does IRSA affect kubectl log access, or is this purely a node IAM issue?

Both layers matter but for different reasons. IRSA governs what AWS API calls your pod's service account can make (e.g., S3, DynamoDB). The 403 on kubectl logs is a Kubernetes RBAC issue affecting the human operator's IAM role mapped in aws-auth, not the pod's service account. However, IRSA misconfiguration (wrong namespace in OIDC sub claim, missing aud condition) causes the pod to fail STS token exchange, which can cause it to fall back to node instance profile credentials — a separate but related blast radius.

How do I quickly verify which layer is causing the 403 without a long debugging loop?

Run three commands in sequence. First: 'kubectl auth whoami'. If this returns 'system:anonymous' or fails with an Unauthorized error, your IAM role is not in aws-auth. Second: 'kubectl auth can-i get pods/log -n your-namespace'. If this returns 'no', it's RBAC. Third: 'aws sts get-caller-identity'. Confirm you are assuming the IAM role that is mapped in aws-auth. If all three pass but logs still 403, check the node instance profile for AmazonEKSWorkerNodePolicy and verify IMDSv2 is not blocking the kubelet's credential refresh.
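
The order of those three checks can be written down as a small decision function. This is only a sketch: the function name `triage_403` and its parameters are hypothetical, and the inputs are whatever you observed when running the commands by hand.

```python
from typing import Optional


def triage_403(whoami_username: Optional[str],
               can_get_pods_log: bool,
               caller_role_is_mapped: bool) -> str:
    """Name the layer to inspect next, following the order of the three checks."""
    # Check 1: `kubectl auth whoami` failed (None) or returned the anonymous user.
    if whoami_username is None or whoami_username == "system:anonymous":
        return "aws-auth: the IAM role is not mapped to a Kubernetes user"
    # Check 2: `kubectl auth can-i get pods/log` answered "no".
    if not can_get_pods_log:
        return "rbac: no rule allows get on pods/log"
    # Check 3: `aws sts get-caller-identity` shows a role other than the mapped one.
    if not caller_role_is_mapped:
        return "iam: the wrong IAM role is being assumed"
    # All three passed: the remaining suspect is the node side.
    return "node: inspect the instance profile and IMDS settings"


print(triage_403("system:anonymous", False, True))
print(triage_403("my-dev-user", False, True))
print(triage_403("my-dev-user", True, True))
```

The value of fixing the order is that each check rules out one layer before you spend time on the next.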

Fixing 'unable to upgrade connection: 403 Forbidden' in EKS kubectl logs with IRSA

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

What broke: kubectl logs -f initiates a WebSocket upgrade request through the EKS API server to the kubelet. A 403 Forbidden means the IAM principal or the Kubernetes RBAC subject is being denied at one of three checkpoints: the aws-auth ConfigMap mapping, RBAC access to the pods/log subresource, or the node IAM instance profile lacking AmazonEKSWorkerNodePolicy.
How to fix it: Verify the aws-auth ConfigMap maps your IAM role correctly, confirm your RBAC ClusterRole grants access to the pods/log and pods/exec subresources, and ensure the EC2 node instance profile has the required managed policies attached.

The Incident (What Does the Error Mean?)

$ kubectl logs -f my-pod-7d9f8b-xkz2p
error: unable to upgrade connection: Forbidden (user=arn:aws:iam::123456789012:assumed-role/my-role/session, verb=create, resource=nodes, subresource=proxy)

kubectl logs -f is not a simple REST GET. It requests an HTTP/1.1 → WebSocket protocol upgrade routed through the EKS API server down to the kubelet's /containerLogs/ endpoint on the node. The EKS API server enforces both IAM authentication (via the aws-iam-authenticator webhook) and Kubernetes RBAC authorization on this upgrade path. A 403 means the request reached the API server and was denied at the authorization layer, not dropped at the network or TLS level. Your pod is running and your cluster is up, but you are blind: no logs, no exec, no port-forward.
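
The parenthesised part of the error already names the subject and the permission that was checked. The snippet below is a minimal sketch for pulling those fields out of a message shaped like the sample above. The regular expression and the helper name `parse_forbidden` are assumptions based on that one sample, not a documented kubectl output format.

```python
import re

# Pattern derived from the sample error line only; other messages may differ.
FORBIDDEN_RE = re.compile(
    r"Forbidden \(user=(?P<user>[^,]+), verb=(?P<verb>[^,]+), "
    r"resource=(?P<resource>[^,]+), subresource=(?P<subresource>[^)]+)\)"
)


def parse_forbidden(line: str) -> dict:
    """Return the user, verb, resource and subresource named in the error."""
    match = FORBIDDEN_RE.search(line)
    if match is None:
        raise ValueError("line does not look like the sample Forbidden message")
    return match.groupdict()


line = (
    "error: unable to upgrade connection: Forbidden "
    "(user=arn:aws:iam::123456789012:assumed-role/my-role/session, "
    "verb=create, resource=nodes, subresource=proxy)"
)
fields = parse_forbidden(line)
print(fields["user"])
print(f'{fields["verb"]} {fields["resource"]}/{fields["subresource"]}')
```

Reading the verb and resource straight from the message tells you which RBAC rule was evaluated, which is more reliable than guessing from the command you typed.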

The Attack Vector / Blast Radius

This is not just an inconvenience. The blast radius breaks down into two failure modes:

1. Operational Blindness During Incidents

If this hits during a production outage, your on-call engineer cannot stream logs. They cannot kubectl exec into a crashing container. Every second spent debugging the 403 instead of the actual application failure compounds the MTTR.

2. IRSA Privilege Confusion Leading to Overpermissioning

The dangerous failure pattern: an engineer hits this 403, panics, and attaches AmazonEKSClusterPolicy or a wildcard eks:* IAM policy to the node role to "just make it work." This is how nodes end up with permissions to call eks:CreateCluster, iam:PassRole, or ec2:RunInstances. A compromised pod on that node can then use the IMDS v1 endpoint (http://169.254.169.254/latest/meta-data/iam/security-credentials/) to retrieve the node's instance profile credentials and escalate to full cluster or account takeover.

Root cause is almost always one of these four things:

| # | Cause | Where It Breaks |
|---|-------|-----------------|
| 1 | IAM role not mapped in `aws-auth` ConfigMap | Authentication webhook |
| 2 | RBAC rule missing for the `pods/log` subresource | Kubernetes RBAC |
| 3 | Node instance profile missing `AmazonEKSWorkerNodePolicy` | Kubelet ↔ API server trust |
| 4 | IRSA OIDC trust policy condition mismatch (`sts:AssumeRoleWithWebIdentity`) | IAM STS |

How to Fix It

Step 1 — Confirm Who the API Server Thinks You Are

kubectl auth whoami
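# or, on older clusters without the SelfSubjectReview API, check the HTTP status (401 = authentication, 403 = authorization):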
kubectl get pods --v=9 2>&1 | grep "Response Status"

If the returned username is system:anonymous, or the call fails with an Unauthorized error, your IAM role is not in aws-auth. If it returns your mapped username or role ARN but kubectl logs still returns 403, the problem is RBAC.

Fix A — `aws-auth` ConfigMap (Basic Fix)

# kubectl edit configmap aws-auth -n kube-system
 apiVersion: v1
 kind: ConfigMap
 metadata:
   name: aws-auth
   namespace: kube-system
 data:
   mapRoles: |
-    - rolearn: arn:aws:iam::123456789012:role/my-dev-role
-      username: my-dev-user
-      groups:
-        - system:masters
+    - rolearn: arn:aws:iam::123456789012:role/my-dev-role
+      username: my-dev-user
+      groups:
+        - eks-log-viewers

Do not map developer roles to system:masters. That grants cluster-admin. Map to a scoped RBAC group.
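
To find existing offenders, you can scan the mapRoles text for entries that still list system:masters. The sketch below is a line-based scan rather than a YAML parser, and it assumes each entry starts with its rolearn line, as in the ConfigMap above. The function name `roles_mapped_to_masters` is hypothetical.

```python
def roles_mapped_to_masters(map_roles: str) -> list:
    """Return role ARNs whose groups list contains system:masters."""
    flagged = []
    current_arn = None
    for raw_line in map_roles.splitlines():
        line = raw_line.strip()
        if line.startswith("- rolearn:"):
            # Split on the first colon only, because the ARN itself contains colons.
            current_arn = line.split(":", 1)[1].strip()
        elif line == "- system:masters" and current_arn is not None:
            if current_arn not in flagged:
                flagged.append(current_arn)
    return flagged


MAP_ROLES = """
- rolearn: arn:aws:iam::123456789012:role/my-dev-role
  username: my-dev-user
  groups:
    - system:masters
- rolearn: arn:aws:iam::123456789012:role/my-viewer-role
  username: my-viewer
  groups:
    - eks-log-viewers
"""

print(roles_mapped_to_masters(MAP_ROLES))
```

For anything beyond a quick audit, parse the ConfigMap with a real YAML library so that inline lists and comments are handled correctly.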

Fix B — RBAC ClusterRole with `pods/log` (Enterprise Best Practice)

-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRole
-metadata:
-  name: eks-log-viewers
-rules:
-  - apiGroups: [""]
-    resources: ["pods"]
-    verbs: ["get", "list", "watch"]
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: eks-log-viewers
+rules:
+  - apiGroups: [""]
+    resources: ["pods"]
+    verbs: ["get", "list", "watch"]
+  - apiGroups: [""]
+    resources: ["pods/log"]
+    verbs: ["get", "list"]
+  - apiGroups: [""]
+    resources: ["nodes/proxy"]
+    verbs: ["get"]
---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: eks-log-viewers-binding
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: eks-log-viewers
+subjects:
+  - kind: Group
+    name: eks-log-viewers
+    apiGroup: rbac.authorization.k8s.io
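
To see why the original role cannot read logs, it helps to model how a rule is matched: a rule on "pods" does not cover the "pods/log" subresource. The sketch below is a deliberately simplified model of RBAC matching. It handles only apiGroups, resources, verbs and the "*" wildcard, and ignores resourceNames and nonResourceURLs. The helper names are hypothetical.

```python
def rule_allows(rule: dict, verb: str, resource: str, api_group: str = "") -> bool:
    """True if a single RBAC rule permits the verb on the resource."""
    def matches(allowed: list, wanted: str) -> bool:
        return "*" in allowed or wanted in allowed

    return (
        matches(rule.get("apiGroups", []), api_group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
    )


def role_allows(rules: list, verb: str, resource: str, api_group: str = "") -> bool:
    """RBAC is additive: one matching rule is enough."""
    return any(rule_allows(rule, verb, resource, api_group) for rule in rules)


# Rules from the "before" and "after" sides of the ClusterRole diff above.
BEFORE = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
]
AFTER = BEFORE + [
    {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get", "list"]},
    {"apiGroups": [""], "resources": ["nodes/proxy"], "verbs": ["get"]},
]

print(role_allows(BEFORE, "get", "pods/log"))  # False
print(role_allows(AFTER, "get", "pods/log"))   # True
```

On a live cluster, `kubectl auth can-i` remains the authoritative answer; the model is only useful for reasoning about a manifest before you apply it.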

Fix C — Node IAM Instance Profile (Managed Policies)

 # Terraform: aws_iam_role_policy_attachment for node group role
-resource "aws_iam_role_policy_attachment" "node_cni_only" {
-  role       = aws_iam_role.eks_node.name
-  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
-}
+resource "aws_iam_role_policy_attachment" "node_worker" {
+  role       = aws_iam_role.eks_node.name
+  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
+}
+
+resource "aws_iam_role_policy_attachment" "node_cni" {
+  role       = aws_iam_role.eks_node.name
+  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
+}
+
+resource "aws_iam_role_policy_attachment" "node_ecr_readonly" {
+  role       = aws_iam_role.eks_node.name
+  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
+}
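
A quick way to confirm the result is to compare the policies attached to the node role against the three managed policies used above. This is a sketch: the input list is assumed to be policy ARNs you have already collected (for example from your Terraform state or the IAM console), and `missing_node_policies` is a hypothetical helper.

```python
# The three managed policies attached in the Terraform snippet above.
REQUIRED_NODE_POLICIES = {
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
}


def missing_node_policies(attached_arns: list) -> list:
    """Return the required policy ARNs that are not attached, sorted."""
    return sorted(REQUIRED_NODE_POLICIES - set(attached_arns))


# The "before" state of the diff: only the CNI policy is attached.
print(missing_node_policies(["arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"]))
```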

Fix D — IRSA OIDC Trust Policy Condition Mismatch

 {
   "Effect": "Allow",
   "Principal": {
     "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
   },
   "Action": "sts:AssumeRoleWithWebIdentity",
   "Condition": {
     "StringEquals": {
-      "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:default:my-sa"
+      "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:my-namespace:my-sa",
+      "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
     }
   }
 }

Namespace mismatch in the OIDC sub claim is the #1 silent IRSA failure. The token is issued but STS rejects AssumeRoleWithWebIdentity, causing the pod to fall back to the node instance profile — or fail entirely.
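
Because the mismatch is silent, it is worth comparing the trust policy against the service account before deploying. The sketch below checks a single statement for the sub and aud conditions shown above. It assumes plain string values under StringEquals (not lists or StringLike), and the function name `irsa_condition_problems` is hypothetical.

```python
def irsa_condition_problems(statement: dict, issuer: str,
                            namespace: str, service_account: str) -> list:
    """List mismatches between a trust policy statement and a service account."""
    expected = {
        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
        f"{issuer}:aud": "sts.amazonaws.com",
    }
    actual = statement.get("Condition", {}).get("StringEquals", {})
    problems = []
    for key, want in expected.items():
        got = actual.get(key)
        if got is None:
            problems.append(f"missing condition {key}")
        elif got != want:
            problems.append(f"{key} is {got!r}, expected {want!r}")
    return problems


ISSUER = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"

# The "before" side of the diff: wrong namespace and no aud condition.
broken_statement = {
    "Effect": "Allow",
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
        "StringEquals": {
            f"{ISSUER}:sub": "system:serviceaccount:default:my-sa",
        }
    },
}

for problem in irsa_condition_problems(broken_statement, ISSUER, "my-namespace", "my-sa"):
    print(problem)
```

Running the same check against the corrected statement should return an empty list.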


Prevention in CI/CD

1. Checkov — Scan Terraform Node Role Policies

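checkov -d ./terraform --check CKV_AWS_58,CKV_AWS_37
# CKV_AWS_37: EKS control plane logging is enabled for all log types
# CKV_AWS_58: EKS cluster has secrets encryption enabled
# Both are cluster-level checks; verifying AmazonEKSWorkerNodePolicy on the node role needs a custom policy.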

2. OPA/Gatekeeper — Block system:masters Mappings in aws-auth

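package k8s.awsauth

# Reject the aws-auth ConfigMap when its mapRoles text mentions system:masters.
violation[{"msg": msg}] {
  # Only evaluate the aws-auth ConfigMap in kube-system.
  input.review.object.metadata.name == "aws-auth"
  input.review.object.metadata.namespace == "kube-system"
  binding := input.review.object.data.mapRoles
  contains(binding, "system:masters")
  msg := "Mapping IAM roles to system:masters is prohibited. Use scoped RBAC groups."
}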

3. kubectl auth can-i in Pipeline Smoke Tests

# Add to post-deploy validation step
kubectl auth can-i get pods/log \
  --namespace=production \
  --as=system:serviceaccount:production:my-sa
# Must return: yes

4. Enforce IMDSv2 on All EKS Nodes (Blocks SSRF-based credential theft)

 resource "aws_launch_template" "eks_node" {
+  metadata_options {
+    http_endpoint               = "enabled"
+    http_tokens                 = "required"  # IMDSv2 only
+    http_put_response_hop_limit = 1
+  }
 }

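5. Audit aws-auth Changes with EKS Audit Logs + CloudWatch

# aws-auth edits are Kubernetes API calls: they land in the EKS audit log (CloudWatch Logs), not in CloudTrail.
# Assumes the "audit" control plane log type is enabled; my-cluster and the metric names are placeholders.
aws logs put-metric-filter --log-group-name /aws/eks/my-cluster/cluster \
  --filter-name detect-aws-auth-mutation \
  --filter-pattern '{ ($.objectRef.resource = "configmaps") && ($.objectRef.name = "aws-auth") && (($.verb = "update") || ($.verb = "patch")) }' \
  --metric-transformations metricName=AwsAuthMutations,metricNamespace=EKS/Audit,metricValue=1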