Fixing x509: Certificate Signed by Unknown Authority When Pulling from a Private Harbor Registry in Air-Gapped Kubernetes

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins depending on node count

TL;DR

  • What broke: Your Kubernetes nodes' container runtime (containerd or Docker) cannot verify the TLS certificate presented by Harbor because the internal CA that signed it is not in the runtime's or OS's trust store.
  • How to fix it: Distribute the internal CA certificate to every node's OS trust store AND to containerd's per-registry TLS config directory, then restart containerd. For kubeadm clusters bootstrapped via cloud-init or Ansible, inject the CA at node provisioning time.

The Incident (What Does the Error Mean?)

Raw error from kubectl describe pod or crictl pull:

Failed to pull image "harbor.internal.corp/prod/app:1.4.2": 
  rpc error: code = Unknown 
  desc = failed to pull and unpack image "harbor.internal.corp/prod/app:1.4.2": 
  failed to resolve reference "harbor.internal.corp/prod/app:1.4.2": 
  failed to do request: Head "https://harbor.internal.corp/v2/prod/app/manifests/1.4.2": 
  x509: certificate signed by unknown authority

Immediate consequence: Every pod on every node that needs to pull from this registry is stuck in ImagePullBackOff or ErrImagePull. In air-gapped clusters, there is no fallback to Docker Hub. Your deployment is completely blocked. Rollbacks also fail if the previous image tag isn't already cached on the node.
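
To see how far the failure has spread, one option is to filter the pod list by status. This is a minimal sketch, assuming kubectl access to the cluster; the function name stuck_pods is made up for this example.

```shell
# Print "namespace/pod status" for every pod stuck on an image pull.
# Reads `kubectl get pods -A --no-headers` output on stdin, where the
# columns are NAMESPACE NAME READY STATUS RESTARTS AGE.
stuck_pods() {
  awk '$4 ~ /ImagePullBackOff|ErrImagePull/ { print $1 "/" $2, $4 }'
}

# Usage on a live cluster:
#   kubectl get pods -A --no-headers | stuck_pods
```

Matching on a substring of the STATUS column also catches init-container variants such as Init:ErrImagePull.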


The Attack Vector / Blast Radius

This isn't just an annoyance — the wrong fix creates a critical security hole.

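The dangerous non-fix: Adding the registry to insecure-registries in Docker's daemon.json, or setting skip_verify = true in containerd's hosts.toml (insecure_skip_verify = true in the legacy config.toml registry section), disables TLS verification entirely. In an air-gapped environment, engineers assume the network is safe and leave this in place permanently. This means:

  • Any host on the internal network that can intercept traffic (for example via ARP or DNS spoofing) can man-in-the-middle image pulls and substitute a malicious image layer with no warning.
  • Supply chain compromise via image substitution becomes trivial for any internal threat actor or compromised host.
  • Harbor Notary/cosign signatures do not close the gap on their own: by default the container runtime does not verify them during a pull, so unless an admission controller enforces signatures, verified TLS is the only check on what the node receives.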

Blast radius of the root cause: If your cluster uses a cluster-autoscaler or Karpenter, every new node that spins up will also fail to pull images — the CA cert is missing from the node AMI/image. This cascades into a full cluster-level availability failure under load.


How to Fix It (The Solution)

Basic Fix — Single Node (Verify the Chain First)

Step 1: Grab the CA cert from Harbor

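# Save every certificate Harbor presents (leaf first, then any intermediates).
# Piping straight into `openssl x509` would keep only the leaf, not the CA.
openssl s_client -connect harbor.internal.corp:443 -servername harbor.internal.corp \
  -showcerts </dev/null 2>/dev/null \
  | sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' > /tmp/harbor-chain.pem

# Save the CA as /tmp/harbor-ca.crt: the last block of the chain if it is self-signed,
# otherwise (servers usually omit the root) the CA file from your PKI team.

# Verify it's the CA (Issuer == Subject means self-signed root)
openssl x509 -in /tmp/harbor-ca.crt -noout -issuer -subject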
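
When Harbor does send intermediates, -showcerts prints them in order, leaf first, so the certificate nearest the root is the last PEM block. This is a minimal sketch that isolates it. The helper name last_cert and the output path in the usage comment are hypothetical, and the result is only a root CA if its Issuer equals its Subject.

```shell
# Print the last PEM certificate block found on stdin.
last_cert() {
  awk '
    /-----BEGIN CERTIFICATE-----/ { buf = "" }     # start a fresh block
    { buf = buf $0 "\n" }                          # accumulate lines
    /-----END CERTIFICATE-----/   { last = buf }   # remember the completed block
    END { printf "%s", last }
  '
}

# Usage:
#   openssl s_client -connect harbor.internal.corp:443 -showcerts </dev/null 2>/dev/null \
#     | last_cert > /tmp/harbor-top-of-chain.crt
```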

Step 2: Add to OS trust store (RHEL/CentOS/Amazon Linux)

cp /tmp/harbor-ca.crt /etc/pki/ca-trust/source/anchors/harbor-internal-ca.crt
update-ca-trust extract

For Debian/Ubuntu nodes:

cp /tmp/harbor-ca.crt /usr/local/share/ca-certificates/harbor-internal-ca.crt
update-ca-certificates
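
Mixed fleets need the right branch of Step 2 on each node. This is a minimal sketch that picks the anchor directory from /etc/os-release. trust_anchor_dir is a hypothetical helper, and the distro IDs it checks are common ones, not an exhaustive set.

```shell
# Print the CA anchor directory for this distro family, or fail if unknown.
# $1 is the os-release file to read (defaults to /etc/os-release).
trust_anchor_dir() {
  os_release="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into the caller.
  ids=$( . "$os_release" 2>/dev/null && echo "${ID:-} ${ID_LIKE:-}" )
  case " $ids " in
    *" rhel "*|*" centos "*|*" fedora "*|*" amzn "*)
      echo /etc/pki/ca-trust/source/anchors ;;
    *" debian "*|*" ubuntu "*)
      echo /usr/local/share/ca-certificates ;;
    *)
      return 1 ;;
  esac
}

# Usage (as root):
#   dir=$(trust_anchor_dir) && cp /tmp/harbor-ca.crt "$dir/harbor-internal-ca.crt"
#   then run update-ca-trust extract or update-ca-certificates as shown above
```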

Step 3: Configure containerd with the CA cert (THE CRITICAL STEP most guides miss)

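Do not rely on the OS trust store alone. containerd loads the system CA bundle when it starts, so a trust-store update typically has no effect until the daemon restarts. A per-registry hosts.toml names the CA explicitly, so trust for this registry does not depend on how the node's bundle was built.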

mkdir -p /etc/containerd/certs.d/harbor.internal.corp

# /etc/containerd/certs.d/harbor.internal.corp/hosts.toml

- # File does not exist: containerd uses the system default CA bundle only
+ [host."https://harbor.internal.corp"]
+   ca = ["/etc/containerd/certs.d/harbor.internal.corp/harbor-ca.crt"]

# Copy the CA cert into the containerd certs directory
cp /tmp/harbor-ca.crt /etc/containerd/certs.d/harbor.internal.corp/harbor-ca.crt

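# Restart containerd so it picks up the updated OS trust store and any config.toml change
# (files under certs.d are read at pull time, but a restart here covers both)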
systemctl restart containerd

# Verify
crictl pull harbor.internal.corp/prod/app:1.4.2
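
Step 3 can be collected into one repeatable helper. This is a minimal sketch, assuming the CA file already exists at the path you pass in. The name configure_registry_ca and its optional third argument are hypothetical. The server and capabilities lines are additions not present in the snippet above; containerd's hosts.toml format accepts both.

```shell
# Install a registry CA and a matching hosts.toml under containerd's certs.d.
# $1 = registry host, $2 = CA file, $3 = certs.d root (default /etc/containerd/certs.d)
configure_registry_ca() {
  registry="$1"; ca_src="$2"; root="${3:-/etc/containerd/certs.d}"
  dir="$root/$registry"
  mkdir -p "$dir" || return 1
  cp "$ca_src" "$dir/harbor-ca.crt" || return 1
  cat > "$dir/hosts.toml" <<EOF
server = "https://$registry"

[host."https://$registry"]
  capabilities = ["pull", "resolve"]
  ca = ["$dir/harbor-ca.crt"]
EOF
}

# Usage (as root), then restart containerd and retry the pull:
#   configure_registry_ca harbor.internal.corp /tmp/harbor-ca.crt
```

Running it twice leaves the same files in place, which makes it safe to call from a provisioning script.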

Enterprise Best Practice — Fleet-Wide via Ansible + Node Bootstrap

For a 50+ node air-gapped cluster, manual cert distribution is error-prone: one missed node is enough to strand pods in ImagePullBackOff. The CA cert must be baked into your node provisioning pipeline.

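Ansible task (add to your node hardening role). The trust-store path shown is for RHEL-family nodes, and the update-ca-trust and restart-containerd handlers named in notify must be defined in the role's handlers: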

# roles/node-bootstrap/tasks/registry-tls.yml

- # No certificate distribution — nodes fail on first image pull
+ - name: Distribute Harbor internal CA to OS trust store
+   copy:
+     src: files/harbor-internal-ca.crt
+     dest: /etc/pki/ca-trust/source/anchors/harbor-internal-ca.crt
+     owner: root
+     group: root
+     mode: '0644'
+   notify: update-ca-trust
+
+ - name: Create containerd certs.d directory for Harbor
+   file:
+     path: /etc/containerd/certs.d/harbor.internal.corp
+     state: directory
+     mode: '0755'
+
+ - name: Distribute Harbor CA to containerd certs directory
+   copy:
+     src: files/harbor-internal-ca.crt
+     dest: /etc/containerd/certs.d/harbor.internal.corp/harbor-ca.crt
+     owner: root
+     group: root
+     mode: '0644'
+   notify: restart-containerd
+
+ - name: Deploy containerd hosts.toml for Harbor registry
+   template:
+     src: templates/harbor-hosts.toml.j2
+     dest: /etc/containerd/certs.d/harbor.internal.corp/hosts.toml
+     owner: root
+     group: root
+     mode: '0644'
+   notify: restart-containerd

containerd config.toml — ensure config_path is set (containerd ≥ 1.5):

# /etc/containerd/config.toml

  [plugins."io.containerd.grpc.v1.cri".registry]
-   # config_path not set — hosts.toml files are ignored
+   config_path = "/etc/containerd/certs.d"

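⚠️ If config_path is not set, your hosts.toml files are silently ignored. This is a common reason the fix appears to be applied but still fails. Changes to config.toml take effect only after containerd restarts, and on containerd 2.x the setting lives under [plugins."io.containerd.cri.v1.images".registry] instead.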
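
Because the failure is silent, it is worth checking the file before blaming the certificate. This is a minimal sketch. has_config_path is a hypothetical helper that only inspects the file text, so `containerd config dump` remains the authority on the effective configuration.

```shell
# Succeed if the given containerd config sets a non-empty config_path.
# $1 = config file (default /etc/containerd/config.toml)
# Commented-out lines and config_path = "" do not count.
has_config_path() {
  grep -Eq "^[[:space:]]*config_path[[:space:]]*=[[:space:]]*[\"'][^\"']+[\"']" \
    "${1:-/etc/containerd/config.toml}"
}

# Usage:
#   has_config_path || echo "config_path missing: hosts.toml files will be ignored" >&2
```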


Prevention in CI/CD

1. Bake the CA into your node image (Packer)

If you're using Packer to build AMIs or QCOW2 images for your air-gapped nodes, the CA cert and hosts.toml must be provisioned during the image build — not at runtime.

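# packer/node-image.pkr.hcl (inside the build block)
# The file provisioner runs as the SSH user, so upload to /tmp and install with sudo.
provisioner "file" {
  source      = "files/harbor-internal-ca.crt"
  destination = "/tmp/harbor-ca.crt"
}
provisioner "shell" {
  inline = ["sudo install -D -m 0644 /tmp/harbor-ca.crt /etc/containerd/certs.d/harbor.internal.corp/harbor-ca.crt"]
}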

2. OPA/Gatekeeper policy — block insecure registry configs

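Prevent engineers from "fixing" this the wrong way. Admission control cannot see containerd's on-node config, so the rule below only helps if your provisioning tooling records the setting as a Node annotation; the annotation key shown is a site-specific convention, not something containerd or the kubelet sets: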

# opa/policies/no-insecure-registry.rego
package kubernetes.admission

deny[msg] {
  input.request.kind.kind == "Node"
  input.request.object.metadata.annotations["containerd.insecure-registries"] != ""
  msg := "Insecure registry configuration is prohibited. Distribute the CA certificate instead."
}

3. Checkov — scan containerd configs in IaC

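# Add to your CI pipeline. Checkov has no built-in check for containerd registry
# settings (CKV_DOCKER_2 is the Dockerfile HEALTHCHECK check), so run the full
# scan of the role and search the TOML templates for skip_verify separately.
checkov -d ./ansible/roles/node-bootstrap \
  --compact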
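
A plain text search catches the dangerous non-fix in provisioning code. This is a minimal sketch for a CI step. find_insecure_registry_settings is a hypothetical name, and the patterns cover containerd's skip_verify and insecure_skip_verify plus any mention of Docker's insecure-registries key. Extend them for your own templates.

```shell
# List lines under $1 that disable registry TLS verification.
# Exit status 0 means at least one offending line was found.
find_insecure_registry_settings() {
  grep -rnE \
    -e '^[[:space:]]*(insecure_)?skip_verify[[:space:]]*=[[:space:]]*true' \
    -e '"insecure-registries"[[:space:]]*:' \
    "${1:-.}"
}

# Usage in CI (fail the job on any match):
#   if find_insecure_registry_settings ./ansible; then
#     echo "Insecure registry configuration found" >&2; exit 1
#   fi
```

The Docker pattern also flags an empty insecure-registries list, which is a deliberate trade-off: a false positive in review is cheaper than a missed bypass.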

4. cert-manager + trust-manager for in-cluster CA distribution

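For clusters where Harbor's cert is rotated regularly, cert-manager's trust-manager can automatically sync the CA bundle into a ConfigMap. trust-manager does not touch the node filesystem itself, so pair it with a privileged DaemonSet of your own that mounts the ConfigMap, watches for changes, and copies the bundle into /etc/containerd/certs.d on each node. The DaemonSet's own image must already be pullable (pre-loaded on the node or served by a registry the node trusts), so this handles rotation, not first-time bootstrap.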

helm upgrade --install trust-manager jetstack/trust-manager \
  --namespace cert-manager \
  --set app.trust.namespace=cert-manager

This takes CA rotation out of the manual path at scale; initial trust still has to come from the node image or the provisioning role.
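
The copy step such a DaemonSet would run can be small. This is a minimal sketch, assuming the ConfigMap is mounted in the container and the node's /etc is exposed through a hostPath mount. sync_ca_bundle and the paths in the usage comment are hypothetical.

```shell
# Copy the CA bundle to the node only when its content changed.
# $1 = bundle mounted from the ConfigMap, $2 = destination on the host filesystem
# Prints "updated" or "unchanged".
sync_ca_bundle() {
  src="$1"; dest="$2"
  if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
    echo unchanged
    return 0
  fi
  mkdir -p "$(dirname "$dest")" || return 1
  # Write next to the target, then rename, so readers never see a partial file.
  tmp="$dest.tmp.$$"
  cp "$src" "$tmp" && mv "$tmp" "$dest" || return 1
  echo updated
}

# Usage inside the DaemonSet container (paths are illustrative):
#   while true; do
#     sync_ca_bundle /trust/harbor-ca.crt \
#       /host/etc/containerd/certs.d/harbor.internal.corp/harbor-ca.crt
#     sleep 60
#   done
```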
