What is the difference between Calico BGP state 'Active' vs 'Connect' vs 'Established'?

'Connect' means BIRD is actively attempting the TCP handshake to port 179. 'Active' means the TCP connection failed and BIRD is waiting before retrying; this is where 'Connection refused' lands. 'Established' is the only healthy state: the BGP session is up and routes are being exchanged. If a peer stays in Active/Connect for more than 2–3 retry cycles, the problem is usually below the BGP protocol (firewall, wrong IP, or nothing listening on the peer) rather than in the BGP configuration itself.
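As a quick reference, the three states can be turned into a first-pass triage helper. This is a minimal sketch: the function name bgp_state_hint and the wording of each hint are illustrative assumptions, not Calico or BIRD output.

```shell
# Map the INFO column of `calicoctl node status` to a first-pass diagnosis.
# The hint texts are illustrative; they are not produced by Calico or BIRD.
bgp_state_hint() {
  case "$1" in
    Established) echo "healthy: session up, routes exchanged" ;;
    Connect)     echo "in progress: TCP handshake to port 179 under way" ;;
    Active)      echo "failing: TCP connect failed, waiting to retry" ;;
    *)           echo "other: inspect BIRD logs for state '$1'" ;;
  esac
}
```

Calling bgp_state_hint Active prints the "failing" hint; any state not covered by the three above falls through to the generic "inspect BIRD logs" message.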

Why does 'Connection refused' specifically point to a firewall REJECT rule rather than a DROP rule?

TCP 'Connection refused' means the connection attempt was actively rejected: either the remote kernel answered the SYN with an RST because nothing is listening on port 179 (BIRD not running), or an iptables/nftables REJECT rule answered on its behalf (with an RST or, by default in iptables, an ICMP port-unreachable, which Linux also reports as 'Connection refused'). A DROP rule or a missing route produces a timeout with no response. This distinction matters: 'refused' narrows the cause to either 'BIRD is down on the peer' or 'explicit REJECT firewall rule', and both are immediately actionable.
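The refused-versus-timeout distinction can be automated by classifying the text a TCP probe prints. The sketch below assumes the probe output contains phrases such as "Connection refused", "timed out", "succeeded", or "open", which is what common nc variants print; classify_probe is a hypothetical helper name, and other probe tools may word their messages differently.

```shell
# Classify the message printed by a TCP probe against port 179.
# The matched phrases are assumptions about common nc variants.
classify_probe() {
  case "$1" in
    *"Connection refused"*)
      echo "refused: BIRD down on peer or REJECT rule" ;;
    *"timed out"*|*"timeout"*|*"Timeout"*)
      echo "timeout: DROP rule, wrong IP, or missing route" ;;
    *"succeeded"*|*"open"*)
      echo "open: TCP/179 reachable, check BGP protocol settings" ;;
    *)
      echo "unknown: $1" ;;
  esac
}
```

A typical call would pass the captured probe output, for example classify_probe "$(nc -zv -w 3 <PEER_IP> 179 2>&1)".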

Can Calico BGP peers fail to establish even when port 179 is open?

Yes. An open port 179 rules out the network layer, but BGP sessions can still fail at the protocol level. Common causes: mismatched AS numbers (the BGP OPEN message is rejected), mismatched BGP passwords (TCP MD5 auth failure, where the session silently drops), a Calico version mismatch causing capability negotiation failure, or a nodeSelector on the BGPPeer resource that excludes the intended node. Check BIRD logs directly: kubectl logs -n kube-system <calico-node-pod> -c calico-node | grep -iE 'bgp|bird|peer'.

Fixing Calico BGP Peer 'Connection Refused': Why Your BGP Session Won't Establish and How to Resolve It

Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

What broke: Calico's BIRD BGP daemon cannot reach the peer IP on TCP/179 — the remote end is actively refusing the connection, meaning BGP sessions are down and inter-node pod routing is black-holing traffic.
How to fix it: Verify the peer IP, AS numbers, node-to-node firewall rules on port 179, and whether BGP is enabled in the target node's Calico config. Confirm BIRD is running on the peer.

The Incident (What does the error mean?)

Raw output from calicoctl node status:

Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+---------------+-------------------+-------+----------+-------------+
| 10.0.1.45     | node-to-node mesh | start | 04:12:33 | Connect     |
| 192.168.10.1  | global            | start | 04:12:33 | Active      |
+---------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

Or from BIRD logs (kubectl logs -n kube-system <calico-node-pod> -c calico-node):

2024-01-15 04:13:01.456 [ERROR] Connection to BGP neighbor 10.0.1.45 failed: Connection refused
2024-01-15 04:13:01.457 [INFO]  BGP session with 10.0.1.45 moved to ACTIVE state

Immediate consequence: BGP is stuck in Connect or Active and never reaches Established. Calico is not exchanging routes. No pod on the affected node can reach pods on the peer node. Cross-node kubectl exec calls hang. Services backed by pods on the unreachable node return connection timeouts. This is a full network partition for affected node pairs.
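To see at a glance which peers are affected, the status table above can be filtered down to the rows that are not Established. This is a minimal sketch that assumes the pipe-delimited column layout shown in the sample output (peer address in the first column, INFO in the fifth); unestablished_peers is a hypothetical helper name.

```shell
# Print "<peer address> <info>" for each peer row whose INFO column is not
# Established. Reads `calicoctl node status` output on stdin.
unestablished_peers() {
  awk -F'|' '
    NF >= 6 && $2 !~ /PEER ADDRESS/ {
      addr = $2; info = $6
      gsub(/^ +| +$/, "", addr)   # trim column padding
      gsub(/^ +| +$/, "", info)
      if (addr != "" && info != "Established") print addr, info
    }'
}
```

On a node, pipe the real command into it: calicoctl node status | unestablished_peers. Empty output means every listed peer is Established.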

The Attack Vector / Blast Radius

This is not a flap — Connection refused means TCP/179 is being actively rejected by the remote host's kernel or firewall. This is distinct from a timeout (firewall drop) or a BGP NOTIFICATION (protocol-level rejection).

Cascading failure chain:

1. BIRD on the local node attempts a TCP handshake to peer:179.
2. The remote kernel sends an RST: the port is closed or a firewall REJECT rule was hit.
3. BIRD backs off and retries on an exponential timer (up to 120s intervals).
4. No BGP routes are exchanged → BIRD learns no remote workload routes → no routes to the peer's pods are programmed into the kernel routing table.
5. Cross-node pod traffic to the peer is unroutable. In VXLAN mode, where Felix can program routes without BGP, the impact may be masked; with BGP-distributed routes (IP-in-IP or unencapsulated) it takes effect as soon as the routes are withdrawn.
6. If this is a ToR/upstream router peer (global BGP peer for on-prem), your entire cluster loses external reachability, not just cross-node pod traffic.
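The missing-routes step in the chain above can be checked directly on the node by looking for kernel routes that BIRD installed with the peer as next hop. The sketch below assumes such routes appear in ip route output tagged "proto bird" with a "via <peer>" next hop; bird_routes_via is a hypothetical helper name.

```shell
# List destination prefixes of kernel routes installed by BIRD whose next hop
# is the given peer. Reads `ip route` output on stdin; $1 is the peer IP.
bird_routes_via() {
  awk -v peer="$1" '
    / proto bird/ {
      for (i = 1; i < NF; i++)
        if ($i == "via" && $(i + 1) == peer) { print $1; break }
    }'
}
```

Run it as ip route | bird_routes_via <PEER_IP>. No output for a peer that should be advertising pod CIDR blocks is consistent with the session never reaching Established.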

In multi-tenant clusters, this silently breaks network policies that depend on cross-node enforcement, creating a false sense of isolation while traffic is simply dropped rather than policy-denied.

How to Fix It

Step 0: Confirm the exact failure mode

# From the affected calico-node pod
kubectl exec -n kube-system <calico-node-pod> -- nc -zv <PEER_IP> 179
# "Connection refused" = port closed or REJECT rule
# Timeout = DROP rule or wrong IP
# Success = BIRD config issue, not network

# Check BIRD daemon is alive on the PEER node
kubectl exec -n kube-system <calico-node-pod-on-peer> -- pgrep -a bird

Basic Fix — Firewall / Security Group on TCP/179

The most common cause in cloud environments (AWS, GCP, Azure) is missing inbound rules for BGP.

# AWS Security Group (Terraform)
 resource "aws_security_group_rule" "node_bgp" {
   type              = "ingress"
-  # Missing BGP rule — nodes can't peer
+  from_port         = 179
+  to_port           = 179
+  protocol          = "tcp"
+  self              = true
   security_group_id = aws_security_group.nodes.id
 }

For iptables-based hosts:

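- # No rule permitting TCP/179
+ # BGP sessions can be opened from either side, so permit both directions
+ iptables -I INPUT -p tcp --dport 179 -s <PEER_CIDR> -j ACCEPT
+ iptables -I OUTPUT -p tcp --sport 179 -d <PEER_CIDR> -j ACCEPT
+ iptables -I OUTPUT -p tcp --dport 179 -d <PEER_CIDR> -j ACCEPT
+ iptables -I INPUT -p tcp --sport 179 -s <PEER_CIDR> -j ACCEPT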
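Before adding ACCEPT rules, it helps to confirm that an explicit rule is actually blocking BGP. This is a minimal sketch that scans iptables -S INPUT output for rules matching --dport 179 with a REJECT or DROP target. The helper name bgp_blocking_rules is hypothetical, and the filter is deliberately simple: it does not detect catch-all REJECT/DROP rules, multiport matches, or a DROP chain policy.

```shell
# Print INPUT-chain rules that explicitly REJECT or DROP traffic to TCP/179.
# Reads `iptables -S INPUT` output on stdin. Catch-all rules without
# --dport 179 and multiport (--dports) rules are not detected.
bgp_blocking_rules() {
  grep -E -- '--dport 179( |$)' | grep -E -- '-j (REJECT|DROP)( |$)'
}
```

On the peer host, run iptables -S INPUT | bgp_blocking_rules. A REJECT match lines up with the 'Connection refused' symptom, while a DROP match would show up as a timeout instead.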

Basic Fix — Mismatched AS Numbers

# BGPPeer YAML
 apiVersion: projectcalico.org/v3
 kind: BGPPeer
 metadata:
   name: peer-to-tor-switch
 spec:
   peerIP: 192.168.10.1
-  asNumber: 64512
+  asNumber: 65001  # Must match the AS configured on the ToR switch
   node: worker-node-01
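A mismatch like this can be caught before applying the manifest by comparing its asNumber with the AS you expect on the router. The sketch below is a naive text match on the first "asNumber:" line, not a YAML parser, and check_peer_as is a hypothetical helper name; the expected AS has to come from your own router configuration.

```shell
# Compare the first asNumber found in a BGPPeer manifest (stdin) with the
# AS expected on the remote router ($1). Naive text match, not a YAML parser.
check_peer_as() {
  expected="$1"
  actual=$(awk '$1 == "asNumber:" { print $2; exit }')
  if [ "$actual" = "$expected" ]; then
    echo "ok: asNumber $actual matches"
  else
    echo "mismatch: manifest has ${actual:-none}, router expects $expected"
    return 1
  fi
}
```

For example, check_peer_as 65001 < bgppeer.yaml returns non-zero when the manifest still carries 64512, which makes it easy to wire into a pre-apply check.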

Basic Fix — BGP disabled on node or wrong node selector

# BGPConfiguration
 apiVersion: projectcalico.org/v3
 kind: BGPConfiguration
 metadata:
   name: default
 spec:
-  nodeToNodeMeshEnabled: false
+  nodeToNodeMeshEnabled: true
   asNumber: 64512

Or, if using a node-specific peer with the wrong selector:

 apiVersion: projectcalico.org/v3
 kind: BGPPeer
 metadata:
   name: rack1-peer
 spec:
   peerIP: 10.0.1.45
   asNumber: 65000
-  nodeSelector: "rack == 'rack2'"  # Wrong rack label, peer node excluded
+  nodeSelector: "rack == 'rack1'"

Enterprise Best Practice — Structured BGP peer config with password auth and explicit selectors

 apiVersion: projectcalico.org/v3
 kind: BGPPeer
 metadata:
   name: upstream-tor-rack1
 spec:
   peerIP: 10.100.0.1
   asNumber: 65000
+  password:
+    secretKeyRef:
+      name: bgp-peer-secret
+      key: password          # MD5 TCP session auth — prevents session hijack
+  keepOriginalNextHop: false
+  maxRestartTime: 60s
+  # Scope to rack nodes only
   nodeSelector: >-
-    all()
+    kubernetes.io/hostname in {'worker-01', 'worker-02'}

Verify after applying:

calicoctl node status
# Target: STATE = up, INFO = Established for every peer
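Sessions can take a few retry cycles to come up after a fix, so a single status check may be misleading. The sketch below polls a status command until at least one peer is Established and none is in a non-established state, or a timeout expires. The helper name wait_for_established and the list of non-established state names it greps for are assumptions.

```shell
# Poll a status command until peers are Established, or time out.
# $1 = timeout in seconds, $2 = poll interval in seconds,
# remaining arguments = the status command to run.
wait_for_established() {
  timeout="$1"; interval="$2"; shift 2
  elapsed=0
  while :; do
    if out=$("$@" 2>/dev/null) &&
       printf '%s\n' "$out" | grep -q 'Established' &&
       ! printf '%s\n' "$out" | grep -Eq '\| *(Active|Connect|Idle|OpenSent|OpenConfirm)'; then
      echo "all peers established after ${elapsed}s"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s"
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

A typical call on the node would be wait_for_established 180 10 calicoctl node status, which gives BIRD a few retry intervals before declaring the fix unsuccessful.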

Prevention in CI/CD

1. Validate BGPPeer manifests with OPA/Conftest before apply

# policy/bgp_peer.rego
package calico.bgppeer

deny[msg] {
  input.kind == "BGPPeer"
  not input.spec.asNumber
  msg := "BGPPeer must specify asNumber explicitly"
}

deny[msg] {
  input.kind == "BGPPeer"
  input.spec.nodeSelector == "all()"
  msg := "BGPPeer nodeSelector 'all()' is too broad for production — scope to specific nodes"
}

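# Conftest only evaluates the "main" package by default, so name the policy's package
conftest test bgppeer.yaml --policy policy/ --namespace calico.bgppeer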

2. Smoke-test BGP state in post-deploy pipeline

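#!/bin/bash
# ci/verify-bgp.sh: run after Calico config changes.
# Run on a cluster node; calicoctl node status inspects the local calico/node instance.
if ! STATUS=$(calicoctl node status 2>&1); then
  # Without this check a failed calicoctl call would count 0 peers and pass
  echo "FATAL: calicoctl node status failed: $STATUS"
  exit 1
fi
UNESTABLISHED=$(printf '%s\n' "$STATUS" | grep -v Established | grep -cE 'start|Active|Connect')
if [ "$UNESTABLISHED" -gt 0 ]; then
  echo "FATAL: $UNESTABLISHED BGP peer(s) not established after deploy"
  exit 1
fi
echo "All BGP peers established."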

3. Terraform — enforce security group BGP rule presence

# checkov ignore is NOT acceptable here — enforce with policy
# Add to your Checkov custom checks:
# CKV_CUSTOM_BGP_01: "Ensure node security group allows TCP/179 intra-group"

Run Checkov in CI:

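# Custom checks are only loaded when their directory is passed explicitly
checkov -d ./terraform --external-checks-dir <CUSTOM_CHECKS_DIR> --check CKV_CUSTOM_BGP_01 --compact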

4. Alert on BGP session flaps via Prometheus

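This rule assumes your monitoring stack exposes per-node peer counts as felix_bgp_num_peers and felix_bgp_num_established_peers. Confirm that these metric names exist in your Calico version before relying on the alert (BGP session metrics may need a separate BIRD exporter), then alert when the two diverge: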

# prometheus-rule.yaml
groups:
- name: calico-bgp
  rules:
  - alert: CalicoBGPPeerNotEstablished
    expr: felix_bgp_num_peers - felix_bgp_num_established_peers > 0
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "BGP peer down on {{ $labels.instance }} — pod routing degraded"