Can I fix 'role does not have replication permission' without restarting PostgreSQL?

Yes. ALTER ROLE replicator REPLICATION LOGIN; takes effect immediately for new connections. Existing sessions are unaffected. No pg_ctl restart or reload is required for the role attribute change itself, though pg_hba.conf changes still need a reload via pg_reload_conf().

What is the difference between granting SUPERUSER vs REPLICATION to a replication role?

SUPERUSER implicitly includes REPLICATION but also grants unrestricted DDL, DML, and the ability to bypass row-level security. A compromised SUPERUSER replication credential is a full database takeover. Use REPLICATION LOGIN only — it grants exactly the WAL sender permission needed and nothing else.

Why does pg_stat_replication show no rows even after fixing the role permission?

Check three things in order: (1) pg_hba.conf has a 'replication' database entry (not 'all') for the replica's IP, (2) primary_conninfo on the standby points to the correct host/port/user, and (3) the standby's recovery signal file (standby.signal in PG12+) exists. All three must be correct for a WAL sender row to appear.

How to Fix PostgreSQL FATAL: role 'replication' does not have replication permission

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 5 mins

TL;DR

What broke: The PostgreSQL role used for streaming replication was created without the REPLICATION attribute — PostgreSQL rejects the connection at the authentication layer before WAL streaming begins.
How to fix it: Run ALTER ROLE replication REPLICATION LOGIN; on the primary and confirm pg_hba.conf has a replication connection entry for that role.
Fast path: Use our Client-Side Sandbox below to auto-refactor your CREATE ROLE statement or pg_hba.conf — paste it in and get corrected SQL instantly.

The Incident (What does the error mean?)

FATAL:  role "replication" does not have replication permission

This fires on the standby, logged during pg_basebackup or recovery.conf / primary_conninfo connection attempts. PostgreSQL's WAL sender process on the primary validates the connecting role's rolreplication flag in pg_authid before it opens a replication slot or streams WAL. If that flag is false, the connection dies immediately.

Immediate consequence: The standby is receiving zero WAL. It is frozen at the LSN it last received. If the primary fails right now, your failover target is stale. Patroni, repmgr, or Pacemaker will either promote a lagging replica or refuse to promote entirely.

The Attack Vector / Blast Radius

This is a misconfigured privilege failure, not a network failure. The blast radius:

HA is silently broken. Monitoring that only checks pg_stat_replication row existence may not catch this — the row never appears because the connection never completes.
RPO is undefined. You have no idea how long the replica has been dark. In a failover, you promote a replica with an unknown data gap.
pg_replication_slots leak. If a slot was pre-created and the subscriber can never connect, pg_wal_lsn_diff on pg_replication_slots.confirmed_flush_lsn will grow unbounded, causing WAL accumulation on the primary disk — a separate outage vector.
Security note: Granting SUPERUSER as a workaround (common in rushed fixes) gives the replication user full DDL/DML access. A compromised replication credential then becomes a full database takeover. Never do this.

How to Fix It (The Solution)

Basic Fix

On the primary, check the current state:

SELECT rolname, rolreplication, rolcanlogin FROM pg_authid WHERE rolname = 'replication';

If rolreplication = false, patch it:

- CREATE ROLE replication LOGIN PASSWORD 'secret';
+ CREATE ROLE replication REPLICATION LOGIN PASSWORD 'secret';

Or if the role already exists:

- -- role exists without REPLICATION flag
+ ALTER ROLE replication REPLICATION LOGIN;

No PostgreSQL restart required. The next connection attempt from the standby will succeed immediately.

pg_hba.conf — Verify the replication entry

- host    all             replication     10.0.1.0/24     md5
+ host    replication     replication     10.0.1.0/24     scram-sha-256

Two things matter here: the database column must be replication (literal keyword, not a DB name), and use scram-sha-256 over md5 on any PostgreSQL ≥ 14 cluster.

After editing pg_hba.conf, reload (no restart):

psql -c "SELECT pg_reload_conf();"
# or
systemctl reload postgresql

Enterprise Best Practice — Least-Privilege Replication Role

Do not reuse application roles or grant superuser. Isolate replication identity:

- CREATE ROLE app_user SUPERUSER LOGIN PASSWORD 'apppass';  -- used for replication too
+ CREATE ROLE replicator REPLICATION LOGIN PASSWORD '<vault-generated>' CONNECTION LIMIT 3;
+ -- Grant only if using logical replication (PG14+):
+ GRANT pg_read_all_data TO replicator;  -- logical only, omit for physical

For logical replication, the role also needs SELECT on published tables or pg_read_all_data. For physical/streaming, REPLICATION LOGIN is sufficient — nothing else.

Store the password in Vault or AWS Secrets Manager. Rotate it via ALTER ROLE replicator PASSWORD '...' — active WAL sender connections survive a password rotation without disconnect.

💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.

Prevention in CI/CD

1. Terraform / Pulumi — enforce at provisioning time:

- resource "postgresql_role" "replicator" {
-   name     = "replicator"
-   login    = true
-   password = var.replication_password
- }
+ resource "postgresql_role" "replicator" {
+   name        = "replicator"
+   login       = true
+   replication = true
+   password    = var.replication_password
+   connection_limit = 5
+ }

2. Checkov policy — add a custom check asserting replication = true on any role whose name matches replicat*.

3. Post-deploy smoke test in your pipeline (runs against the primary after Terraform apply):

psql "$PRIMARY_DSN" -c \
  "SELECT rolreplication FROM pg_authid WHERE rolname='replicator'" \
  | grep -q 't' || { echo 'REPLICATION flag missing — failing pipeline'; exit 1; }

4. Prometheus alert — if pg_stat_replication has zero rows and pg_is_in_recovery() is false on the primary, fire a P1 alert. This catches this failure class even when the role issue is introduced outside Terraform (manual psql, Ansible drift).

- alert: PostgreSQLNoActiveReplica
  expr: pg_stat_replication_count == 0 and pg_replication_is_replica == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Primary has no connected replicas — HA is broken"