What is idle_in_transaction_session_timeout in PostgreSQL?

It is a PostgreSQL server parameter (available since v9.6) that specifies the maximum time in milliseconds a session can remain in the 'idle in transaction' state — meaning a transaction has been opened with BEGIN but no query is actively executing. When the timeout expires, PostgreSQL terminates the backend process and rolls back the open transaction. Setting it to 0 disables the timeout entirely, which is dangerous in production as it allows sessions to hold locks and connections indefinitely.

Why does 'Connection reset by peer' appear after idle_in_transaction_session_timeout fires?

When PostgreSQL terminates a backend session due to the timeout, it closes the server-side TCP socket. The client OS receives a TCP RST packet — a hard connection reset rather than a graceful FIN/ACK close. The client library (psycopg2, JDBC, node-postgres, etc.) surfaces this as 'Connection reset by peer'. The connection is unrecoverable and must be discarded from the pool and replaced with a fresh connection.

What is the recommended value for idle_in_transaction_session_timeout?

There is no universal value, but a common production baseline is 30000ms (30 seconds) for OLTP workloads. The key constraint is that your application's longest legitimate transaction — excluding any external I/O that should not be inside a transaction — must complete well within this window. For connection poolers like PgBouncer or HikariCP, set their idle connection timeout to a value strictly less than the PostgreSQL server timeout so the pool recycles connections before the server kills them.

Fixing PostgreSQL 'Connection Reset by Peer' After idle_in_transaction_session_timeout

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

What broke: PostgreSQL's idle_in_transaction_session_timeout fired, killing a backend session that held an open transaction too long without executing a query. The TCP connection was torn down server-side; the client got Connection reset by peer.
How to fix it: Audit your connection pool config and application transaction boundaries. Ensure transactions are committed/rolled back promptly. Tune idle_in_transaction_session_timeout to a safe non-zero value and fix the code holding transactions open.
Use our Client-Side Sandbox below to paste your postgresql.conf, pgbouncer.ini, or application DB config and auto-refactor it for safe transaction handling.

The Incident (What Does the Error Mean?)

Raw error output from PostgreSQL logs:

LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection with an open transaction
FATAL:  terminating connection due to idle-in-transaction timeout

PostgreSQL enforces idle_in_transaction_session_timeout (introduced in PG 9.6). When a session opens a transaction (BEGIN) and then goes silent — no queries, no commit, no rollback — for longer than the configured interval, the server forcibly terminates the backend process. The OS resets the TCP socket. The client library sees Connection reset by peer.

Immediate consequences:

The open transaction is rolled back entirely. Any uncommitted writes are lost.
The application connection is dead. If your pool doesn't detect this, the next query on that connection throws a broken pipe or similar error.
Under high concurrency, this cascades: multiple connections die simultaneously, the pool exhausts, and the application starts refusing requests.

The Attack Vector / Blast Radius

This is a cascading availability failure, not a security exploit — but the blast radius is severe in production:

Connection pool poisoning. PgBouncer, HikariCP, and SQLAlchemy pools cache connections. A server-side reset is not always detected immediately. The pool hands out a dead connection to the next thread. That thread throws an exception. If retry logic is absent or naive, the request fails silently.
Lock amplification. The killed session was likely holding row-level or table-level locks. While PG releases locks on transaction termination, any dependent queries were blocked for the full timeout duration before that. Under load, this queues up hundreds of waiting backends.
Data integrity risk. If the application assumes a transaction committed (e.g., it already returned HTTP 200 to the user), but the timeout killed it mid-flight, you have a phantom write — the user believes the operation succeeded, the DB has no record of it.
Root cause is almost always application-layer: a background job that opens a transaction, calls an external HTTP API mid-transaction, and waits. The HTTP call stalls. The transaction sits idle. PG kills it.

How to Fix It

Basic Fix — Set a Sane Timeout and Fix the Immediate Config

If idle_in_transaction_session_timeout is set to 0 (disabled), you have no protection and sessions leak indefinitely. If it's set too aggressively low (e.g., 1s), legitimate slow operations get killed.

postgresql.conf:

- idle_in_transaction_session_timeout = 0
+ idle_in_transaction_session_timeout = 30000   # 30 seconds — adjust per workload
+ idle_session_timeout = 600000                 # 10 min — kill truly idle sessions too

Reload without restart: SELECT pg_reload_conf();

Enterprise Best Practice — Fix the Application Transaction Boundaries

The real fix is in the application. Never call external services, sleep, or do I/O inside an open transaction.

Python (SQLAlchemy) — Bad Pattern:

- with session.begin():
-     record = session.query(Order).get(order_id)
-     response = requests.post("https://payment-api.internal/charge", json={...})  # BLOCKS HERE
-     record.status = "paid"

Fixed Pattern:

+ # Do all external I/O BEFORE opening the transaction
+ response = requests.post("https://payment-api.internal/charge", json={...}, timeout=10)
+ response.raise_for_status()
+
+ with session.begin():
+     record = session.query(Order).get(order_id)
+     record.status = "paid"
+     # Transaction is open for milliseconds, not seconds

PgBouncer — Ensure statement-level or transaction-level pooling is configured to handle server-side resets:

- pool_mode = session
+ pool_mode = transaction

- server_idle_timeout = 600
+ server_idle_timeout = 60

+ server_reset_query = DISCARD ALL
+ tcp_keepalive = 1
+ tcp_keepidle = 10
+ tcp_keepintvl = 10

HikariCP (Java) — Add connection validation:

- hikari.connectionTimeout=30000
- hikari.idleTimeout=600000

+ hikari.connectionTimeout=30000
+ hikari.idleTimeout=60000          # Must be < idle_in_transaction_session_timeout
+ hikari.keepaliveTime=30000
+ hikari.connectionTestQuery=SELECT 1
+ hikari.validationTimeout=5000

💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.

Prevention in CI/CD

1. Lint postgresql.conf in Terraform/Ansible:

- # No timeout set — silent misconfiguration
- resource "aws_db_parameter_group" "pg" {
-   parameter { name = "idle_in_transaction_session_timeout" value = "0" }
- }

+ resource "aws_db_parameter_group" "pg" {
+   parameter { name = "idle_in_transaction_session_timeout" value = "30000" apply_method = "immediate" }
+   parameter { name = "idle_session_timeout"                value = "600000" apply_method = "immediate" }
+ }

2. Checkov policy — flag disabled timeouts:

Add a custom Checkov check or use conftest with OPA:

deny[msg] {
  input.resource.aws_db_parameter_group[_].parameter[p]
  p.name == "idle_in_transaction_session_timeout"
  p.value == "0"
  msg := "idle_in_transaction_session_timeout must not be 0 (disabled). Set to 30000ms or lower."
}

3. Integration test gate: Add a test that opens a transaction, sleeps past your configured timeout, and asserts the pool correctly recovers (reconnects) rather than serving a dead connection. Run this in your staging pipeline on every deploy.

4. Observability: Alert on pg_stat_activity where state = 'idle in transaction' and now() - state_change > interval '10 seconds'. This fires before PG kills the session and gives you time to identify the offending query pattern.

SELECT pid, usename, application_name, state, now() - state_change AS idle_duration, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '10 seconds'
ORDER BY idle_duration DESC;

Set this as a Datadog, Prometheus, or CloudWatch custom metric. Page on it.