
How Database Connection Pooling Actually Works Under the Hood

Every serverless function, every microservice, every ORM opens database connections. Most engineers treat connection pooling as a checkbox — turn it on and move on. But the difference between a pool that quietly handles 10K concurrent users and one that collapses at 500 is entirely in the details nobody reads.

Abhishek Sharma · Head of Engineering @ Fordel Studios

If you have shipped anything to production, you have used a connection pool. You might not have configured one yourself — your ORM probably did it for you. That is exactly the problem. The defaults are almost never right, and the failure mode is a 3 AM page that reads "too many connections."

···

What problem does connection pooling actually solve?

PostgreSQL uses a process-per-connection model inherited from its Berkeley origins in the 1980s. Every time a client connects, the postmaster forks a new backend process. That process carries its own memory overhead — typically 5-10 MB once catalog caches, shared buffer mappings, and per-operation work_mem allocations are counted. A TLS handshake adds another 2-5 milliseconds on top of the TCP three-way handshake.

For a monolith with a handful of long-lived application servers, this is fine. You open 20 connections at boot, reuse them for months, and never think about it. The economics change dramatically in three scenarios:

Microservices. If you have 30 services, each maintaining a pool of 10 connections, that is 300 backend processes before you serve a single query. At 10 MB each, you are consuming 3 GB of RAM just for connection overhead — RAM that could be serving actual queries from shared_buffers.

Serverless functions. Each invocation may spin up its own connection. A traffic spike of 500 concurrent Lambda or Vercel Functions means 500 simultaneous connection attempts. Most managed PostgreSQL instances cap at 100-500 max_connections. You are dead before your first SELECT completes.

Connection churn. Even if your peak concurrency is moderate, the cumulative cost of repeatedly creating and destroying connections adds up. Each fork, TLS negotiation, and authentication round trip takes 10-30 ms. At 100 requests per second, you are burning 1-3 full seconds of wall-clock time per second on connection setup alone.

10-30 ms: the cost of each new PostgreSQL connection (TCP handshake + TLS negotiation + authentication + process fork overhead).

Connection pooling eliminates this waste by maintaining a set of pre-established connections that get handed out to clients on demand and returned when done. The concept is simple. The implementation is where it gets interesting.
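At its core, a pool is a thread-safe queue of pre-opened connections. Here is a minimal sketch of the idea; SimplePool and the connect factory are illustrative names, not any driver's API:

```python
import queue


class SimplePool:
    """Minimal connection pool: hand out pre-opened connections, take them back.

    `connect` is any zero-argument factory (a real driver's connect() in
    practice; a stand-in object below for illustration).
    """

    def __init__(self, connect, size=5, checkout_timeout=1.0):
        self._idle = queue.Queue(maxsize=size)
        self.checkout_timeout = checkout_timeout
        for _ in range(size):            # pay the setup cost once, at boot
            self._idle.put(connect())

    def acquire(self):
        # Block until an idle connection frees up; fail the request if the
        # wait exceeds the timeout (the "client waits in a queue" behavior).
        try:
            return self._idle.get(timeout=self.checkout_timeout)
        except queue.Empty:
            raise TimeoutError("pool exhausted")

    def release(self, conn):
        self._idle.put(conn)             # return the connection to the idle set


# Usage with a stand-in "connection" factory:
pool = SimplePool(connect=lambda: object(), size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()   # reuses c1 — no new "connection" is created
```

Everything a real pooler adds (health checks, timeouts, pooling modes) layers on top of this checkout/return cycle.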

···

How does a connection pooler actually work internally?

A connection pooler is fundamentally a state machine that manages two independent sets of connections: client-facing connections (frontend) and database-facing connections (backend). The pooler’s job is to multiplex many frontend connections onto fewer backend connections.

The lifecycle looks like this:

1. A client connects to the pooler, not to PostgreSQL directly. The pooler authenticates the client (either by forwarding credentials or using its own auth database) and assigns the client a slot.

2. When the client issues a query, the pooler grabs an idle backend connection from the pool. If none are available, the client waits in a queue. If the queue is full, the connection is rejected.

3. After the query (or transaction, depending on mode) completes, the backend connection is returned to the idle pool. The client can issue more queries, which may land on a different backend connection.

4. The pooler runs health checks against backend connections — executing a lightweight query (typically SELECT 1 or the empty query protocol) to detect stale or broken connections before handing them to clients.

The critical design decision is step 3: when exactly does the backend connection get released back to the pool? This is determined by the pooling mode.
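A toy decision function makes the distinction concrete; the mode and event names here are illustrative, not PgBouncer's internal vocabulary:

```python
def should_release(mode: str, event: str) -> bool:
    """Decide whether a backend connection goes back to the idle pool.

    Toy model of the three classic pooling modes; events are illustrative:
    "statement_done", "txn_done" (COMMIT/ROLLBACK), "client_disconnect".
    """
    if mode == "statement":
        # Release after every single statement completes.
        return event in ("statement_done", "txn_done", "client_disconnect")
    if mode == "transaction":
        # Hold the backend through a transaction, release at COMMIT/ROLLBACK.
        return event in ("txn_done", "client_disconnect")
    if mode == "session":
        # Pin the backend to the client until it disconnects.
        return event == "client_disconnect"
    raise ValueError(f"unknown mode: {mode}")
```

Session mode never multiplexes because the only releasing event is disconnection; statement mode multiplexes most aggressively because every event releases.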

···

What are the pooling modes and when does each one break?

| Mode | Connection released after | Multiplexing ratio | Feature restrictions | Best for |
|---|---|---|---|---|
| Session | Client disconnects | 1:1 (no multiplexing) | None — full PostgreSQL feature set | Long-lived app servers, connection-hungry ORMs |
| Transaction | Transaction completes (COMMIT/ROLLBACK) | Many:1 (high multiplexing) | No PREPARE, no LISTEN/NOTIFY, no session variables (SET) | Serverless, microservices, high-concurrency workloads |
| Statement | Single statement completes | Highest multiplexing | No multi-statement transactions, no PREPARE, extremely limited | Simple read-only workloads, pgbench-style benchmarks |

Transaction mode is the sweet spot for most production workloads, but it introduces a constraint that trips up nearly every team the first time: you cannot use PostgreSQL features that depend on session-level state.

PREPARE and DEALLOCATE create named prepared statements that live in the backend’s session memory. In transaction mode, your next query might hit a different backend that has no knowledge of that prepared statement. PgBouncer will silently succeed on the PREPARE — then the next query on a different backend will fail with "prepared statement does not exist." This is the single most common PgBouncer debugging scenario, and drivers and ORMs such as Prisma and asyncpg prepare statements by default (Prisma even ships a pgbouncer=true connection-string flag for exactly this reason).

SET commands (SET search_path, SET statement_timeout) also break in transaction mode because the setting persists on the backend connection and leaks into the next client’s session. PgBouncer has a server_reset_query setting (defaults to DISCARD ALL) that runs between client reassignments to clean up session state, but this adds latency and does not cover every edge case.

LISTEN/NOTIFY is fundamentally incompatible with transaction pooling because notifications are delivered to a specific backend process. If the listener’s backend connection gets reassigned between notifications, messages are silently lost. For pub/sub patterns, you need a dedicated session-mode pool or a separate connection that bypasses the pooler entirely.
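The prepared-statement trap is easy to reproduce in miniature. The sketch below models each backend as its own bag of session state and routes consecutive transactions to different backends, as transaction mode does; the run_on helper is purely illustrative:

```python
# Each backend process has private session state (prepared statements,
# SET variables). A transaction-mode pooler may route each transaction
# to a different backend, so state written on one is invisible on another.
backends = [{"prepared": set()}, {"prepared": set()}]

def run_on(backend, sql):
    """Tiny stand-in for a backend executing one SQL command."""
    if sql.startswith("PREPARE "):
        backend["prepared"].add(sql.split()[1])   # e.g. "PREPARE q1 AS ..."
        return "OK"
    if sql.startswith("EXECUTE "):
        name = sql.split()[1]
        if name not in backend["prepared"]:
            return "ERROR: prepared statement does not exist"
        return "OK"
    return "OK"

# Transaction 1 lands on backend 0; transaction 2 lands on backend 1.
r1 = run_on(backends[0], "PREPARE q1 AS SELECT 1")   # silently succeeds
r2 = run_on(backends[1], "EXECUTE q1")               # different backend: fails
```

The PREPARE reports success, the EXECUTE fails on a backend that never saw it — exactly the production symptom described above.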

···

How do you size a connection pool correctly?

Pool sizing is where most teams rely on intuition — and intuition is almost always wrong. The standard approach is to pick a number that "feels right" (usually 20 or whatever the ORM defaults to), then bump it up when things break. This leads to pools that are either too small (queries queue) or too large (you hit max_connections and waste memory).

The principled approach uses Little’s Law:

L = λ × W, where L is the average number of connections in use, λ is the average request arrival rate, and W is the average time each request holds a connection.

If your service handles 200 requests per second and each request holds a database connection for 5 ms (a typical simple query), you need: L = 200 × 0.005 = 1 connection on average. Even with variance, a pool of 5-10 handles this comfortably.

If your service handles 200 requests per second but each request runs a 50 ms query, you need: L = 200 × 0.05 = 10 connections on average. A pool of 15-20 handles the variance.
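Those two calculations wrap neatly into a small sizing helper; the pool_size name and the 50% default headroom are illustrative choices, not a standard API:

```python
import math

def pool_size(requests_per_sec: float, hold_time_sec: float,
              headroom: float = 0.5) -> int:
    """Little's Law, L = lambda * W, plus headroom for variance."""
    avg_in_use = requests_per_sec * hold_time_sec
    return max(1, math.ceil(avg_in_use * (1 + headroom)))

# The two scenarios above:
print(pool_size(200, 0.005))   # 5 ms queries  -> 2
print(pool_size(200, 0.050))   # 50 ms queries -> 15
```

The law only gives the average; bursty arrival patterns justify more headroom than 50%, which is why the prose ranges above run higher than the raw formula.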

L = λ × W (Little’s Law for pool sizing): average connections needed = request rate × average hold time. Add 50% headroom for variance.

The maximum pool size on the database side is bounded by a hard constraint: PostgreSQL’s max_connections setting (typically 100 on managed services, configurable up to ~5000 with enough RAM). But more connections is not better. After a point, adding connections increases contention on PostgreSQL’s internal lock manager, shared buffer LWLocks, and WAL insertion locks. Benchmarks consistently show that PostgreSQL throughput peaks at roughly 2-4x the number of CPU cores, then degrades.

A common mistake is to set max_connections to 1000 and pool_size to 100 per service. You end up with 500 backend processes doing context switching instead of useful work, and throughput drops 40% compared to a pool of 50 (a battle-tested lesson from a fintech migration).

The formula for the database side: max_pool_per_service = (max_connections - superuser_reserved - monitoring_connections) / number_of_services. If PostgreSQL allows 200 connections, you reserve 5 for superuser, 5 for monitoring, and divide 190 across your services.
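As a sketch, that budget split looks like this; the function name and the 10-service example are illustrative:

```python
def max_pool_per_service(max_connections: int, num_services: int,
                         superuser_reserved: int = 5,
                         monitoring: int = 5) -> int:
    """Divide the database-side connection budget evenly across services."""
    budget = max_connections - superuser_reserved - monitoring
    return budget // num_services

# The example from the text: 200 allowed, 5 + 5 reserved, 190 to divide.
# With an illustrative 10 services, each gets a pool of 19.
print(max_pool_per_service(200, num_services=10))
```

An uneven split weighted by each service's λ × W is usually better than an even one, but the ceiling per service still comes from this budget.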

···

Where should the pooler live in your architecture?

The placement of the connection pooler has profound implications for failure modes, latency, and operational complexity. There are three patterns, and each makes a different tradeoff.

Sidecar pooler (PgBouncer on each app server)

Every application instance runs its own PgBouncer process. The app connects to localhost:6432 instead of the remote database. The sidecar maintains a small pool of backend connections (say, 5-10) to the actual database.

Advantages: No single point of failure. No additional network hop. Each service can configure its own pooling mode independently. Failure is isolated — if one sidecar crashes, only that instance is affected.

Disadvantages: Total backend connections = num_instances × pool_per_instance. With 50 instances and 10 connections each, you are using 500 backend connections. You have pushed the connection multiplication problem one layer out instead of solving it. This pattern works for monoliths and small service counts; it breaks down at microservice scale.

Centralized proxy (standalone PgBouncer or pgcat cluster)

A dedicated pooler sits between all application instances and the database. All services connect to the proxy, which maintains a single shared pool of backend connections.

Advantages: True connection multiplexing. Fifty services with varying loads share a pool of 50-100 backend connections. The database sees a stable, manageable number of connections regardless of how many app instances scale up.

Disadvantages: Single point of failure (unless clustered). Additional network hop adds 0.1-0.5 ms per query. Configuration is centralized — harder to give different services different pooling modes. PgBouncer is single-threaded, so one instance tops out at roughly 10K-15K transactions per second on modern hardware.

Serverless-native pooling (Neon, Supabase Supavisor, PlanetScale)

The database provider runs the pooler as part of their managed service. You connect to a pooler endpoint (usually on port 6543 instead of 5432) and the provider handles all connection management.

This is the model that Neon and Supabase have adopted, and it solves the serverless connection problem at the infrastructure layer. Neon uses a Rust-based proxy that handles connection multiplexing, TLS termination, and authentication in a single hop. Supabase built Supavisor in Elixir, leveraging BEAM’s lightweight process model to handle millions of client connections with minimal memory overhead.

| Placement | Backend connections | Latency added | Failure blast radius | Operational cost |
|---|---|---|---|---|
| Sidecar (per-instance) | num_instances × pool_size | ~0 ms (localhost) | Single instance | Low — runs alongside app |
| Centralized proxy | Shared pool (50-200) | 0.1-0.5 ms | All services | Medium — needs HA setup |
| Serverless-native | Managed by provider | 0.2-1 ms (varies) | Provider outage | Zero — fully managed |
···

What are the failure modes nobody warns you about?

Connection pooling failures are insidious because they often present as database problems when the database is perfectly healthy. Here are the failure modes I have seen repeatedly in production:

Pool exhaustion under slow queries

Your pool has 20 connections. A new deployment introduces a query that takes 2 seconds instead of 20 ms. That query now holds a connection 100x longer. At the same request rate, you need 100x more connections. The pool empties in seconds, requests start queuing, and your p99 latency goes from 50 ms to 30 seconds as every request waits behind the slow ones. The database is not the bottleneck — your pool is. The fix is query timeouts on the pooler (query_timeout in PgBouncer) so that runaway queries get killed before they starve the pool.
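The arithmetic behind that collapse is the same Little's Law from the sizing section; the 100 req/s rate is illustrative:

```python
# Connections needed = arrival rate * hold time. Hold time is the lever
# that a single bad deploy can move by two orders of magnitude.
rate = 100                 # requests per second (illustrative)
before = rate * 0.020      # 20 ms queries -> 2 connections in use on average
after = rate * 2.0         # 2 s queries  -> 200 connections in use on average
# A 20-connection pool that was 10x oversized is suddenly 10x undersized.
```

Nothing about the database changed; only W did, and demand for connections scales linearly with it.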

Connection leak from uncommitted transactions

In transaction pooling mode, a backend connection is held for the duration of a transaction. If your application starts a BEGIN but never issues COMMIT or ROLLBACK — because of an unhandled exception, a timeout, or a logic error — that backend connection is permanently locked. It cannot be returned to the pool. This is a connection leak. You will see pool utilization creep up over hours until it hits 100% and all new requests hang. PgBouncer’s server_idle_timeout helps by killing backend connections that have been idle too long, but it does not help if the transaction is not idle — it is just waiting for a COMMIT that never comes.
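A guard that pairs every BEGIN with a terminating COMMIT or ROLLBACK is cheap insurance against this leak. Here is a minimal sketch using a stand-in connection object; the transaction helper and FakeConn are illustrative, not a driver API:

```python
from contextlib import contextmanager

@contextmanager
def transaction(conn):
    """Guarantee every BEGIN ends in COMMIT or ROLLBACK, even on exceptions.

    `conn` is assumed to expose execute/commit/rollback, as DB-API drivers
    do; FakeConn below is a stand-in for illustration.
    """
    conn.execute("BEGIN")
    try:
        yield conn
    except BaseException:
        conn.rollback()        # the leak-preventing path
        raise
    else:
        conn.commit()

class FakeConn:
    def __init__(self): self.log = []
    def execute(self, sql): self.log.append(sql)
    def commit(self): self.log.append("COMMIT")
    def rollback(self): self.log.append("ROLLBACK")

conn = FakeConn()
try:
    with transaction(conn):
        conn.execute("UPDATE accounts SET ...")
        raise RuntimeError("unhandled application error")
except RuntimeError:
    pass
# conn.log is now ["BEGIN", "UPDATE accounts SET ...", "ROLLBACK"]
```

The unhandled exception still propagates, but the backend connection is returned to the pool in a clean state instead of sitting in an open transaction forever.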

Authentication storms after pooler restart

When a PgBouncer instance restarts, every client reconnects simultaneously. If you have 5000 clients, that is 5000 authentication attempts hitting PostgreSQL in a few seconds. PostgreSQL’s authentication is CPU-intensive (especially with scram-sha-256), and the postmaster serializes fork operations. The database becomes unresponsive under the authentication storm, which causes more clients to time out and retry, creating a feedback loop. The mitigation is to have PgBouncer authenticate clients itself (for example, auth_type = scram-sha-256 with a local auth_file, the classic userlist.txt), so client authentication happens at the pooler layer without touching PostgreSQL.

DNS resolution caching after failover

Your pooler connects to db.example.com, which resolves to 10.0.1.5. The database fails over to a standby at 10.0.1.6, and DNS updates. But PgBouncer caches the DNS resolution from its initial startup. It keeps trying to connect to the old primary, which is either dead or (worse) is the old primary still accepting writes during a split-brain scenario. PgBouncer has dns_max_ttl to control this, but the default is 15 seconds — which means up to 15 seconds of errors after every failover. Some teams set it to 0, which resolves DNS on every new backend connection, adding latency but guaranteeing failover correctness.

···

How do modern serverless databases handle this differently?

The serverless database providers have converged on a pattern: put an HTTP-based query interface in front of the pooler. Instead of maintaining persistent TCP connections, clients send individual queries over HTTP/2, and the provider’s edge infrastructure handles pooling internally.

Neon’s approach uses their neon-serverless driver, which sends queries over WebSockets from edge environments and over HTTP for one-shot queries. The connection pooler runs in their proxy layer — a Rust service called neon-proxy that handles TLS termination, authentication, and connection routing. Because the proxy owns the connection lifecycle, it can scale backend connections independently of client connections. A single proxy instance can serve thousands of WebSocket clients with a pool of 50 backend connections.

Supabase built Supavisor to replace PgBouncer specifically because PgBouncer is single-threaded and cannot scale horizontally within a single process. Supavisor runs on the BEAM VM (Erlang/Elixir), which gives it lightweight processes that can handle hundreds of thousands of concurrent client connections with minimal memory. It also supports named prepared statements in transaction mode — something PgBouncer could not do at all until version 1.21 added opt-in protocol-level support — by tracking prepared statements per client and transparently re-preparing them on the backend when the connection changes.

PlanetScale (MySQL) took a different route entirely by building Vitess, which is less a connection pooler and more a full query routing layer. Vitess parses every query, routes it to the correct shard, and manages connection pools per shard internally. The connection pooling is invisible to the application — you connect to Vitess and it handles everything.

100x client-to-backend multiplexing ratio: modern serverless poolers like Supavisor can serve 10,000+ client connections with ~100 backend connections to PostgreSQL.
···

Is PgBouncer still the right choice in 2026?

PgBouncer has been the default answer for 15 years, and it is still solid for its design constraints. It is a 30,000-line C program that does one thing — connection multiplexing — with extremely low overhead. A single PgBouncer instance adds less than 50 microseconds of latency per query. It is battle-tested, widely documented, and every managed PostgreSQL provider supports it.

But it has real limitations. It is single-threaded, so it cannot saturate modern multi-core machines without running multiple instances behind a load balancer. Protocol-level prepared statements in transaction mode were unsupported for most of its life; opt-in support only arrived in version 1.21 via max_prepared_statements, and SQL-level PREPARE still breaks. Its configuration is file-based with SIGHUP reloads — no API, no dynamic reconfiguration. And its monitoring is limited to a SHOW STATS command over a pseudo-database connection.

| Pooler | Language | Threading | Prepared statements in txn mode | Best for |
|---|---|---|---|---|
| PgBouncer | C | Single-threaded | Opt-in since 1.21 (max_prepared_statements) | Simple setups, low-overhead sidecar |
| pgcat | Rust | Multi-threaded (async Tokio) | Yes (partial) | High-throughput proxies, sharding |
| Supavisor | Elixir/BEAM | BEAM lightweight processes | Yes | Serverless, massive client counts |
| Odyssey | C | Multi-threaded | Yes | High-perf alternative to PgBouncer |
| Neon Proxy | Rust | Multi-threaded (async) | N/A (HTTP/WebSocket) | Serverless-native Neon workloads |

The trend is clear: the next generation of poolers is multi-threaded, supports prepared statements in transaction mode, and integrates with serverless deployment models. PgBouncer remains the safe default if you are running traditional infrastructure. If you are deploying to serverless or need to support 10K+ concurrent connections, look at pgcat, Supavisor, or your database provider’s native pooling.

···

When should you avoid connection pooling entirely?

Pooling is not always the right answer. There are workloads where a direct connection is better:

Long-running analytics queries. If a query runs for 30 minutes, it is holding a pooled connection for 30 minutes. You have gained nothing from pooling and you are starving other clients. Run analytics on a read replica with a direct connection.

LISTEN/NOTIFY consumers. As discussed, pub/sub requires persistent session-level state. A pooled connection will silently drop notifications. Use a direct connection for notification listeners.

Connection-aware extensions. Logical replication, pg_stat_statements per-connection tracking, and some audit logging extensions depend on connection identity. Pooling obscures this by rotating backend connections.

Single-tenant, low-connection applications. If your monolith opens 10 connections and keeps them for the lifetime of the process, a connection pooler adds complexity with zero benefit. The pool inside your ORM’s driver (database/sql in Go, HikariCP in Java) is sufficient.

···

What does the production checklist look like?

Connection pooling production checklist
  • Set pool_mode = transaction unless you need session-level features
  • Size backend pools using Little’s Law: L = λ × W + 50% headroom
  • Set query_timeout to kill runaway queries before they exhaust the pool
  • Set server_idle_timeout to reclaim connections from leaked transactions
  • Set server_reset_query = DISCARD ALL to prevent session state leakage
  • Disable prepared statements in your ORM if using transaction mode
  • Test pooler behavior during database failover, not just normal operation
  • Monitor pool utilization — alert at 80% before you hit 100%
  • Set dns_max_ttl appropriately for your failover requirements
  • Run PgBouncer behind a health check if centralized, so load balancers can route around failures
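Translated into a pgbouncer.ini, the checklist might look roughly like this. The parameter names are real PgBouncer settings, but every value is an illustrative starting point to validate against your own Little's Law numbers, not a recommendation:

```ini
[databases]
; clients connect to the pooler; it connects onward to PostgreSQL
appdb = host=db.example.com port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction        ; unless you need session-level features
default_pool_size = 20         ; from Little's Law + headroom
reserve_pool_size = 5
query_timeout = 30             ; kill runaway queries (seconds)
server_idle_timeout = 600      ; reclaim idle backend connections (seconds)
server_reset_query = DISCARD ALL
dns_max_ttl = 5                ; tighter re-resolution for faster failover
max_client_conn = 1000
```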

Connection pooling is one of those infrastructure components that disappears when it works and dominates your incident channel when it does not. The difference is never the technology — it is whether someone on the team understood the pooling mode, sized the pool with math instead of vibes, and tested the failure modes before production did it for them.

