How OpenAI Scaled Postgres to Millions of QPS
A behind-the-scenes breakdown of OpenAI’s Postgres scaling strategy, PostgreSQL optimizations, global read replicas, and the engineering discipline that made unsharded Postgres run at AI scale.
The Unexpected Database Powering ChatGPT
When people imagine ChatGPT’s backend, they picture a massive, distributed, AI-optimized data system, not a traditional relational database.
But the truth is more surprising:
OpenAI relies heavily on PostgreSQL.
Not sharded.
Not partitioned across clusters.
Just a single primary with a carefully engineered architecture running on Azure Database for PostgreSQL.
And yet… it handles millions of QPS (queries per second).
This story isn’t about replacing Postgres; it’s about the engineering excellence that turned standard PostgreSQL into a high-QPS powerhouse capable of supporting modern AI workloads.
Why This Architecture Shocked the Engineering World
PostgreSQL is known for reliability, but not for extreme scale.
Especially not:
heavy read traffic
global replicas
MVCC vacuum overhead
Postgres write bottlenecks
high WAL generation
avoiding sharding
Yet OpenAI built an unsharded Postgres architecture that powers ChatGPT.
This required solving classic Postgres problems like MVCC bloat, autovacuum tuning, replica lag, connection limits, and inefficient ORM-generated queries.
The result?
A breakthrough in Postgres scaling without a complete rewrite.
Not subscribed yet? Subscribe now and don’t miss my next post!
OpenAI’s Postgres Scaling Blueprint
1. Control the Postgres Write Bottleneck (The #1 Scaling Limiter)
Postgres struggles under heavy writes because of MVCC version churn, index maintenance, and vacuum overhead.
OpenAI tackled this by dramatically reducing writes:
Migrating write-heavy and shardable workloads off PostgreSQL
Fixing application bugs that caused unnecessary writes
Rate limiting backfills to prevent WAL spikes
Using lazy writes to avoid CPU bursts
Reducing write amplification from ORM behavior
By crushing the Postgres write bottleneck, the team unlocked room for massive read scalability.
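OpenAI hasn’t published its monitoring queries, but Postgres’s built-in statistics views show where write pressure and MVCC churn pile up in any instance. A minimal sketch:

-- Find the most write-heavy tables and where dead tuples (MVCC churn) accumulate
SELECT relname,
       n_tup_ins + n_tup_upd + n_tup_del AS total_writes,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY total_writes DESC
LIMIT 10;

Tables at the top of this list are the natural candidates for write-path fixes or for migrating off Postgres entirely.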
2. Turn PostgreSQL Into a Read-Replica Machine
OpenAI’s workloads are read-heavy, which makes them a perfect fit for Postgres read scaling.
They built a global network of read replicas using Azure Database for PostgreSQL.
This architecture delivered:
high-QPS Postgres performance
low replication lag
geo-distributed low-latency reads
massive read scalability
replica routing for critical requests
Only essential transactional reads ever hit the primary.
Everything else is routed to replicas, enabling scaling Postgres without sharding.
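The exact replica topology isn’t public, but the lag worth watching is visible to anyone through pg_stat_replication on the primary:

-- Run on the primary: one row per streaming replica, with lag broken down by stage
SELECT application_name, client_addr, state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

replay_lag is the number users actually feel: how far behind a replica’s visible data is.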
Want the Deep Stuff? Become a Paid Member
Paid members get instant access to:
Premium deep dives with full RCA analysis, actionable scripts, and Ask Me Anything mentorship via email.
3. Query Optimization: Solving the Real Performance Killers
OpenAI tamed its worst-performing queries with strict Postgres query optimization rules:
Enforcing idle_in_transaction_session_timeout
Enforcing a strict statement_timeout
Adding client-side timeouts
Detecting expensive multi-way joins and moving them to the application layer
Rate limiting expensive query digests
Avoiding ORM anti-patterns that generate slow SQL
Monitoring pg_stat_activity for long-running queries that can block autovacuum and consume resources
This eliminated classic Postgres multi-way join problems and stabilized the database.
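OpenAI hasn’t shared its actual timeout values, so the numbers and role/database names below are placeholders; the settings themselves are standard Postgres:

-- Hypothetical role and values: server-side guard rails so no session can run away
ALTER ROLE app_rw SET statement_timeout = '30s';
ALTER ROLE app_rw SET idle_in_transaction_session_timeout = '60s';
-- Hypothetical database-wide default
ALTER DATABASE chat_app SET statement_timeout = '30s';

These server-side limits back up the client-side timeouts, so a runaway query gets cancelled even if the application forgets to give up.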
4. Handling the Single Point of Failure in PostgreSQL
In OpenAI’s PostgreSQL architecture, the system runs on a single primary writer, which introduces a classic single point of failure. If the primary goes down, no writes can occur.
However, OpenAI engineered around this limitation through replica-based resilience:
Many read replicas = no read outage
If one read replica fails, applications simply route traffic to another.
Most critical requests are read-only, so even if the primary fails, read replicas can still serve SEV2-level traffic.
Prioritizing traffic by impact
To avoid high-impact outages:
Requests are categorized as High Priority (SEV0) vs Low Priority (SEV2)
High-priority APIs get dedicated read replicas
Low-priority traffic never competes for critical replica capacity
This architecture minimizes real-world user impact even when failures occur.
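One building block for this kind of routing (a sketch of mine, not OpenAI’s published code) is that every Postgres node can report whether it is a replica and how far behind it is:

-- Returns true on a streaming replica (a safe target for read-only traffic),
-- plus how far its visible data lags the primary (NULL on the primary itself)
SELECT pg_is_in_recovery() AS is_replica,
       CASE WHEN pg_is_in_recovery()
            THEN now() - pg_last_xact_replay_timestamp()
       END AS replay_lag;

A router can poll this and pull any replica whose lag crosses a threshold out of the high-priority pool.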
5. Rate Limiting to Protect the Primary
A single expensive query can bring down a PostgreSQL instance, and at OpenAI it happened.
OpenAI experienced a production incident where a 12-way join spiked unexpectedly and took down the entire primary.
To prevent ANY one query from overwhelming the system, they implemented multi-layer rate limiting:
Rate limit application-level functions
During peak load, write-heavy or complex read operations are automatically slowed down.
Rate limit new database connections
Prevents connection pool exhaustion, one of the fastest ways to crash PostgreSQL.
Rate limit specific query digests
If a particular SQL pattern is too heavy, its throughput is capped to protect overall system health.
This is how OpenAI avoids catastrophic query storms and ensures stable high-QPS PostgreSQL performance.
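Rate limiting by digest assumes you can see which digests are expensive. With the pg_stat_statements extension (column names shown are for PostgreSQL 13+), that looks roughly like:

-- Heaviest query digests by total execution time
SELECT queryid,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       round(total_exec_time::numeric, 2) AS total_ms,
       left(query, 60) AS sample_query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

The digests at the top of this list are the ones whose throughput gets capped.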
6. Connection Pooling with PgBouncer
PgBouncer is the backbone of the entire scaling strategy.
In Azure Database for PostgreSQL, the primary has a 5,000 connection limit. Without pooling, Postgres would collapse under connection storms.
PgBouncer solves this by:
Reusing connections instead of creating new ones
This reduces connection latency from ~50ms to ~5ms.
Keeping connection count low
Pooling ensures the primary never hits its max connections.
Auto-routing traffic when replicas fail
If a replica goes offline, PgBouncer seamlessly redirects traffic to healthy replicas.
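You can watch the effect of pooling directly on the primary; this check (mine, not OpenAI’s) shows how much of the connection budget is actually in use:

-- How many backend connections exist right now, and what are they doing?
SELECT count(*) AS total_connections,
       count(*) FILTER (WHERE state = 'active') AS active,
       count(*) FILTER (WHERE state = 'idle') AS idle
FROM pg_stat_activity;
SHOW max_connections;

With PgBouncer in front, total_connections stays a small fraction of that limit even during traffic spikes.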
PgBouncer is what turns OpenAI’s architecture from fragile to scalable.
7. Schema Management With Zero Risk
Schema changes are dangerous at scale, especially on large tables: they can cause blocking, full table rewrites, and long outages.
OpenAI implemented strict Postgres schema governance:
What’s NOT allowed
No new tables
No new workloads added to PostgreSQL
No schema operations that require table rewrites
What IS allowed
Add or remove columns (with a strict 5-second timeout)
Add or drop indexes concurrently
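Table and column names in this sketch are hypothetical, and I’m assuming the 5-second limit is enforced with lock_timeout:

-- Give up after 5 seconds instead of queueing behind long-lived locks
SET lock_timeout = '5s';
ALTER TABLE conversations ADD COLUMN archived_at timestamptz;
-- Indexes are built without blocking writes (must run outside a transaction block)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_conversations_archived_at
    ON conversations (archived_at);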
Preventing schema changes from being blocked
Schema migrations fail if long-running queries (>1s) constantly touch the target table.
Engineers must:
Fix inefficient queries
Or move those queries to read replicas
Or track them using:
SELECT * FROM pg_stat_activity
WHERE query LIKE '%table_name%'
AND now() - query_start > interval '1 second';

This strict discipline ensures that PostgreSQL remains nimble and operational even under high-QPS AI workloads.
The End Result: PostgreSQL Running at AI Scale
With all these optimizations, OpenAI’s Postgres architecture achieved something unprecedented:
PostgreSQL scaled to millions of QPS, powering critical ChatGPT workloads with dozens of global read replicas and minimal replication lag, showcasing the robustness of OpenAI’s database infrastructure.
Additional results:
Only one SEV-0 incident in nine months
Stable, reliable, low-latency performance
Enough capacity for future growth
Stress-free scaling without introducing sharding complexity
This is the definition of a high-QPS ChatGPT Postgres backend and a model example of OpenAI Postgres architecture in action.
The Final Lesson
OpenAI didn’t scale by adding a new distributed database.
They scaled by deeply understanding Postgres.
They mastered:
Postgres scaling limits
MVCC vacuum behavior
connection pooling
rate limiting
read replica architecture
schema management discipline
PostgreSQL performance tuning
Success didn’t come from switching databases.
It came from engineering discipline.
And that discipline is what keeps ChatGPT online for millions of users.
What’s your biggest challenge when scaling PostgreSQL? Write bottlenecks, replica lag, or query storms? Share your thoughts; I’d love to feature your insights in the next issue.
🔔 Subscribe for Weekly PostgreSQL War Stories
💼 Digital Products
PostgreSQL Health Check & Performance Toolkit – Catch hidden database risks before they become outages with run_health_report(), a single query covering 60+ health checks and automated alerts. Lifetime updates included: install once, monitor forever.
Price: $29
Get the Toolkit
Critical Mindset: Manage High-Severity Incidents – Step-by-step strategies for staying calm and resolving major technical emergencies.
Price: $39
Get the Guide
👉 Subscribe to stay ahead of the next outage