How OpenAI Scaled Postgres to Millions of QPS
A behind-the-scenes breakdown of OpenAI’s Postgres scaling strategy, PostgreSQL optimizations, global read replicas, and the engineering discipline that made unsharded Postgres run at AI scale.
The Unexpected Database Powering ChatGPT
When people imagine ChatGPT’s backend, they picture a massive, distributed, AI-optimized data system, not a traditional relational database.
But the truth is more surprising:
OpenAI relies heavily on PostgreSQL.
Not sharded.
Not partitioned across clusters.
Just a single primary with a carefully engineered architecture running on Azure Database for PostgreSQL.
And yet… it handles millions of QPS (queries per second).
This story isn’t about replacing Postgres; it’s about the engineering excellence that turned standard PostgreSQL into a high-QPS powerhouse capable of supporting modern AI workloads.
Why This Architecture Shocked the Engineering World
PostgreSQL is known for reliability, but not for extreme scale.
Especially not:
heavy read traffic
global replicas
MVCC vacuum overhead
Postgres write bottlenecks
high WAL generation
avoiding sharding
Yet OpenAI built an unsharded Postgres architecture that powers ChatGPT.
This required solving classic Postgres problems like MVCC bloat, autovacuum tuning, replica lag, connection limits, and inefficient ORM-generated queries.
The result?
A breakthrough in Postgres scaling without a complete rewrite.
Not subscribed yet? Subscribe now and don’t miss my next post!
OpenAI’s Postgres Scaling Blueprint
1. Control the Postgres Write Bottleneck (The #1 Scaling Limiter)
Postgres struggles under heavy writes because of MVCC version churn, index maintenance, and vacuum overhead.
OpenAI tackled this by dramatically reducing writes:
Migrating write-heavy and shardable workloads off PostgreSQL
Fixing application bugs that caused unnecessary writes
Rate limiting backfills to prevent WAL spikes
Using lazy writes to avoid CPU bursts
Reducing write amplification from ORM behavior
By crushing the Postgres write bottleneck, the team unlocked room for massive read scalability.
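OpenAI hasn’t published its monitoring queries, but Postgres’s built-in statistics views show where write pressure and MVCC churn pile up in any instance. A minimal sketch:

-- Find the most write-heavy tables and where dead tuples (MVCC churn) accumulate
SELECT relname,
       n_tup_ins + n_tup_upd + n_tup_del AS total_writes,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY total_writes DESC
LIMIT 10;

Tables at the top of this list are the natural candidates for write-path fixes or for migrating off Postgres entirely.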
2. Turn PostgreSQL Into a Read-Replica Machine
OpenAI’s workloads are read-heavy, which makes them a perfect fit for Postgres read scaling.
They built a global network of read replicas using Azure Database for PostgreSQL.
This architecture delivered:
high-QPS Postgres performance
low replication lag
geo-distributed low-latency reads
massive read scalability
replica routing for critical requests
Only essential transactional reads ever hit the primary.
Everything else is routed to replicas, enabling scaling Postgres without sharding.
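The exact replica topology isn’t public, but the lag worth watching is visible to anyone through pg_stat_replication on the primary:

-- Run on the primary: one row per streaming replica, with lag broken down by stage
SELECT application_name, client_addr, state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

replay_lag is the number users actually feel: how far behind a replica’s visible data is.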
Want the Deep Stuff? Become a Paid Member
Paid members get instant access to:
Premium deep dives with full RCA analysis, actionable scripts, and Ask Me Anything mentorship via email.
3. Query Optimization: Solving the Real Performance Killers
OpenAI tamed its worst-performing queries with strict Postgres query optimization rules:
Enforcing idle_in_transaction_session_timeout
Enforcing a strict statement_timeout
Adding client-side timeouts
Detecting expensive multi-way joins and moving them to the application layer
Rate limiting expensive query digests
Avoiding ORM anti-patterns that generate slow SQL
Monitoring pg_stat_activity for long-running queries that can block autovacuum and consume resources
This eliminated classic Postgres multi-way join problems and stabilized the database.
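OpenAI hasn’t shared its actual timeout values, so the numbers and role/database names below are placeholders; the settings themselves are standard Postgres:

-- Hypothetical role and values: server-side guard rails so no session can run away
ALTER ROLE app_rw SET statement_timeout = '30s';
ALTER ROLE app_rw SET idle_in_transaction_session_timeout = '60s';
-- Hypothetical database-wide default
ALTER DATABASE chat_app SET statement_timeout = '30s';

These server-side limits back up the client-side timeouts, so a runaway query gets cancelled even if the application forgets to give up.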
4. Handling the Single Point of Failure in PostgreSQL
In OpenAI’s PostgreSQL architecture, the system runs on a single primary writer, which introduces a classic single point of failure. If the primary goes down, no writes can occur.
However, OpenAI engineered around this limitation through replica-based resilience:
Many read replicas = no read outage
If one read replica fails, applications simply route traffic to another.
Most critical requests are read-only, so even if the primary fails, read replicas can still serve SEV2-level traffic.
Prioritizing traffic by impact
To avoid high-impact outages:
Requests are categorized as High Priority (SEV0) vs Low Priority (SEV2)
High-priority APIs get dedicated read replicas
Low-priority traffic never competes for critical replica capacity
This architecture minimizes real-world user impact even when failures occur.
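One building block for this kind of routing (a sketch of mine, not OpenAI’s published code) is that every Postgres node can report whether it is a replica and how far behind it is:

-- Returns true on a streaming replica (a safe target for read-only traffic),
-- plus how far its visible data lags the primary (NULL on the primary itself)
SELECT pg_is_in_recovery() AS is_replica,
       CASE WHEN pg_is_in_recovery()
            THEN now() - pg_last_xact_replay_timestamp()
       END AS replay_lag;

A router can poll this and pull any replica whose lag crosses a threshold out of the high-priority pool.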
5. Rate Limiting to Protect the Primary
A single expensive query can bring down a PostgreSQL instance, and at OpenAI it happened.
OpenAI experienced a production incident where a 12-way join spiked unexpectedly and took down the entire primary.
To prevent ANY one query from overwhelming the system, they implemented multi-layer rate limiting:
Rate limit application-level functions
During peak load, write-heavy or complex read operations are automatically slowed down.
Rate limit new database connections
Prevents connection pool exhaustion, one of the fastest ways to crash PostgreSQL.
Rate limit specific query digests
If a particular SQL pattern is too heavy, its throughput is capped to protect overall system health.
This is how OpenAI avoids catastrophic query storms and ensures stable high-QPS PostgreSQL performance.
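Rate limiting by digest assumes you can see which digests are expensive. With the pg_stat_statements extension (column names shown are for PostgreSQL 13+), that looks roughly like:

-- Heaviest query digests by total execution time
SELECT queryid,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       round(total_exec_time::numeric, 2) AS total_ms,
       left(query, 60) AS sample_query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

The digests at the top of this list are the ones whose throughput gets capped.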
6. Connection Pooling with PgBouncer
PgBouncer is the backbone of the entire scaling strategy.
In Azure Database for PostgreSQL, the primary has a 5,000 connection limit. Without pooling, Postgres would collapse under connection storms.
PgBouncer solves this by:
Reusing connections instead of creating new ones
This reduces connection latency from ~50ms to ~5ms.
Keeping connection count low
Pooling ensures the primary never hits its max connections.
Auto-routing traffic when replicas fail
If a replica goes offline, PgBouncer seamlessly redirects traffic to healthy replicas.
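You can watch the effect of pooling directly on the primary; this check (mine, not OpenAI’s) shows how much of the connection budget is actually in use:

-- How many backend connections exist right now, and what are they doing?
SELECT count(*) AS total_connections,
       count(*) FILTER (WHERE state = 'active') AS active,
       count(*) FILTER (WHERE state = 'idle') AS idle
FROM pg_stat_activity;
SHOW max_connections;

With PgBouncer in front, total_connections stays a small fraction of that limit even during traffic spikes.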
PgBouncer is what turns OpenAI’s architecture from fragile to scalable.
7. Schema Management With Zero Risk
Schema changes are dangerous at scale, especially on large tables: they can cause blocking, full table rewrites, and long outages.
OpenAI implemented strict Postgres schema governance:
What’s NOT allowed
No new tables
No new workloads added to PostgreSQL
No schema operations that require table rewrites
What IS allowed
Add or remove columns (with a strict 5-second timeout)
Add or drop indexes concurrently
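Table and column names in this sketch are hypothetical, and I’m assuming the 5-second limit is enforced with lock_timeout:

-- Give up after 5 seconds instead of queueing behind long-lived locks
SET lock_timeout = '5s';
ALTER TABLE conversations ADD COLUMN archived_at timestamptz;
-- Indexes are built without blocking writes (must run outside a transaction block)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_conversations_archived_at
    ON conversations (archived_at);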
Preventing schema changes from being blocked
Schema migrations fail if long-running queries (>1s) constantly touch the target table.
Engineers must:
Fix inefficient queries
Or move those queries to read replicas
Or track them using:
SELECT * FROM pg_stat_activity
WHERE query LIKE '%table_name%'
AND now() - query_start > interval '1 second';

This strict discipline ensures that PostgreSQL remains nimble and operational even under high-QPS AI workloads.
The End Result: PostgreSQL Running at AI Scale
With all these optimizations, OpenAI’s Postgres architecture achieved something unprecedented:
PostgreSQL scaled to millions of QPS, powering critical ChatGPT workloads with dozens of global read replicas and minimal replication lag, showcasing the robustness of OpenAI’s database infrastructure.
Additional results:
Only one SEV-0 incident in nine months
Stable, reliable, low-latency performance
Enough capacity for future growth
Stress-free scaling without introducing sharding complexity
This is the definition of a high-QPS ChatGPT Postgres backend and a model example of OpenAI Postgres architecture in action.
The Final Lesson
OpenAI didn’t scale by adding a new distributed database.
They scaled by deeply understanding Postgres.
They mastered:
Postgres scaling limits
MVCC vacuum behavior
connection pooling
rate limiting
read replica architecture
schema management discipline
PostgreSQL performance tuning
Success didn’t come from switching databases.
It came from engineering discipline.
And that discipline is what keeps ChatGPT online for millions of users.
What’s your biggest challenge when scaling PostgreSQL? Write bottlenecks, replica lag, or query storms? Share your thoughts; I’d love to feature your insights in the next issue.
🔔 Subscribe for Weekly PostgreSQL War Stories
💼 Digital Products
PostgreSQL Health Check & Performance Toolkit – Catch hidden database risks before they become outages with run_health_report(), a single query covering 60+ health checks and automated alerts. Lifetime updates included: install once, monitor forever.
Price: $29
Get the Toolkit
Critical Mindset: Manage High-Severity Incidents – Step-by-step strategies for staying calm and resolving major technical emergencies.
Price: $39
Get the Guide
👉 Subscribe to stay ahead of the next outage