The Sev-1 Database

[Premium] PostgreSQL High CPU Investigation: The Complete DBA Playbook (11+ Queries)

How to Diagnose, Fix, and Prevent CPU Spikes from Crippling Your Database

Haider Z
Jan 19, 2026
∙ Paid

Table of Contents

  1. The Scene: Black Friday Meltdown

    • The Business Impact: A Crisis

  2. The Investigation: What We Found

  3. Why PostgreSQL CPU Spikes Happen

  4. Why I’m Sharing This

  5. 11 Production-Tested Queries for CPU Investigation

    • Query 1: Active Queries Ranked by Runtime (Run First)

    • Query 2: Parallel Workers Detection (CPU Multiplier)

    • Query 3: Top CPU-Consuming Queries (Historical Analysis)

    • Query 4: Sequential Scans on Large Tables (Index Missing Detection)

    • Query 5: Unused and Duplicate Indexes (Index Bloat)

    • Query 6: Missing Index Recommendations

    • Query 7: Query Execution Plan Analysis (EXPLAIN)

    • Query 8: Stale Statistics Detection

    • Query 9: Function Calls Preventing Index Usage

    • Query 10: CPU Health Summary (Dashboard Query)

    • Query 11: Real-Time CPU Consumer Identification

  6. Force Statistics Refresh All Tables

  7. Prevention Engineering Checklist

    • Visibility & Monitoring

    • Query Safety

    • Database Health & Optimization




The Scene: Black Friday Meltdown

11:47 AM Peak Shopping Hour

The e-commerce platform is handling 3x normal traffic. Revenue is flowing at $47,000 per minute. Then the alerts start coming:

  • CRITICAL: CPU utilization 98% for 3 minutes

  • CRITICAL: API response time > 8 seconds

  • CRITICAL: Database connection pool exhausted

The on-call engineer opens the database dashboards. The CPU graph is a flat line at 100%. It’s been pinned there for 4 minutes. The application dashboard shows checkout success rate has dropped from 99.2% to 12%.

The Business Impact: A Crisis

During this 23-minute incident, the lack of PostgreSQL performance tuning led to catastrophic results:

  • Checkout Success Rate: 99.2% before vs. 12% during (an 87-point drop).

  • API Response Time: 180ms before vs. 8,400ms during (46x slower).




The Investigation: What We Found

Within 90 seconds, the team identified the culprit: a single analytics query consuming 14 CPU cores.

A quick-fix dashboard query joined five tables without proper indexes and ran an ORDER BY over 87 million rows. The result was a parallel-query CPU explosion: the executor spawned 14 workers, each pinning a full core.

During normal traffic, this query ran once per hour and took 8 seconds. During Black Friday traffic, it was being triggered 200 times per minute by an auto-refresh dashboard widget.

200 queries/minute × 14 parallel workers × 8 seconds = 22,400 core-seconds of CPU demand every minute. Complete CPU saturation.

The checkout API queries queued behind the analytics workload and timed out one by one.
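For incidents like this, one common stopgap is to cap parallelism and runtime for the offending role so that a single dashboard query cannot fan out across every core. This is a sketch of that pattern, not necessarily the fix this team applied; the role name is hypothetical:

```sql
-- Hypothetical role name; a common mitigation sketch, not the team's actual fix.
-- Cap how many parallel workers any one query from the analytics role may spawn:
ALTER ROLE analytics_dashboard SET max_parallel_workers_per_gather = 2;

-- Bound runaway runtimes so a slow query fails fast instead of pinning cores:
ALTER ROLE analytics_dashboard SET statement_timeout = '30s';
```

Role-level settings apply on each new connection from that role, so existing sessions must reconnect to pick them up.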


Why PostgreSQL CPU Spikes Happen

PostgreSQL is a single-process-per-connection database. High CPU means backends are executing expensive operations without waiting on I/O or locks.
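You can see this directly in pg_stat_activity: a backend that is active but reports no wait event is executing on-CPU. Counting those gives a rough on-CPU backend count (a minimal sketch):

```sql
-- Active backends with no wait_event are executing on-CPU right now,
-- not blocked on I/O or locks.
SELECT count(*) AS on_cpu_backends
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event IS NULL
  AND pid <> pg_backend_pid();  -- exclude this monitoring session itself
```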

The CPU Consumption Hierarchy

If your database is hitting 90%+ CPU usage, it is likely due to one of these six culprits, ranked from most common to least:

  1. Sequential Scans on Large Tables Reading every single row from disk/memory because an index is missing or ignored.

  2. Sort Operations Without Index Support Heavy ORDER BY clauses on non-indexed columns that force the database to sort data in work_mem.

  3. Hash Joins and Aggregates Complex GROUP BY and JOIN operations building massive in-memory hash tables.

  4. Parallel Query Explosion A single query spawning too many workers, where each worker consumes a full CPU core.

  5. Stale Statistics The Query Planner chooses a Plan B strategy because it thinks a table has 100 rows when it actually has 10 million.

  6. Function Calls in WHERE Clauses Using functions like LOWER prevents index usage and forces a row-by-row evaluation (Index Suppression).
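Culprit 6 is easy to reproduce: a plain b-tree index on a column cannot serve a predicate that wraps the column in a function, but an expression index can. A minimal sketch, with hypothetical table and column names:

```sql
-- Hypothetical schema, for illustration only
CREATE INDEX idx_users_email ON users (email);

-- This predicate cannot use idx_users_email: the index stores email,
-- not LOWER(email), so PostgreSQL falls back to a sequential scan.
EXPLAIN SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Fix: an expression index that matches the function call exactly
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
```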


Why I’m Sharing This

I’ve been in more than 1,000 critical outage war rooms at Microsoft.

Database down. Revenue bleeding. Executives on the bridge call. Everyone panicking.

These moments taught me something the hard way: most of these outages were preventable.

If teams had the right queries ready, if they could detect issues before they exploded, if they had engineered their monitoring correctly, they wouldn’t have been in that war room at all.

The 11+ queries in this playbook are the exact queries I run in those war rooms.

They’re what I use to cut through the chaos and find the root cause in 90 seconds instead of 90 minutes. I’m sharing them because I believe every DBA, DBRE, SRE, and engineer deserves access to battle-tested tools, not just the ones who’ve been through the fire.

But here’s the thing: High CPU is just one scenario.


Production databases face dozens of failure modes: connection exhaustion, replication lag, bloat, vacuum stalls, lock chains, memory pressure, and more. Each one needs its own investigation playbook.

That’s why I created two options for you:

Option 1: The Complete PostgreSQL Health Check Toolkit (One-Time Purchase, $29)

I invested time building what I wish I had during those 1,000+ outages: a comprehensive Health Check Toolkit with 60+ production-tested queries packaged as stored functions.

Install once. Query anytime. Know your critical issues instantly.

Why $29?

It’s not about the price; it’s about the time you’ll never get back during an outage. It’s about the stress you won’t feel when you can diagnose in seconds instead of hours. It’s about having the answers ready before the CTO joins the call.

Every time I update the toolkit or add new queries, you’ll be notified. One payment, lifetime updates. More queries will be added, and the price will increase as the value grows.

👉 [Buy the Updated PostgreSQL Health Report]

Option 2: Become a Premium Member of The Sev-1 Database Newsletter

$8/month

The toolkit provides the queries, while the newsletter shares real-world experiences and lessons learned.

Premium membership includes:

- Access to the full High CPU playbook and 11+ Sev-1 investigation playbooks

- Detailed real-world war room scenarios with insights beyond standard documentation

- Monthly updates with new playbooks based on recent incidents

- Direct support for you via email

This is designed for those who want to learn both what actions to take and how to approach challenging situations.

Subscribe to The Sev-1 Database Newsletter →


11 Production-Tested Queries for CPU Investigation

These queries are essential for any PostgreSQL high-CPU fix.

The following queries are ordered by priority. Run them in sequence during a CPU spike investigation.

Prerequisite: Queries 3, 6, and 9 require the pg_stat_statements extension. To enable:

-- Check if already enabled

SELECT * FROM pg_available_extensions WHERE name = 'pg_stat_statements';

-- Enable (requires shared_preload_libraries to include pg_stat_statements)

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
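If pg_stat_statements is not yet preloaded, CREATE EXTENSION alone is not enough: the library must appear in shared_preload_libraries, and changing that setting requires a server restart. A sketch of the check and the change:

```sql
-- Verify the library is preloaded; if it is missing from the output,
-- a restart will be needed after the change below.
SHOW shared_preload_libraries;

-- Add it (takes effect only after a server restart).
-- Note: this overwrites the existing list, so include any libraries
-- already present in the current value.
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
```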

1. Active Queries Ranked by Runtime (Run First)

Purpose: Identify which active connections are consuming CPU right now. Long-running queries are your prime suspects.

Version: PostgreSQL 14+

Scope: ALL active backends on the server
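The author's exact Query 1 is in the paid section. As a placeholder, the standard approach ranks pg_stat_activity backends by elapsed runtime; this is a minimal sketch of that pattern, not the playbook's query:

```sql
-- Rank active backends by how long their current query has been running.
-- Long runtimes at the top are the prime CPU suspects.
SELECT pid,
       usename,
       now() - query_start AS runtime,
       wait_event_type,                 -- NULL here means on-CPU
       left(query, 80)    AS query_preview
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()          -- exclude this session
ORDER BY runtime DESC
LIMIT 20;
```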


Continue reading this post for free, courtesy of Haider Z.

Or purchase a paid subscription.
© 2026 Haider Z