The Sev-1 Database

The Sev-1 Database

[Premium] PostgreSQL OOM Crash: 11 Diagnostic Queries

How to Diagnose, Fix, and Prevent Linux OOM Killer from Terminating Your Database

Haider Z's avatar
Haider Z
Jan 16, 2026
∙ Paid

Table of Contents

  1. The Incident: Understanding PostgreSQL OOM Crashes

  2. Why PostgreSQL OOM Crashes Happen

  3. Top 11 Diagnostic Queries

    • Query 1: Active Backends by Runtime

    • Query 2: Your Session Memory Breakdown

    • Query 3: Parallel Query Workers Detection

    • Query 4: Current Parallel Execution Configuration

    • Query 5: Idle Connections by State

    • Query 6: Individual Long-Idle Connections

    • Query 7: Calculate Safe work_mem

    • Query 8: High-Memory Query Patterns

    • Query 9: Memory Health Summary

    • Query 10: Connections by Application

    • Query 11: Current Memory Configuration

  4. Configuration Reference

    • OOM Prevention Settings

    • Role-Based work_mem Strategy

    • Session-Level Override

  5. Core Principles


Get production‑grade database insights
Real incident breakdowns, tuning playbooks, and hard‑earned lessons from running databases at scale.

👉 Subscribe to get new posts directly in your inbox

Share The Sev-1 Database

👉 [The PostgreSQL Health Check That Prevents Sev-1s] Price: $29
60+ health indicators. One function call. No agents, no subscriptions, no data leaving your infrastructure


The Incident: Understanding PostgreSQL OOM Crashes

Your monitoring shows a sudden drop in connections followed by a service restart. The system log reveals:

Out of memory: Kill process 12345 (postgres) score 950

The Linux kernel terminated your database because PostgreSQL attempted to consume more RAM than available. This triggers crash recovery, causing 30 seconds to 20+ minutes of complete unavailability depending on WAL volume.

Important context: PostgreSQL memory consumption is dynamic and non-deterministic. The queries in this post rank risk and identify suspects. All memory figures represent PostgreSQL internal allocations, not OS-level RSS.


Why PostgreSQL OOM Crashes Happen

The primary cause is misconfigured work_mem combined with parallel query execution.

The work_mem multiplier effect:

  • work_mem allocates per executor node, not globally

  • A single query with 3 hash joins, 2 sorts, and 4 parallel workers consumes:
    work_mem x 5 operations x 4 workers = 20x work_mem

  • Linux memory overcommit allows allocation promises until physical RAM runs out

  • The OOM killer terminates the process with the highest memory badness score


Query 1: Active Backends by Runtime (Run First)

Purpose: Identify which active connections are running the longest and may be consuming memory.

Version: PostgreSQL 14+

Scope: ALL active backends on the server


The Sev-1 Database is a reader-supported publication. To receive new posts and support my work, consider becoming a paid subscriber 8$/month.

User's avatar

Continue reading this post for free, courtesy of Haider Z.

Or purchase a paid subscription.
© 2026 Haider Z · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture