[FREE] PostgreSQL WAL Setting Nobody Changes Until It Causes Data Loss
Most engineers set wal_level = replica and never touch it again. Here’s what the documentation actually says and why it matters.
PostgreSQL WAL Setting Nobody Changes Until It Causes Data Loss
Table of Contents
What WAL Actually Is (And Why It’s Not Just for Replication)
Three Levels What the Docs Actually Say
Incident Pattern I’ve Seen Three Times
Other Side: WAL Level and Disk Space
Queries to Check Your Current WAL Configuration
What You Should Do Today
📨 If this was useful, share it with one engineer on your team who manages production databases.
There’s a configuration parameter sitting quietly in your postgresql.conf right now.
You probably haven’t touched it since the database was provisioned. Your monitoring doesn’t alert on it. And if you’re running a default installation, it’s almost certainly set to a value that could cost you during your next incident.
It’s wal_level.
And this is the story of why it matters more than most engineers realize.
What WAL Actually Is (And Why It’s Not Just for Replication)
Write-Ahead Log is PostgreSQL’s durability backbone. Before any data change hits the actual data files, PostgreSQL writes a record of that change to the WAL. If the server crashes mid-write, PostgreSQL replays the WAL on restart and recovers to a consistent state.
That’s the textbook definition. Here’s what most people miss:
WAL isn’t just about crash recovery. It’s about everything.
Point-in-time recovery (PITR)
Logical replication
Physical standby servers
Third-party CDC tools (Debezium, pglogical)
Backup tools like pgBackRest and Barman
All of them depend on wal_level being configured correctly. And if it’s wrong, some of these silently stop working with no error, no alert, and no warning until you need them.
Three Levels What the Docs Actually Say
PostgreSQL gives you three options:
minimal
The least WAL possible. Enough for crash recovery only. Some bulk operations skip WAL entirely in this mode. You cannot stream to a standby. You cannot do PITR. Almost no place in production.
replica (the default)
Enough WAL for physical replication and base backups. Supports streaming replication, pg_basebackup, and pgBackRest.
What it doesn’t support: logical decoding, CDC tools like Debezium, or slots that inspect row-level change data.
logical
Everything in replica, plus the information needed for logical decoding. Required for logical replication slots, CDC pipelines, or tools that read WAL to reconstruct row-level changes.
Incident Pattern I’ve Seen Three Times
Here’s the scenario that plays out, usually after a migration or a quick infrastructure change:
Team decides to set up a CDC pipeline to stream changes to Kafka
They install Debezium, point it at the database
Debezium starts, connects, creates a replication slot
Everything looks fine for 48 hours
Debezium silently falls behind or never actually streams row-level data at all
Team investigates. Slot is there. Connection is there. But wal_level = replica
Tool connected successfully. It created the slot successfully. PostgreSQL never complained. But without wal_level = logical, the WAL simply doesn’t contain the row-level change records the tool needs.
You only discover this when you check the data downstream and realize it’s hours or days stale.
🚀 PostgreSQL Health Report — One SQL file. 60+ diagnostic checks. Detects vacuum lag, bloat, lock chains, replication slot danger, and more. Used by DBAs managing production at scale.
Other Side: WAL Level and Disk Space
Changing wal_level from replica to logical increases WAL volume. For write-heavy tables expect 20–40% more WAL generation. That means more disk, higher I/O, more replication bandwidth, and longer recovery times if WAL accumulates faster than you ship it.
This is the tradeoff nobody documents in the migration guide: you enabled a feature, and your disk started filling up faster than it did last month.
Queries to Check Your Current WAL Configuration
Run this on any production database today:
-- Check WAL configuration
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name IN (
'wal_level','wal_keep_size','max_wal_size',
'min_wal_size','wal_compression','archive_mode','archive_command'
)
ORDER BY name;And check replication slots for lag:
-- Replication slot health
SELECT
slot_name, slot_type, active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag_size,
restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;If any slot shows lag in gigabytes that’s your next Sev-1 in waiting.
What You Should Do Today
Run the queries above on every production database
If you have CDC tools, Debezium, or logical replication verify
wal_level = logicalIf you recently changed
wal_levelcheck your disk usage trend for the past 30 daysIf you have replication slots confirm every slot has an active consumer. An inactive slot holds WAL on disk indefinitely
The configuration change requires a PostgreSQL restart. Plan accordingly.
Next Wednesday: the full diagnostic SQL pack for WAL pressure, slot lag, and archive failures the exact queries I run when WAL is suspected during a Sev-1.
📧 Want the complete PostgreSQL Health Report?
Includes MultiXact checks, vacuum monitoring, checkpoint diagnostics, WAL slot health, and 60+ diagnostic queries all in one SQL file.
📌 Upgrade to Premium ($8/month) — Premium subscribers get the full SQL diagnostic packs, step-by-step playbooks, and the queries I actually run during a Sev-1.


