[FREE] PostgreSQL Migration You Need to Read Before Moving to the Cloud
What nobody tells you about moving your PostgreSQL database off-prem and the one setting that turned a disaster into a success.
There’s a moment in every cloud migration where someone on the team says, It’s slower.
Not a little slower. Not we’re still optimizing. Just... slower. The kind of slower that makes a room go quiet.
This is the story of that moment and the surprisingly simple fix that saved the project.
Table of Contents
Setup
7 Seconds → 20 Seconds
First, Understand: the Architecture Is Fundamentally Different
What Is effective_cache_size in PostgreSQL?
Discovery
Fix
Why This Happens More Than You Think
Checklist You Didn’t Know You Needed
Before Migration
After Migration
Mindset Shift
Bigger Lesson
Setup
A customer came to us with a pretty standard migration story. They were running PostgreSQL on-premises, everything humming along nicely. Queries were fast. Users were happy. The database was doing what databases do quietly working in the background while everyone else took the credit.
Then came the decision to move to Azure.
The reasons were the usual ones: reduce infrastructure overhead, improve scalability, stop worrying about hardware refreshes. All good reasons. All the right reasons.
The migration itself went smoothly. Schema migrated. Data migrated. Application reconnected. Green lights across the board.
And then they ran the first query.
7 seconds → 20 Seconds
Read that again.
A query that completed in 7 to 8 seconds on their on-prem server was now taking 20 seconds in the cloud.
That’s not a performance regression. That’s a different universe. That’s the difference between a user clicking a button and seeing a result, and a user clicking a button, making coffee, and coming back to check.
The immediate reaction was predictable: The cloud is slow.
But the cloud wasn’t slow. The cloud was doing exactly what it was told to do. The problem was what it was told.
First, Understand: the Architecture Is Fundamentally Different
Before I get into the fix, I need to talk about something that most people gloss over in migration planning and it’s the reason small configuration mistakes turn into massive performance problems in the cloud.
On-prem and cloud PostgreSQL do not run on the same architecture. Not even close.
On your on-prem server, everything is local. Your CPU, your RAM, your SSD they’re all on the same physical machine, connected by a fast internal bus. When PostgreSQL needs to read a data page from disk, it’s talking to a drive that’s physically inches away.
The latency is measured in microseconds. The OS file cache sits in the same RAM bank. Everything is close, everything is fast, and everything is predictable.
In the cloud, that picture changes completely:
Storage is network-attached. Your database files don’t live on a local disk. They live on a remote storage service Azure Managed Disks, Amazon EBS, etc. connected over a network. Even premium SSDs in the cloud add a layer of network latency that doesn’t exist on-prem. A single random read that took 0.1ms on a local NVMe might take 1-2ms through the network storage layer. That doesn’t sound like much until the query planner decides to do 10,000 of them.
CPU and memory are virtualized. You’re running on a hypervisor, sharing physical hardware with other tenants. The performance characteristics of your 16 GB RAM in the cloud are not identical to 16 GB of DDR4 on a bare-metal box. Memory access patterns, NUMA topology, CPU cache behavior all slightly different.
I/O throughput has different constraints. On-prem, your I/O ceiling was determined by your physical hardware. In the cloud, it’s determined by your SKU. You get a certain number of IOPS and throughput based on what you’re paying for. Hit that ceiling, and you’re throttled something that simply didn’t happen on your on-prem SSD array.
Network latency between components matters. If your application server and database were on the same rack on-prem (or even the same machine), you had sub-millisecond network latency. In the cloud, even within the same region, you’re looking at 1-2ms per round trip. For a query that makes one call, that’s nothing. For an application making hundreds of small queries in sequence, it adds up fast.
Here’s why this matters for our story: when the architecture changes this fundamentally, PostgreSQL’s query planner needs to know about it. The planner doesn’t detect hardware. It doesn’t benchmark your storage. It relies entirely on the parameters you give it to model the cost of different execution strategies.
If those parameters still describe your old on-prem environment, the planner is optimizing for a machine that no longer exists. And in an architecture where bad decisions are punished more harshly because every unnecessary disk read costs more, every sequential scan hits network storage instead of local disk those bad decisions compound.
That’s exactly what happened here.
What Is effective_cache_size in PostgreSQL?
PostgreSQL’s query planner is smart. Really smart. But it’s not psychic. It makes decisions about how to execute your query based on a set of assumptions, assumptions that are configured through server parameters.
One of the most critical of these is effective_cache_size.
effective_cache_size is a PostgreSQL server parameter that tells the query planner how much memory it can expect to be available for caching data combining PostgreSQL’s own shared buffers and the operating system’s file cache. It doesn’t allocate memory.
It doesn’t change how much RAM PostgreSQL actually uses. It’s an estimate. A hint. A whisper to the planner that says
Hey, this much data is probably already in memory, so you can plan accordingly.
When this value is set too low, the planner gets conservative. It assumes disk reads are expensive (because it thinks nothing is cached). It avoids index scans. It favors sequential scans. It makes safe, slow choices.
And in the cloud where a sequential scan means pulling data across the network from remote storage instead of reading from a local drive “safe and slow” becomes “catastrophically slow.”
Discovery
The customer’s server had 16 GB of RAM. When I looked at their effective_cache_size, it was set to 3 GB.
Three gigabytes. Out of sixteen.
On their on-prem box, this still worked. Why? Because the architecture was forgiving. The local SSD was fast. The OS file cache was warm and sitting on the same physical memory bus.
Even when the planner made a suboptimal choice say, a sequential scan instead of an index scan the local hardware ate the cost without breaking a sweat. The 7s query was 7s in spite of the misconfiguration, not because the configuration was correct.
But in the cloud, that forgiveness disappears. The planner looked at that 3 GB effective_cache_size, assumed most data wasn’t cached, and chose the safe path: sequential scans through network-attached storage. On-prem, that was a 7s shrug. In the cloud, that became a 20-second catastrophe.
The planner wasn’t broken. It was making rational decisions based on bad information about an architecture it didn’t understand.
Fix
I asked them to change effective_cache_size from 3 GB to 12 GB.
That’s it. One parameter. One number.
Then we ran ANALYZE on the tables to make sure the planner also had fresh statistics about data distribution. (Because if you’re going to give the planner better information about memory, you might as well give it better information about everything.)
The first run after the change? Faster. Second run? Faster still. By the third or fourth execution once the cache was warm and the planner was making informed decisions we were back.
7 to 8 seconds.
Same query. Same data. Same cloud. Different parameter.
👉 [The PostgreSQL Health Check That Prevents Sev-1s] Price: $29
60+ health indicators. One function call. No agents, no subscriptions, no data leaving your infrastructure.
Why This Happens More Than You Think
This isn’t a rare edge case. This is one of the most common patterns we see in PostgreSQL migrations, and it plays out the same way almost every time:
On-prem works fine. Parameters were set (or left at defaults) years ago. Nobody remembers why. Nobody needs to it works.
Migration happens. Data moves cleanly. Application connects. Everyone celebrates.
Performance tanks. Queries that were instant now crawl. Fingers point at the cloud provider.
Investigation reveals: server parameters were either carried over blindly from the old environment (where they might have been wrong but tolerable) or reset to defaults that don’t match the new infrastructure.
The effective_cache_size story is the one I’m telling today, but it’s not the only parameter that matters. In fact, because the architecture is so different, nearly every I/O-related parameter deserves a second look:
shared_buffers— How much memory PostgreSQL uses for its internal cache. On-prem, the OS cache could compensate for a small value. In the cloud, with network-attached storage, you want PostgreSQL caching as much as possible internally to avoid round trips.work_mem— Memory available for sort operations and hash tables per query. Undersized work_mem means spilling to disk and disk is now remote storage.random_page_cost— How expensive the planner thinks random disk I/O is. The default (4.0) assumes spinning disks. On cloud premium SSDs, set this to 1.1 to 1.5. This single change can flip a sequential scan to an index scan.effective_io_concurrency— How many concurrent I/O operations the system can handle. Cloud storage can handle more concurrent requests than a single local disk. Increase this (try 200 for SSDs).maintenance_work_mem— Memory for maintenance operations likeVACUUMandCREATE INDEX. Same principle: spilling to disk hurts more when disk is remote.seq_page_cost— On cloud storage with consistent throughput, the gap between sequential and random reads is narrower than on spinning disks. Consider lowering this.
Each of these tells the planner something about the world it’s operating in. When that world shifts from local hardware to a distributed cloud architecture and you don’t update these parameters you’re handing the planner a map of your old data center and asking it to navigate a completely different city.
Checklist You Didn’t Know You Needed
If you’re planning a PostgreSQL migration to Azure, to AWS, to any cloud provider, or even just to new hardware here’s what I’d pin above your desk:
Before Migration
Document your current server parameters. All of them. Not just the ones you remember setting.
Command: SHOW ALLBenchmark your critical queries. Get actual execution times and execution plans (
EXPLAIN ANALYZE). This is your baseline. Without it, you’re flying blind.Understand your current hardware and your target hardware. How much RAM on each? What kind of storage? Local or network-attached? What IOPS tier? What’s the I/O profile? The delta between these two environments determines which parameters need to change.
After Migration
Don’t just carry over old parameters try to tune them or give higher value or check provider recommendations accoridng to SKU. Review every single one in the context of your new architecture. A 3 GB
effective_cache_sizeon a 16 GB on-prem box was already a mistake. Carrying that mistake to a different architecture amplifies it exponentially.Set
effective_cache_sizeto approximately 75% of total RAM as a starting point. Adjust based on what else is running on the machine.Lower
random_page_costto reflect cloud SSD storage (try 1.1 instead of the default 4.0).Increase
effective_io_concurrencyfor cloud storage that can handle parallel I/O.Run
ANALYZEon all tables immediately after migration. The planner needs fresh statistics.Re-run your benchmark queries. Compare execution plans, not just times. If the plan changed, understand why. The plan change is the planner’s way of telling you it’s reacting to different information about its environment.
Warm the cache. Run critical queries several times. First execution in a cold cache is always slower that’s normal. Steady-state performance is what matters.
Mindset Shift
Respect the architecture change. Moving to the cloud isn’t copying your server somewhere else. It’s moving to a fundamentally different infrastructure model. Your database configuration needs to acknowledge that.
Parameters are not set and forget. They’re a conversation between you and the database about the environment it lives in. When the environment changes, the conversation needs to change too.
Cloud is not slower. An improperly configured database on a misunderstood architecture is slower. There’s a big difference.
Query plans tell you everything. If a query is slow, don’t guess. Run
EXPLAIN ANALYZEand read the plan. It will tell you exactly what the planner chose to do and why.
Bigger Lesson
Migration projects have a natural gravitational pull toward the big, visible work: moving data, updating connection strings, testing application compatibility, setting up CI/CD pipelines. These are important. They’re also the things that have runbooks and checklists and Jira tickets.
What often gets missed are the two invisible forces that actually determine whether your database performs well in its new home:
Architecture changed. You’re no longer on local hardware. I/O costs are different. Latency profiles are different. Concurrency models are different.
Parameters didn’t change. Your database still thinks it’s living in a world of local SSDs and 3 GB of cache.
Server parameter tuning isn’t in the migration runbook because it’s just configuration. It’s not in the testing plan because we tested the queries on-prem and they were fine.
And then you go live, and a 7second query takes 20 seconds, and suddenly it’s the most important thing in the room.
The parameters are not an afterthought. They are the migration.
Your data is just bytes. Your schema is just structure. The parameters are what tell PostgreSQL how to think about your data in its new home a home with a fundamentally different architecture than the one it came from. Skip that step, and you haven’t really migrated you’ve just shipped boxes to a new building without reading the floor plan.
Enjoyed? share with your audience


