Azure PostgreSQL Storage Reduction & Optimization — Technical Deep Dive
Table of Contents
- Introduction & Business Case
- The Storage Problem in Azure PostgreSQL
- Pre-Tasks: Assessing Your Environment
- Solution Architecture
- Step-by-Step Process Deep Dive
- Role & Permission Management
- Pre & Post Optimization SQL
- State Management & Resumability
- Performance Tuning During Import
- Failure Scenarios & Recovery
- Requirements & Prerequisites
- Usage Reference
- Troubleshooting Guide
1. Introduction & Business Case
Why This Exists
Non-production environments — development, QA, staging, integration testing — are typically created by cloning production databases. This is good practice: it ensures realistic data volumes, representative query patterns, and accurate performance benchmarking. The problem arises from how Azure PostgreSQL handles storage.
Azure PostgreSQL Flexible Server storage can only grow, never shrink. Once a server is provisioned at or scaled up to a given storage tier, the only way to reduce it is to delete the server and recreate it. There is no in-place downsize operation.
When production runs on an 8TB storage tier, every environment cloned from it inherits that 8TB allocation — even if the actual data footprint after environment-specific cleanup is 200–400GB. The excess storage is paid for every month, for every environment, indefinitely.
The Cost Problem
Azure PostgreSQL Flexible Server charges for provisioned storage regardless of actual utilisation. The billing model is per-GB-provisioned per-month, not per-GB-used.
Approximate storage costs (General Purpose tier, East US):
| Provisioned Storage | Monthly Cost | Annual Cost |
|---|---|---|
| 8,192 GB (8TB) | ~$943 | ~$11,316 |
| 1,024 GB (1TB) | ~$118 | ~$1,416 |
| 512 GB | ~$59 | ~$708 |
Cost savings per environment after optimization:
| Scenario | Before | After | Monthly Saving | Annual Saving |
|---|---|---|---|---|
| 8TB → 1TB | ~$943 | ~$118 | ~$825 | ~$9,900 |
| 8TB → 512GB | ~$943 | ~$59 | ~$884 | ~$10,608 |
| 2TB → 512GB | ~$230 | ~$59 | ~$171 | ~$2,052 |
Multiply these figures across a typical non-production estate of 4–8 environments and the annual savings reach $40,000–$85,000 — often exceeding the engineering cost of implementing and running the optimization.
Why Not Just Keep It Small from the Start?
In practice, environment cloning pipelines (which copy schema, data, and server configuration from production) automatically inherit the source server's storage tier. The clone operation does not offer a "target storage size" parameter — you clone the server as-is and then have no way to shrink it. This optimization process fills exactly that gap: it performs the downsize that Azure's native tooling does not support.
2. The Storage Problem in Azure PostgreSQL
How Azure PostgreSQL Storage Billing Works
Azure PostgreSQL Flexible Server allocates block storage (managed disks) in discrete tiers. The provisioned tier determines what you pay — not how much of that storage is actually written to. Key constraints:
- Auto-grow: If enabled, storage grows automatically when usage approaches the provisioned limit. It will never automatically shrink.
- Manual scale-up: You can increase storage via the portal, CLI, or API at any time.
- No scale-down: There is no API, portal option, or CLI flag to reduce a server's storage size. Microsoft's documentation explicitly states this is by design.
- Billing starts immediately: Storage charges apply from the moment the server is created, not from when data fills the space.
Why pg_dump / pg_restore Is the Only Option
Because there is no in-place resize, the only path to a smaller storage allocation is:
- Export all data off the server
- Delete the server (releasing the storage allocation)
- Recreate the server with the desired smaller storage tier
- Import all data back
This is exactly what this automation does. It wraps the operational complexity (ordering of steps, state preservation, role recreation, performance tuning) into a single resumable script.
Storage Utilisation vs. Allocation Gap
The gap between what is allocated and what is actually used is typically large for non-production environments:
Production: 8TB allocated, 7.2TB used (90% utilised — normal)
QA clone: 8TB allocated, ~300GB used (4% utilised — wasteful)
Dev clone: 8TB allocated, ~180GB used (2% utilised — very wasteful)
The non-production environments have had large tables truncated, transaction data cleaned up, audit logs cleared, and test data removed as part of environment preparation — yet they still pay for the full 8TB tier.
3. Pre-Tasks: Assessing Your Environment
Before running the optimization, collect information to confirm the operation is warranted and to choose the right target storage size. Run all queries against the source server while it is still running.
3.1 Check Provisioned vs. Actual Storage Usage
From Azure CLI — provisioned storage:
az postgres flexible-server show \
--subscription "<subscription_id>" \
--resource-group "<resource_group>" \
--name "<server_name>" \
--query "{name: name, storageSizeGb: storage.storageSizeGb, sku: sku.name}" \
-o table
From psql — actual database sizes:
-- Total size per database
SELECT
datname AS database,
pg_size_pretty(pg_database_size(datname)) AS size,
pg_database_size(datname) / 1024 / 1024 AS size_mb
FROM pg_database
WHERE datistemplate = false
ORDER BY pg_database_size(datname) DESC;
From psql — top 20 tables by size (run in each database):
SELECT
schemaname || '.' || tablename AS table,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename))
AS total_size,
pg_size_pretty(pg_relation_size(schemaname || '.' || tablename))
AS data_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename))
AS index_size,
pg_total_relation_size(schemaname || '.' || tablename) / 1024 / 1024
AS total_mb
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema', 'cron')
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;
From psql — total size across all schemas:
SELECT
schemaname,
pg_size_pretty(SUM(pg_total_relation_size(schemaname || '.' || tablename)))
AS total_size,
SUM(pg_total_relation_size(schemaname || '.' || tablename)) / 1024 / 1024 / 1024
AS total_gb
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema', 'cron')
GROUP BY schemaname
ORDER BY SUM(pg_total_relation_size(schemaname || '.' || tablename)) DESC;
3.2 Identify Large Transient Tables
Audit/log/cache tables are typically the largest tables in non-production environments and are safe to truncate before export — they contain no persistent business data and will grow organically after the environment is restored.
-- Find tables with names suggesting transient data
SELECT
schemaname || '.' || tablename AS table,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename))
AS size,
pg_total_relation_size(schemaname || '.' || tablename) / 1024 / 1024
AS size_mb
FROM pg_tables
WHERE (tablename ILIKE '%log%'
OR tablename ILIKE '%audit%'
OR tablename ILIKE '%event%'
OR tablename ILIKE '%cache%'
OR tablename ILIKE '%temp%'
OR tablename ILIKE '%staging%'
OR tablename ILIKE '%archive%')
AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
Add `TRUNCATE TABLE <schema>.<table> CASCADE;` statements to pre_optimization.sql for any tables identified by this query before running the optimization. This directly reduces dump size and total operation time.
3.3 Check for Extensions That Require Special Handling
-- List all installed extensions
SELECT
extname AS extension,
extversion AS version,
nspname AS schema
FROM pg_extension
JOIN pg_namespace ON pg_namespace.oid = extnamespace
ORDER BY extname;
The optimization has built-in handling for:
- TimescaleDB: dropped before export (`pre_optimization.sql`), reinstalled after import (`post_optimization.sql`). TimescaleDB-modified hypertable structures can cause `pg_dump`/`pg_restore` complications, and its background workers interfere with ownership changes.
- pg_cron: excluded from the dump (`--exclude-schema=cron`); the extension is recreated in `post_optimization.sql`.
Any other extensions with custom background workers or catalog hooks should be evaluated individually.
3.4 Inventory Roles and Passwords
The optimization exports all application role definitions and hashed passwords from the source server and recreates them identically on the target. Verify what roles exist so you can confirm they are correctly recreated after import:
-- All application login roles and their attributes
SELECT
rolname AS role,
rolcanlogin AS can_login,
rolsuper AS superuser,
rolcreatedb AS createdb,
rolcreaterole AS createrole,
rolreplication AS replication
FROM pg_roles
WHERE rolname NOT LIKE 'pg_%'
AND rolname NOT LIKE 'azure%'
AND rolname NOT IN ('system', 'azuresu', 'replication')
ORDER BY rolname;
-- Role memberships
SELECT
r.rolname AS role,
m.rolname AS member_of
FROM pg_auth_members am
JOIN pg_roles r ON r.oid = am.roleid
JOIN pg_roles m ON m.oid = am.member
WHERE r.rolname NOT LIKE 'pg_%'
AND r.rolname NOT LIKE 'azure%'
ORDER BY r.rolname, m.rolname;
-- Per-role search_path settings
SELECT
r.rolname,
s.setconfig
FROM pg_db_role_setting s
JOIN pg_roles r ON r.oid = s.setrole
WHERE s.setdatabase = 0
AND r.rolname NOT LIKE 'pg_%'
AND r.rolname NOT LIKE 'azure%'
ORDER BY r.rolname;
Save this output. You will use it to verify role fidelity after the optimization completes.
3.5 Calculate Expected Dump Size and Duration
The dump size is approximately equal to the sum of all table data sizes (excluding indexes, which are rebuilt by pg_restore). A rough estimate:
-- Estimated dump size (table data only, no indexes)
SELECT
pg_size_pretty(SUM(pg_relation_size(schemaname || '.' || tablename)))
AS estimated_dump_size,
SUM(pg_relation_size(schemaname || '.' || tablename)) / 1024 / 1024 / 1024
AS estimated_dump_gb
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema', 'cron');
Duration estimates (parallel jobs = CPU count, capped at 16):
| Estimated Dump Size | Export Time | Import Time | Total |
|---|---|---|---|
| < 50 GB | 15–30 min | 20–40 min | < 1.5 hr |
| 50–200 GB | 30–90 min | 45–120 min | 2–4 hr |
| 200–500 GB | 1.5–3 hr | 2–4 hr | 4–8 hr |
| 500 GB+ | 3–6 hr | 4–8 hr | 7–15 hr |
Note: The 20-minute wait after server deletion (for Azure to release the server name from DNS) is not included in the above estimates but adds a fixed ~20 minutes to every run.
3.6 Confirm Available Local Disk Space
The dump is written to _dumps/<server_name>/ on the machine running the script. Ensure sufficient space is available:
# Check available disk space on the executing machine
df -h .
# Estimate required space (from the query above, take estimated_dump_gb × 1.1 for safety margin)
The directory format (`-F d`) used by pg_dump stores each table in a separate file and gzip-compresses each file by default (Step 2 shows the resulting `<oid>.dat.gz` layout), so the dump on disk is usually smaller than the raw table data. Budgeting for the estimated data size plus 10–15% overhead still leaves a comfortable margin.
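To make the space check mechanical, a small pre-flight guard can compare free space against the estimate. A minimal sketch, assuming GNU coreutils `df` and an `EST_DUMP_GB` value taken from the estimated-dump-size query in Section 3.5:

```bash
# Pre-flight disk check (sketch). EST_DUMP_GB is the estimated_dump_gb
# value from the assessment query; substitute your own number.
EST_DUMP_GB=180
NEED_GB=$(( EST_DUMP_GB * 115 / 100 ))                 # add 15% overhead
AVAIL_GB=$(df -BG --output=avail . | tail -1 | tr -dc '0-9')
if [ "${AVAIL_GB}" -lt "${NEED_GB}" ]; then
  echo "Insufficient disk space: need ~${NEED_GB}GB, have ${AVAIL_GB}GB" >&2
  exit 1
fi
echo "Disk check passed: ${AVAIL_GB}GB available, ~${NEED_GB}GB required"
```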
4. Solution Architecture
Overview
The optimization implements a dump → delete → recreate → restore workflow. This is the only approach possible because Azure PostgreSQL provides no in-place storage downsize operation.
┌─────────────────────────────────────────────────────────────────┐
│ Executing Machine (local) │
│ │
│ optimize_db_storage.sh │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Step 0: Parse parameters │ │
│ │ Step 1: Retrieve PG server details, scale SKU up to D16 │ │
│ │ Step 2: pg_dumpall (roles) + pg_dump → _dumps/<name>/ │ │
│ │ Step 3: az postgres flexible-server delete ◄── NO RETURN │ │
│ │ Step 4: az postgres flexible-server create (target size) │ │
│ │ Step 5: Apply roles → pg_restore → post_optimization.sql │ │
│ │ Step 6: rm -rf _dumps/<name>/ │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │ │
└─────────────────────┼────────────────────┼───────────────────────┘
│ │
┌───────────▼───────────┐ ┌─────▼──────────────┐
│ Source PG Server │ │ New PG Server │
│ (8TB, D16 SKU temp) │ │ (512GB/1TB, D16) │
│ │ │ → scales to orig │
│ Deleted at step 3 │ │ SKU after import │
└───────────────────────┘ └────────────────────┘
Key Design Decisions
Why local execution instead of a Kubernetes job?
Running pg_dump and pg_restore on the same machine that invokes the script eliminates the dependency on an AKS cluster being available and correctly configured. It removes ~3 steps (nodepool creation, PVC provisioning, job management) and simplifies the failure surface substantially. The tradeoff is that the executing machine needs sufficient disk space and network connectivity to the PostgreSQL server.
Why directory format (-F d) for pg_dump?
The directory format is the only format that supports parallel dumps (-j). It stores each table as a separate file, allowing multiple parallel workers to write independently without contention. It also allows pg_restore to run in parallel with -j.
Why delete the source server before recreating it at a smaller size?
Azure does not allow two PostgreSQL Flexible Servers with the same name to coexist in the same subscription, even transiently. The server name is tied to a private DNS entry (<name>.postgres.database.azure.com) that must be fully released before it can be reused. After deletion, a 20-minute wait is required for Azure to release the DNS reservation.
Why temporarily scale the SKU up to 16 cores?
pg_dump and pg_restore are CPU and I/O intensive. Running them against a 4-core server creates a bottleneck at the server side — the server cannot process incoming data fast enough to keep parallel workers busy. Scaling to 16 cores for the duration of the operation reduces total time by 40–60%. The SKU is scaled back to the original after import completes.
Why are roles exported separately before pg_dump?
pg_dump with --no-acl skips all permission restoration (leading to broken application access), but without it, pg_restore fails because the roles referenced in ACL entries don't exist yet on the fresh server. The solution is to export roles separately using pg_dumpall --globals-only, apply them before pg_restore, and then run pg_restore without --no-acl — so ACLs are correctly restored from the dump.
5. Step-by-Step Process Deep Dive
Step 0 — Parse Parameters
Constructs the PostgreSQL resource ID from the flat parameters and sets the DUMP_DIR path. This step always runs (not state-managed) since it is pure variable assignment with no side effects.
pg_resource_id = /subscriptions/<sub>/resourceGroups/<rg>/
providers/Microsoft.DBforPostgreSQL/
flexibleServers/<name>
DUMP_DIR = <cwd>/_dumps/<postgresql_name>
PG_THREADS = min(nproc, 16)
Step 1 — Retrieve PostgreSQL Server Details & Scale Up SKU
Queries the Azure Resource Manager API for all server properties that need to be preserved and recreated:
| Property | Purpose |
|---|---|
| `location` | Target server must be in the same region |
| `sku.name` / `sku.tier` | Original compute SKU to scale back to after import |
| `version` | PostgreSQL major version for the new server |
| `storage.storageSizeGb` | Current size — determines target size |
| `network.delegatedSubnetResourceId` | Private networking — must match exactly |
| `network.privateDnsZoneArmResourceId` | Private DNS zone for the new server |
| `backup.backupRetentionDays` | Backup policy to preserve |
| `tags` | Resource tags to reapply |
| `azure.extensions` | Extension whitelist — must be set before first connection |
| `shared_preload_libraries` | Preloaded libraries (`timescaledb`, `pg_cron`, etc.) |
| `autovacuum`, `max_wal_size`, `checkpoint_timeout`, `maintenance_work_mem` | Current values to restore after import |
Target storage calculation:
pg_current_storage == 8192 GB → TARGET = 1024 GB (87.5% reduction)
pg_current_storage == 2048 GB → TARGET = 512 GB (75.0% reduction)
pg_current_storage == other → TARGET = 1024 GB (default)
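Expressed as shell logic, the selection reduces to a small case statement. This is a sketch using the variable names that appear in the state file (Section 8), not the script's literal code:

```bash
# Target storage selection (sketch; values are the defaults described above)
case "${pg_current_storage}" in
  8192) TARGET_STORAGE_GB=1024 ;;   # 8TB -> 1TB   (87.5% reduction)
  2048) TARGET_STORAGE_GB=512  ;;   # 2TB -> 512GB (75.0% reduction)
  *)    TARGET_STORAGE_GB=1024 ;;   # default for any other size
esac
```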
SKU scale-up logic:
The script derives the 16-core equivalent of the original SKU using a sed regex that replaces the core count in the SKU name:
Standard_D4ds_v4 → Standard_D16ds_v4
Standard_D8ads_v5 → Standard_D16ads_v5
Standard_E4ads_v6 → Standard_E16ads_v6
If the regex does not produce a different string (i.e., the SKU name does not match the expected pattern), the scale-up is skipped and the server continues with its current SKU.
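A substitution along these lines reproduces that behaviour. This is an illustrative sketch, not necessarily the script's exact regex:

```bash
# Swap the core count after the family letter for 16,
# e.g. Standard_D4ds_v4 -> Standard_D16ds_v4.
pg_sku_16core=$(echo "${pg_sku_name}" | sed -E 's/^(Standard_[A-Z])[0-9]+/\116/')
if [ "${pg_sku_16core}" = "${pg_sku_name}" ]; then
  echo "SKU pattern not recognised; keeping current SKU"   # scale-up skipped
fi
```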
All retrieved values are persisted to the JSON state file so that if the script is resumed from a later step, the values are available without re-querying Azure.
Step 2 — Export Database to Local Machine
This step has three phases:
Phase 2a: Pre-optimization SQL
Runs _optimize_scripts/pre_optimization.sql against the source database before export. This script:
- Drops TimescaleDB if installed (see Section 7)
- Runs `ANALYZE` for export planner accuracy
- Provides hooks for project-specific table truncation
Errors in this script are logged as warnings but do not stop the optimization — the intent is to reduce dump size, not gate the export.
Phase 2b: Role Export
Two-pass role extraction to ensure password hashes are captured regardless of Azure tier permissions:
Pass 1 — pg_dumpall --globals-only:
pg_dumpall -h <fqdn> -U system --globals-only | \
<filter out system/azure/pg_* roles> \
> _dumps/<name>/roles.sql
This captures role definitions (attributes, memberships, search_path settings) in the same SQL format that pg_dumpall produces, which is directly replayable via psql.
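One way to implement the filter placeholder above is a `grep -Ev` over the `pg_dumpall` output. The patterns and the `PG_FQDN`/`PG_NAME`/`PG_PASSWORD` variables below are illustrative assumptions, not the script's literal code:

```bash
# Strip statements that reference managed/built-in roles so they are
# never replayed on the new server (patterns are an approximation).
PGPASSWORD="${PG_PASSWORD}" pg_dumpall -h "${PG_FQDN}" -U system --globals-only \
  | grep -Ev 'ROLE (azure|pg_|replication|system)|TO (azure|pg_)' \
  > "_dumps/${PG_NAME}/roles.sql"
```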
Pass 2 — pg_shadow direct query:
SELECT 'ALTER ROLE ' || quote_ident(usename) ||
' PASSWORD ' || quote_literal(passwd) || ';'
FROM pg_shadow
WHERE <exclude system roles> AND passwd IS NOT NULL;
pg_shadow is a view that exposes hashed passwords (SCRAM-SHA-256 or md5 depending on server configuration). The admin user (system) has pg_read_all_data membership on Azure, which grants read access to pg_shadow. This second pass appends ALTER ROLE ... PASSWORD '...' statements to roles.sql, overriding any password entries from Pass 1 with the authoritative values.
The combined roles.sql file contains:
- `CREATE ROLE` — role existence
- `ALTER ROLE ... WITH ... PASSWORD '...'` — attributes and hashed credential
- `GRANT role TO role` — membership grants
- `ALTER ROLE ... SET search_path TO ...` — per-role configuration
Phase 2c: pg_dump
pg_dump \
-h <fqdn> -U system -d postgres \
-F d # directory format (required for -j parallelism)
-j <nproc, max 16> # parallel workers
--verbose # progress output to log
--exclude-schema=cron # pg_cron schema is extension-owned; restored via CREATE EXTENSION
-f _dumps/<name>/dbdump
The directory format writes one file per table (<oid>.dat.gz) plus a table of contents (toc.dat) and a restore script (restore.sql). This layout is what enables parallel restore later — each worker picks up independent table files from the toc.dat.
Step 3 — Delete Source Database Server ⚠️
This is the point of no return. Once this step completes successfully, the source server is permanently gone. The dump at _dumps/<name>/ is the only copy of the data.
az postgres flexible-server delete --ids <resource_id> --yes
After deletion, the script waits 20 minutes (sleep 1200). This is not optional — Azure's internal DNS and resource registration system requires this time to fully deregister the server name. Attempting to recreate the server before this window closes will fail with a name conflict error.
The state file records this step as complete. If the script is interrupted during the 20-minute wait and resumed, the --start-from 4 flag skips this step (server is already deleted) and proceeds to creation.
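Before resuming with `--start-from 4`, it is worth confirming the deletion has actually propagated. A small sketch that treats a failing `az ... show` as "server gone" (the subscription and resource-group variables are placeholders):

```bash
# Poll ARM until the source server is no longer visible.
while az postgres flexible-server show \
        --subscription "${SUB_ID}" \
        --resource-group "${RG}" \
        --name "${PG_NAME}" >/dev/null 2>&1; do
  echo "Source server still visible; waiting 60s..."
  sleep 60
done
echo "Deletion confirmed (the DNS release window still applies)"
```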
Step 4 — Create New Database Server
Creates the new server with the target storage size, then applies all preserved configuration:
az postgres flexible-server create \
--name <same name> # DNS name is now available after 20min wait
--sku-name <16-core SKU> # high-core SKU for fast import
--tier <original tier>
--storage-size <TARGET_GB> # the downsized allocation
--version <original version>
--subnet <original subnet> # same VNet connectivity
--private-dns-zone <original zone> # same private DNS
--admin-user system
--admin-password <password>
After creation, the following are restored in order:

- Resource tags — `az resource tag`
- `azure.extensions` — the extension whitelist must be set before any extension is installed
- `shared_preload_libraries` — e.g., `timescaledb`, `pg_cron`; requires a server restart
- Server restart — applies extension parameters
- Import performance parameters — set for the duration of the import (see the CLI sketch after this list):

| Parameter | Import Value | Rationale |
|---|---|---|
| `autovacuum` | `off` | Prevents autovacuum from competing with bulk INSERT during restore |
| `max_wal_size` | 32768 MB | Reduces checkpoint frequency, fewer interruptions during bulk load |
| `checkpoint_timeout` | 3600 s | Checkpoints every hour instead of every 5 minutes |
| `maintenance_work_mem` | 2048 MB | More memory per index build in CREATE INDEX during restore |

- Server restart — applies performance parameters
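The parameters above map onto `az postgres flexible-server parameter set` calls. A sketch (the script may apply them differently; note that `maintenance_work_mem` takes its value in kB, so 2048 MB becomes 2097152):

```bash
# Apply import-phase parameters (resource group and server name are placeholders).
for kv in "autovacuum=off" \
          "max_wal_size=32768" \
          "checkpoint_timeout=3600" \
          "maintenance_work_mem=2097152"; do
  az postgres flexible-server parameter set \
    --resource-group "${RG}" --server-name "${PG_NAME}" \
    --name "${kv%%=*}" --value "${kv#*=}"
done
```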
Step 5 — Import Database from Local Machine
Phase 5a: Grant Admin Role
GRANT azure_pg_admin TO system;
Required because post_optimization.sql uses SET ROLE to temporarily become the schema owner when granting USAGE privileges. azure_pg_admin membership is what allows the admin user to perform this in Azure's managed PostgreSQL environment.
Phase 5b: Apply Roles
psql -h <fqdn> -U system -d postgres -f _dumps/<name>/roles.sql
This replays all CREATE ROLE, ALTER ROLE, GRANT (memberships), and ALTER ROLE SET statements from the source server. Any errors (e.g., role already exists from a partial previous run) are logged but do not stop the process — psql continues to the next statement.
At this point, the new server has the same role structure and hashed passwords as the source.
Phase 5c: pg_restore
pg_restore \
-h <fqdn> -U system -d postgres \
-j <nproc, max 16> # parallel workers
--no-owner # all objects restored as 'system'; post_opt handles ownership
--verbose # progress output to log
_dumps/<name>/dbdump
--no-acl is NOT passed. Because roles now exist (from Phase 5b), all GRANT statements embedded in the dump are executed as-is. This correctly restores cross-schema access grants (e.g., one role having SELECT access on another schema's tables) without any manual specification.
--no-owner IS passed. In Azure PostgreSQL, pg_restore cannot execute ALTER TABLE ... OWNER TO <role> for roles other than the current user, even if the current user is azure_pg_admin. All objects are therefore restored as owned by system, and ownership is corrected dynamically by post_optimization.sql.
|| true at the end means pg_restore exit code is always treated as success. pg_restore exits non-zero even when individual object errors are non-fatal (e.g., trying to create an index that already exists). The verbose log captures all individual errors for review.
Phase 5d: Post-optimization SQL
Runs _optimize_scripts/post_optimization.sql (see Section 7) with up to 3 automatic retries. Retry is necessary because TimescaleDB and pg_cron background workers, loaded via shared_preload_libraries, can crash asynchronously when they observe rapid catalog changes (ALTER TABLE OWNER TO on many objects in quick succession). The crash takes down the psql connection with:
FATAL: terminating connection because of crash of another server process
The post-optimization script is idempotent — re-running it produces the same result.
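The retry wrapper is conceptually simple. A sketch consistent with the behaviour described in the Troubleshooting section (3 attempts, 30-second pause), assuming `ON_ERROR_STOP` so that failures surface in the exit code:

```bash
# Retry post_optimization.sql up to 3 times; background workers may
# crash the connection transiently after a large restore.
for attempt in 1 2 3; do
  if PGPASSWORD="${PG_PASSWORD}" psql -h "${PG_FQDN}" -U system -d postgres \
       -v ON_ERROR_STOP=1 -f _optimize_scripts/post_optimization.sql; then
    break
  fi
  echo "post_optimization attempt ${attempt} failed; retrying in 30s..."
  sleep 30
done
```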
Phase 5e: Restore Original Parameters and Scale Down SKU
Performance parameters are restored to their pre-optimization values, then either:
- SKU was scaled up in Step 1: `az postgres flexible-server update --sku-name <original>` — this triggers a brief restart automatically.
- SKU was not scaled up: `az postgres flexible-server restart` — applies the restored parameters.
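As a sketch, the branch looks like this (variable names match the state file; resource group and server name are placeholders):

```bash
# Scale back down, or just restart to apply the restored parameters.
if [ "${SKU_SCALED_UP}" = "true" ]; then
  az postgres flexible-server update \
    --resource-group "${RG}" --name "${PG_NAME}" \
    --sku-name "${pg_sku_name}"            # update triggers a restart
else
  az postgres flexible-server restart \
    --resource-group "${RG}" --name "${PG_NAME}"
fi
```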
Step 6 — Cleanup
Removes the local dump directory:
rm -rf _dumps/<server_name>/
This step is intentionally the last. The dump is retained until all import verification is complete (including post-optimization SQL). If anything fails in Step 5, the dump is available for a manual retry.
6. Role & Permission Management
This is the most operationally complex part of the process. PostgreSQL's security model requires careful ordering of operations on a fresh server.
The Core Problem
pg_dump exports object-level permissions (ACLs) as GRANT statements. When pg_restore replays these, the referenced roles must already exist. On a fresh server, no application roles exist — only the admin user.
Additionally, pg_restore with --no-owner assigns all objects to the restoring user (system). Application roles cannot own their schemas until the ownership is explicitly reassigned after the restore.
The Solution: roles.sql
The roles.sql file, generated during export, is applied to the new server before pg_restore. It contains:
-- 1. Role creation (from pg_dumpall --globals-only)
CREATE ROLE ice_imtx;
CREATE ROLE ice_infrastructure;
-- ...
-- 2. Role attributes and hashed passwords (merged from pg_shadow query)
ALTER ROLE ice_imtx WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB
LOGIN NOREPLICATION PASSWORD 'SCRAM-SHA-256$4096:<salt>:<hash>';
-- ...
-- 3. Role memberships (from pg_dumpall --globals-only)
GRANT ice_infrastructure TO ice_nontransaction;
GRANT ice_metadata TO ice_nontransaction;
-- ...
-- 4. Per-role configuration (from pg_dumpall --globals-only)
ALTER ROLE ice_imtx SET search_path TO 'ice_imtx', 'public';
ALTER ROLE ice_infrastructure SET search_path TO 'ice_infrastructure', 'ice_nontransaction', 'public';
-- ...
By applying roles.sql before pg_restore, the restore can:
- Apply object-level ACLs (
GRANT SELECT ON TABLE ... TO role) — roles exist now - Leave ownership as
system(--no-owner) for post-optimization to fix
Dynamic Ownership Reassignment (post_optimization.sql)
After pg_restore, all schemas and their objects are owned by system. The post_optimization.sql script discovers all non-system schemas and, for each one, checks whether a role with the same name exists. If it does, that role becomes the schema's owner:
Schema: ice_imtx → Role: ice_imtx ✓ (role exists)
Schema: ice_journey → Role: ice_journey ✓ (role exists)
Schema: public → Skipped (excluded system schema)
Schema: _timescaledb → Skipped (extension-owned)
For each matching schema, the process is:

- Acquire USAGE from the current schema owner (may require `SET ROLE <owner>` if the schema was preserved with its original owner in the dump)
- Grant membership of the target role to `session_user` (the admin login) — required for `ALTER TABLE ... OWNER TO <role>` in PostgreSQL
- Grant USAGE, CREATE on the schema to the target role
- Call `do_changeowner(role, schema)` — iterates all tables, views, and functions in the schema and changes their owner
- `ALTER SCHEMA ... OWNER TO <role>` — transfers the schema namespace itself
- `GRANT ALL ON ALL TABLES/SEQUENCES/FUNCTIONS IN SCHEMA ... TO <role>` — ensures full access to current objects
This is fully generic — it works for any schema/role naming convention where the role name matches the schema name.
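A condensed sketch of the discovery loop, executed from the shell via `psql`. The real `do_changeowner()` also reassigns tables, views, and functions and handles errors; this sketch shows only the schema-level part, with connection variables as placeholders:

```bash
PGPASSWORD="${PG_PASSWORD}" psql -h "${PG_FQDN}" -U system -d postgres <<'SQL'
DO $$
DECLARE s text;
BEGIN
  FOR s IN
    SELECT nspname FROM pg_namespace n
    WHERE nspname NOT LIKE 'pg\_%'
      AND nspname NOT LIKE '\_timescaledb%'
      AND nspname NOT IN ('information_schema', 'public', 'cron')
      AND EXISTS (SELECT 1 FROM pg_roles WHERE rolname = n.nspname)
  LOOP
    EXECUTE format('GRANT %I TO %I', s, session_user);   -- enables OWNER TO
    EXECUTE format('GRANT USAGE, CREATE ON SCHEMA %I TO %I', s, s);
    EXECUTE format('ALTER SCHEMA %I OWNER TO %I', s, s);
  END LOOP;
END $$;
SQL
```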
7. Pre & Post Optimization SQL
pre_optimization.sql
Runs on the source server before pg_dump. Intended for cleanup operations that reduce dump size.
Built-in operations:
- `ANALYZE` — updates planner statistics for better export performance
- Conditional `DROP EXTENSION timescaledb CASCADE` — TimescaleDB's internal tables (`_timescaledb_catalog`, `_timescaledb_internal`, etc.) add complexity to the dump, and its background workers cause ownership-change crashes during restore. Dropping it before export produces a clean dump of the underlying data tables. It is reinstalled in `post_optimization.sql`.
Customisation hook:
Add TRUNCATE TABLE <schema>.<table> CASCADE; statements for high-volume transient tables identified in the pre-assessment (Section 3.2). Each truncated table reduces the dump size and shortens the total operation time proportionally.
-- Example additions for a project-specific pre_optimization.sql:
TRUNCATE TABLE myschema.application_log CASCADE;
TRUNCATE TABLE myschema.event_message_param CASCADE;
TRUNCATE TABLE myschema.audit_trail CASCADE;
post_optimization.sql
Runs on the new server after pg_restore. It is structured in sections that must execute in a specific order:
GRANT azure_pg_admin TO system ← must be first
do_changeowner() function creation ← must precede ownership block
Dynamic ownership reassignment ← must precede TimescaleDB install
pg_sleep(5) ← stabilise background workers
pg_cron extension ← optional, conditional
TimescaleDB extension ← must be last DDL operation
ANALYZE VERBOSE ← final step
Why TimescaleDB must be last:
TimescaleDB registers background workers (via shared_preload_libraries) that connect to the database and watch for hypertable catalog changes. If TimescaleDB is installed before do_changeowner() runs, those background workers see ALTER TABLE ... OWNER TO operations on hypertables and crash asynchronously. The crash terminates the psql connection mid-script with a fatal error. Installing TimescaleDB after all ownership changes means the background workers start with a fully consistent catalog.
Why pg_cron is installed before TimescaleDB:
pg_cron's background worker connects at startup and looks for the cron schema. If the schema is missing, older versions can crash — which would terminate the connection before TimescaleDB is installed. Installing pg_cron (which creates the cron schema) before pg_sleep ensures the worker stabilises before the next wave of DDL.
8. State Management & Resumability
State File Format
Every run maintains a JSON state file at _logs/state_<server_name>.json. It tracks completed steps and persists all retrieved server properties:
{
"step_0": {"status": "complete", "completed_at": "2024-11-15 09:00:01"},
"step_1": {"status": "complete", "completed_at": "2024-11-15 09:02:47"},
"step_2": {"status": "complete", "completed_at": "2024-11-15 11:23:05"},
"vars": {
"pg_location": "eastus",
"pg_sku_name": "Standard_D4ads_v5",
"pg_sku_tier": "GeneralPurpose",
"pg_version": "16",
"pg_current_storage": "8192",
"TARGET_STORAGE_GB": "1024",
"SKU_SCALED_UP": "true",
"pg_sku_16core": "Standard_D16ads_v5",
"pg_subnet_id": "/subscriptions/.../subnets/...",
...
}
}
The vars section allows full resumption from any step without re-querying Azure — critical for steps 4 and 5 where the source server no longer exists.
Resume Behaviour
When `--start-from N` is specified:
- Steps 0 through N-1 are marked `complete` in the state file
- Step N has its state entry deleted (so it runs fresh)
- Steps N+1 onward run normally (skipped if already complete)
Variables saved by step 1 are always loaded from the state file when step 1 is skipped, so all server properties are available regardless of entry point.
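In `jq` terms, the bookkeeping amounts to something like the following sketch (illustrative; the script's internals may differ):

```bash
# Mark steps 0..N-1 complete and clear step N so it re-runs.
STATE="_logs/state_${PG_NAME}.json"
N=4   # value passed to --start-from
for i in $(seq 0 $((N - 1))); do
  jq ".step_${i}.status = \"complete\"" "${STATE}" > tmp.json && mv tmp.json "${STATE}"
done
jq "del(.step_${N})" "${STATE}" > tmp.json && mv tmp.json "${STATE}"
```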
When to Use --start-from
| Failure Scenario | Resume Step | Notes |
|---|---|---|
| `pre_optimization.sql` error | `--start-from 2` | Source server still exists; fix the SQL first |
| `pg_dump` failed midway | `--start-from 2` | Re-runs the entire export; previous partial dump is overwritten |
| `pg_dump` succeeded, delete failed | `--start-from 3` | Source server still exists |
| During 20-minute wait after delete | `--start-from 4` | Server already deleted; skip to creation |
| Server creation failed | `--start-from 4` | Retry server creation |
| `pg_restore` failed | `--start-from 5` | Dump still at `_dumps/<name>/`; check for space/connectivity |
| `post_optimization.sql` failed | `--start-from 5` | `pg_restore` re-runs (idempotent for most operations) |
9. Performance Tuning During Import
Import-Phase PostgreSQL Parameters
During Steps 4 and 5, the server is configured for maximum bulk-load throughput rather than normal OLTP operation:
autovacuum = off
Autovacuum daemon is disabled during import. During a bulk load, the tables are written to sequentially and autovacuum would repeatedly interrupt to process dead tuples from multi-pass inserts. Disabling it eliminates this contention. Re-enabled (to its original value) after import.
max_wal_size = 32768 MB
Controls how large the WAL (Write-Ahead Log) can grow before a checkpoint is forced. The default (typically 1–2GB) causes frequent checkpoints during bulk import, pausing all writes briefly on each one. Setting it to 32GB means checkpoints happen much less frequently, allowing the import to proceed without interruption.
checkpoint_timeout = 3600 s
Ensures checkpoints happen at most every hour, further reducing checkpoint frequency during the import phase.
maintenance_work_mem = 2048 MB
Each CREATE INDEX operation during pg_restore gets up to 2GB of working memory. Index creation is typically the longest phase of a pg_restore. More memory per build means faster sort operations and fewer disk spills. pg_restore -j N runs N index builds in parallel, so the total memory consumption is N × maintenance_work_mem — size this based on available RAM.
Parallel Workers
pg_dump and pg_restore both use -j <PG_THREADS> for parallelism. The thread count is auto-detected:
PG_THREADS = min(nproc, 16)
nproc returns the number of logical CPUs on the executing machine. Capped at 16 because PostgreSQL's parallel utility workers show diminishing returns beyond ~16 connections for typical database sizes, and higher concurrency increases lock contention on the target server.
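In shell terms, the cap is one line of arithmetic:

```bash
CORES=$(nproc)
PG_THREADS=$(( CORES < 16 ? CORES : 16 ))   # cap parallel workers at 16
```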
10. Failure Scenarios & Recovery
Scenario A: Failure Before Step 3 (Source Still Exists)
The source server is untouched. You can fix any issue and resume, or simply restart from scratch by deleting the state file:
rm _logs/state_<server_name>.json
./optimize_db_storage.sh <params>
Scenario B: Failure During Step 3 (Delete)
If the delete command itself fails, the source server may or may not still exist. Check:
az postgres flexible-server show \
--subscription <sub> --resource-group <rg> --name <server> \
--query "{name:name, state:state}" -o json
If the server still exists, resume from --start-from 3. If it no longer exists, resume from --start-from 4.
Scenario C: Failure After Step 3 (Source Deleted, Target Not Created)
This is the highest-risk scenario. The source server is gone but the dump is present. Do not panic — the dump is all that is needed. Resume:
./optimize_db_storage.sh <params> --start-from 4
The state file contains all server properties needed to recreate the server correctly without re-querying Azure.
Scenario D: Failure During Step 5 (Import)
The new server exists but the import is incomplete or failed. The dump at _dumps/<name>/ is still intact. Check whether roles were applied:
PGPASSWORD="<password>" psql -h <fqdn> -U system -d postgres \
-c "SELECT rolname FROM pg_roles WHERE rolname NOT LIKE 'pg_%' AND rolname NOT LIKE 'azure%' ORDER BY rolname;"
Resume from --start-from 5. pg_restore with --no-owner is largely idempotent for CREATE TABLE / INSERT operations (duplicate key violations on already-restored tables appear in the log but do not abort the restore).
Scenario E: post_optimization.sql Failed After pg_restore
The data is fully restored but ownership and grants may be incomplete. Resume from --start-from 5 — pg_restore will produce some errors for already-existing objects but post_optimization.sql will run cleanly on the second attempt.
Inspecting the State File
# View all completed steps
jq '.[] | select(.status=="complete")' _logs/state_<server_name>.json
# View a specific saved variable
jq '.vars.pg_sku_name' _logs/state_<server_name>.json
# View all saved variables
jq '.vars' _logs/state_<server_name>.json
11. Requirements & Prerequisites
Azure Permissions
The principal running this script (user or service principal) requires:
| Permission | Scope | Purpose |
|---|---|---|
| `Microsoft.DBforPostgreSQL/flexibleServers/read` | Server | Steps 1, 4 |
| `Microsoft.DBforPostgreSQL/flexibleServers/write` | Resource Group | Steps 1 (SKU update), 4 (create) |
| `Microsoft.DBforPostgreSQL/flexibleServers/delete` | Server | Step 3 |
| `Microsoft.DBforPostgreSQL/flexibleServers/configurations/write` | Server | Steps 4, 5 |
| `Microsoft.DBforPostgreSQL/flexibleServers/restart/action` | Server | Steps 4, 5 |
| `Microsoft.Resources/tags/write` | Server | Step 4 (tag restoration) |
A role with Contributor on the resource group satisfies all of these.
Local Machine Requirements
| Requirement | Notes |
|---|---|
| Azure CLI (`az`) | Must be authenticated (`az login` or service principal); subscription context set |
| `psql` | PostgreSQL client; version should be ≥ the server version |
| `pg_dump` | Same version constraints as `psql` |
| `pg_restore` | Same version constraints as `psql` |
| `pg_dumpall` | Used for role export; part of standard PostgreSQL client package |
| `jq` | State file management; 1.5+ |
| Network connectivity | Direct connectivity to the PostgreSQL private endpoint |
| Disk space | ≥ dump size + 15% overhead in the script's working directory |
Important: If the PostgreSQL server is on a private endpoint (typical in production-like environments), the executing machine must be on the same VNet or connected via VPN/ExpressRoute. The script connects to `<server_name>.postgres.database.azure.com`, which resolves to the private IP in a private DNS zone.
PostgreSQL Server Requirements
- PostgreSQL version: 11–17 supported
- Admin user: `system` (configurable via the `PG_ADMIN_USER` constant in the script)
- The admin user must have `azure_pg_admin` membership (default for the first admin user on Flexible Server)
12. Usage Reference
Main Script
./optimize_db_storage.sh \
<subscription_id> \
<resource_group> \
<postgresql_name> \
<pg_system_password> \
[--start-from <step>]
Wrapper Script (Recommended)
The wrapper validates that the PostgreSQL server exists before invoking the main script:
./optimize_db_storage_wrapper.sh \
<subscription_id> \
<resource_group> \
<postgresql_name> \
<pg_system_password> \
[--start-from <step>]
Examples
# Basic run
./optimize_db_storage_wrapper.sh \
"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
"rg-qa" \
"qadb" \
"MyP@ssw0rd!"
# Resume from import step after a failure
./optimize_db_storage_wrapper.sh \
"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
"rg-qa" \
"qadb" \
"MyP@ssw0rd!" \
--start-from 5
# Customise parallelism (edit PG_THREADS constant at top of main script
# or export before running if you want fewer threads on a shared machine)
Step Reference
| Step | Name | Safe to Retry | Source Server |
|---|---|---|---|
| 0 | Parse parameters | ✓ | Exists |
| 1 | Retrieve PG details & scale up SKU | ✓ | Exists |
| 2 | Export database (local pg_dump) | ✓ | Exists |
| 3 | Delete source server | ✓ (idempotent) | Deleted |
| 4 | Create new database server | ✓ | Deleted |
| 5 | Import database (pg_restore + post SQL) | ✓ | Deleted |
| 6 | Cleanup dump directory | ✓ | Deleted |
Output Files
| Path | Description |
|---|---|
| `_logs/storage_opt_<timestamp>.log` | Full execution log (all commands and output) |
| `_logs/state_<server_name>.json` | Resumable state (removed on successful completion) |
| `_dumps/<server_name>/roles.sql` | Exported role definitions with password hashes |
| `_dumps/<server_name>/dbdump/` | pg_dump directory-format output (removed at step 6) |
13. Troubleshooting Guide
pg_dump fails with "connection refused" or "no route to host"
The executing machine cannot reach the PostgreSQL private endpoint.
# Test connectivity
psql -h <server>.postgres.database.azure.com -U system -d postgres -c "SELECT 1;"
# Check if the private DNS resolves
nslookup <server>.postgres.database.azure.com
Ensure the machine is on the same VNet or peered VNet, or connected via VPN/ExpressRoute. Check Network Security Group rules for port 5432.
Role statements fail with "role already exists"
Expected and harmless. psql logs the error and continues. This happens on retries when some roles were already created in a previous attempt.
pg_restore reports errors on individual objects
A non-zero pg_restore exit code with individual object errors is normal. Common sources:
- `ERROR: relation "..." already exists` — table was already restored in a previous partial run
- `ERROR: role "..." does not exist` — a grant references a role that was filtered from `roles.sql`; review the filter patterns if the role is an application role
Review _logs/storage_opt_*.log for the full list of errors.
post_optimization.sql fails with "crash of another server process"
TimescaleDB or pg_cron background workers crashed during the ownership-change DDL. The script automatically retries up to 3 times with a 30-second wait between attempts. If all 3 attempts fail, run manually after the server stabilises:
PGPASSWORD="<password>" psql \
-h <server>.postgres.database.azure.com \
-U system -d postgres \
-f _optimize_scripts/post_optimization.sql
Server creation fails with "name already in use"
The 20-minute DNS release window after deletion has not elapsed, or a previous run deleted the server and the wait was interrupted. Wait and retry:
./optimize_db_storage.sh <params> --start-from 4
Storage target is not reducing costs enough
The dynamic target storage calculation defaults to 1TB for all sizes other than 8TB→1TB or 2TB→512GB. To set a custom target, modify the calculation block in Step 1 of optimize_db_storage.sh:
# Example: target 512GB for any server with < 1TB actual data
if [ "${pg_current_storage}" -eq 8192 ]; then
TARGET_STORAGE_GB=512 # was 1024
elif ...
Note: The target size must be ≥ your actual data size plus a safety buffer (at minimum 20% headroom for indexes, WAL, autovacuum, and growth).
Checking post-optimization role fidelity
After Step 5 completes, verify that all roles were recreated correctly:
# Compare roles on new server against pre-assessment output (Section 3.4)
PGPASSWORD="<password>" psql \
-h <server>.postgres.database.azure.com \
-U system -d postgres \
-c "SELECT rolname, rolcanlogin, rolsuper FROM pg_roles
WHERE rolname NOT LIKE 'pg_%' AND rolname NOT LIKE 'azure%'
ORDER BY rolname;"
# Verify schema ownership
PGPASSWORD="<password>" psql \
-h <server>.postgres.database.azure.com \
-U system -d postgres \
-c "SELECT n.nspname AS schema, pg_get_userbyid(n.nspowner) AS owner
FROM pg_namespace n
WHERE n.nspname NOT LIKE 'pg_%' AND n.nspname NOT IN ('information_schema', 'public')
ORDER BY n.nspname;"
*Azure PostgreSQL Storage Reduction & Optimization - Technical Deep Dive · April 2026*
AI Disclosure: I leverage AI to help draft, structure, and refine these posts. However, all technical insights, architectural decisions, and final reviews are handled personally to ensure accuracy and real-world reliability.