
Azure PostgreSQL Storage Reduction & Optimization — Technical Deep Dive

Syam Nair · 43 min read

Table of Contents

  1. Introduction & Business Case
  2. The Storage Problem in Azure PostgreSQL
  3. Pre-Tasks: Assessing Your Environment
  4. Solution Architecture
  5. Step-by-Step Process Deep Dive
  6. Role & Permission Management
  7. Pre & Post Optimization SQL
  8. State Management & Resumability
  9. Performance Tuning During Import
  10. Failure Scenarios & Recovery
  11. Requirements & Prerequisites
  12. Usage Reference
  13. Troubleshooting Guide

1. Introduction & Business Case

Why This Exists

Non-production environments — development, QA, staging, integration testing — are typically created by cloning production databases. This is good practice: it ensures realistic data volumes, representative query patterns, and accurate performance benchmarking. The problem arises from how Azure PostgreSQL handles storage.

Azure PostgreSQL Flexible Server storage can only grow, never shrink. Once a server is provisioned at or scaled up to a given storage tier, the only way to reduce it is to delete the server and recreate it. There is no in-place downsize operation.

When production runs on an 8TB storage tier, every environment cloned from it inherits that 8TB allocation — even if the actual data footprint after environment-specific cleanup is 200–400GB. The excess storage is paid for every month, for every environment, indefinitely.

The Cost Problem

Azure PostgreSQL Flexible Server charges for provisioned storage regardless of actual utilisation. The billing model is per-GB-provisioned per-month, not per-GB-used.

Approximate storage costs (General Purpose tier, East US):

Provisioned Storage   Monthly Cost   Annual Cost
8,192 GB (8TB)        ~$943          ~$11,316
1,024 GB (1TB)        ~$118          ~$1,416
512 GB                ~$59           ~$708

Cost savings per environment after optimization:

Scenario       Before   After   Monthly Saving   Annual Saving
8TB → 1TB      ~$943    ~$118   ~$825            ~$9,900
8TB → 512GB    ~$943    ~$59    ~$884            ~$10,608
2TB → 512GB    ~$230    ~$59    ~$171            ~$2,052

Multiply these figures across a typical non-production estate of 4–8 environments and the annual savings reach $40,000–$85,000 — often exceeding the engineering cost of implementing and running the optimization.

Why Not Just Keep It Small from the Start?

In practice, environment cloning pipelines (which copy schema, data, and server configuration from production) automatically inherit the source server's storage tier. The clone operation does not offer a "target storage size" parameter — you clone the server as-is and then have no way to shrink it. This optimization process fills exactly that gap: it performs the downsize that Azure's native tooling does not support.


2. The Storage Problem in Azure PostgreSQL

How Azure PostgreSQL Storage Billing Works

Azure PostgreSQL Flexible Server allocates block storage (managed disks) in discrete tiers. The provisioned tier determines what you pay — not how much of that storage is actually written to. Key constraints:

  • Auto-grow: If enabled, storage grows automatically when usage approaches the provisioned limit. It will never automatically shrink.
  • Manual scale-up: You can increase storage via the portal, CLI, or API at any time.
  • No scale-down: There is no API, portal option, or CLI flag to reduce a server's storage size. Microsoft's documentation explicitly states this is by design.
  • Billing starts immediately: Storage charges apply from the moment the server is created, not from when data fills the space.

Why pg_dump / pg_restore Is the Only Option

Because there is no in-place resize, the only path to a smaller storage allocation is:

  1. Export all data off the server
  2. Delete the server (releasing the storage allocation)
  3. Recreate the server with the desired smaller storage tier
  4. Import all data back

This is exactly what this automation does. It wraps the operational complexity (ordering of steps, state preservation, role recreation, performance tuning) into a single resumable script.

Storage Utilisation vs. Allocation Gap

The gap between what is allocated and what is actually used is typically large for non-production environments:

Production:       8TB allocated,  7.2TB used  (90% utilised — normal)
QA clone:         8TB allocated,  ~300GB used (4% utilised — wasteful)
Dev clone:        8TB allocated,  ~180GB used (2% utilised — very wasteful)

The non-production environments have had large tables truncated, transaction data cleaned up, audit logs cleared, and test data removed as part of environment preparation — yet they still pay for the full 8TB tier.


3. Pre-Tasks: Assessing Your Environment

Before running the optimization, collect information to confirm the operation is warranted and to choose the right target storage size. Run all queries against the source server while it is still running.

3.1 Check Provisioned vs. Actual Storage Usage

From Azure CLI — provisioned storage:

az postgres flexible-server show \
  --subscription "<subscription_id>" \
  --resource-group "<resource_group>" \
  --name "<server_name>" \
  --query "{name: name, storageSizeGb: storage.storageSizeGb, sku: sku.name}" \
  -o table

From psql — actual database sizes:

-- Total size per database
SELECT
    datname                                    AS database,
    pg_size_pretty(pg_database_size(datname))  AS size,
    pg_database_size(datname) / 1024 / 1024    AS size_mb
FROM pg_database
WHERE datistemplate = false
ORDER BY pg_database_size(datname) DESC;

From psql — top 20 tables by size (run in each database):

SELECT
    schemaname || '.' || tablename             AS table,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename))
                                               AS total_size,
    pg_size_pretty(pg_relation_size(schemaname || '.' || tablename))
                                               AS data_size,
    pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename))
                                               AS index_size,
    pg_total_relation_size(schemaname || '.' || tablename) / 1024 / 1024
                                               AS total_mb
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema', 'cron')
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;

From psql — total size across all schemas:

SELECT
    schemaname,
    pg_size_pretty(SUM(pg_total_relation_size(schemaname || '.' || tablename)))
                                               AS total_size,
    SUM(pg_total_relation_size(schemaname || '.' || tablename)) / 1024 / 1024 / 1024
                                               AS total_gb
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema', 'cron')
GROUP BY schemaname
ORDER BY SUM(pg_total_relation_size(schemaname || '.' || tablename)) DESC;

3.2 Identify Large Transient Tables

Audit/log/cache tables are typically the largest tables in non-production environments and are safe to truncate before export — they contain no persistent business data and will grow organically after the environment is restored.

-- Find tables with names suggesting transient data
SELECT
    schemaname || '.' || tablename             AS table,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename))
                                               AS size,
    pg_total_relation_size(schemaname || '.' || tablename) / 1024 / 1024
                                               AS size_mb
FROM pg_tables
WHERE (tablename ILIKE '%log%'
    OR tablename ILIKE '%audit%'
    OR tablename ILIKE '%event%'
    OR tablename ILIKE '%cache%'
    OR tablename ILIKE '%temp%'
    OR tablename ILIKE '%staging%'
    OR tablename ILIKE '%archive%')
  AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;

Add a TRUNCATE TABLE <schema>.<table> CASCADE; statement to pre_optimization.sql for each table identified here, before running the optimization. This directly reduces dump size and total operation time.

3.3 Check for Extensions That Require Special Handling

-- List all installed extensions
SELECT
    extname         AS extension,
    extversion      AS version,
    nspname         AS schema
FROM pg_extension
JOIN pg_namespace ON pg_namespace.oid = extnamespace
ORDER BY extname;

The optimization has built-in handling for:

  • TimescaleDB: Dropped before export (pre_optimization.sql), reinstalled after import (post_optimization.sql). TimescaleDB-modified hypertable structures can cause pg_dump/pg_restore complications and its background workers interfere with ownership changes.
  • pg_cron: Excluded from the dump (--exclude-schema=cron), extension recreated in post_optimization.sql.

Any other extensions with custom background workers or catalog hooks should be evaluated individually.

3.4 Inventory Roles and Passwords

The optimization exports all application role definitions and hashed passwords from the source server and recreates them identically on the target. Verify what roles exist so you can confirm they are correctly recreated after import:

-- All application login roles and their attributes
SELECT
    rolname         AS role,
    rolcanlogin     AS can_login,
    rolsuper        AS superuser,
    rolcreatedb     AS createdb,
    rolcreaterole   AS createrole,
    rolreplication  AS replication
FROM pg_roles
WHERE rolname NOT LIKE 'pg_%'
  AND rolname NOT LIKE 'azure%'
  AND rolname NOT IN ('system', 'azuresu', 'replication')
ORDER BY rolname;

-- Role memberships
SELECT
    m.rolname   AS role,
    r.rolname   AS member_of
FROM pg_auth_members am
JOIN pg_roles r ON r.oid = am.roleid
JOIN pg_roles m ON m.oid = am.member
WHERE m.rolname NOT LIKE 'pg_%'
  AND m.rolname NOT LIKE 'azure%'
ORDER BY m.rolname, r.rolname;

-- Per-role search_path settings
SELECT
    r.rolname,
    s.setconfig
FROM pg_db_role_setting s
JOIN pg_roles r ON r.oid = s.setrole
WHERE s.setdatabase = 0
  AND r.rolname NOT LIKE 'pg_%'
  AND r.rolname NOT LIKE 'azure%'
ORDER BY r.rolname;

Save this output. You will use it to verify role fidelity after the optimization completes.

3.5 Calculate Expected Dump Size and Duration

The dump size is at most the sum of all table data sizes (indexes are excluded; pg_restore rebuilds them), and is usually smaller because the directory format compresses table data. A rough upper-bound estimate:

-- Estimated dump size (table data only, no indexes)
SELECT
    pg_size_pretty(SUM(pg_relation_size(schemaname || '.' || tablename)))
                                               AS estimated_dump_size,
    SUM(pg_relation_size(schemaname || '.' || tablename)) / 1024 / 1024 / 1024
                                               AS estimated_dump_gb
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema', 'cron');

Duration estimates (parallel jobs = CPU count, capped at 16):

Estimated Dump Size   Export Time   Import Time   Total
< 50 GB               15–30 min     20–40 min     < 1.5 hr
50–200 GB             30–90 min     45–120 min    2–4 hr
200–500 GB            1.5–3 hr      2–4 hr        4–8 hr
500 GB+               3–6 hr        4–8 hr        7–15 hr

Note: The 20-minute wait after server deletion (for Azure to release the server name from DNS) is not included in the above estimates but adds a fixed ~20 minutes to every run.

3.6 Confirm Available Local Disk Space

The dump is written to _dumps/<server_name>/ on the machine running the script. Ensure sufficient space is available:

# Check available disk space on the executing machine
df -h .

# Estimate required space (from the query above, take estimated_dump_gb × 1.1 for safety margin)

The directory format (-F d) used by pg_dump stores each table in a separate file, compressed with gzip by default, so the on-disk dump is typically smaller than the raw table data. Budgeting the full uncompressed estimate plus 10–15% overhead gives a safe upper bound.
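
The space check can be scripted; a minimal sketch, assuming the result of the Section 3.5 query is pasted in as estimated_dump_gb (an illustrative variable, not part of the script):

```shell
# Compare free space on the executing machine against the estimated
# dump size plus a 15% safety margin. estimated_dump_gb is an example value.
estimated_dump_gb=180
required_gb=$(( (estimated_dump_gb * 115 + 99) / 100 ))   # ceil(estimate * 1.15)
free_gb=$(( $(df -k . | awk 'NR==2 {print $4}') / 1024 / 1024 ))
if [ "$free_gb" -lt "$required_gb" ]; then
  echo "Insufficient space: need ${required_gb}GB, have ${free_gb}GB" >&2
fi
```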


4. Solution Architecture

Overview

The optimization implements a dump → delete → recreate → restore workflow. This is the only approach possible because Azure PostgreSQL provides no in-place storage downsize operation.

┌─────────────────────────────────────────────────────────────────┐
│                     Executing Machine (local)                    │
│                                                                  │
│  optimize_db_storage.sh                                         │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │  Step 0: Parse parameters                                  │ │
│  │  Step 1: Retrieve PG server details, scale SKU up to D16   │ │
│  │  Step 2: pg_dumpall (roles) + pg_dump → _dumps/<name>/     │ │
│  │  Step 3: az postgres flexible-server delete  ◄── NO RETURN │ │
│  │  Step 4: az postgres flexible-server create (target size)  │ │
│  │  Step 5: Apply roles → pg_restore → post_optimization.sql  │ │
│  │  Step 6: rm -rf _dumps/<name>/                             │ │
│  └────────────────────────────────────────────────────────────┘ │
│                     │                    │                       │
└─────────────────────┼────────────────────┼───────────────────────┘
                      │                    │
          ┌───────────▼───────────┐  ┌─────▼──────────────┐
          │  Source PG Server     │  │  New PG Server      │
          │  (8TB, D16 SKU temp)  │  │  (512GB/1TB, D16)  │
          │                       │  │  → scales to orig  │
          │  Deleted at step 3    │  │    SKU after import │
          └───────────────────────┘  └────────────────────┘

Key Design Decisions

Why local execution instead of a Kubernetes job? Running pg_dump and pg_restore on the same machine that invokes the script eliminates the dependency on an AKS cluster being available and correctly configured. It removes ~3 steps (nodepool creation, PVC provisioning, job management) and simplifies the failure surface substantially. The tradeoff is that the executing machine needs sufficient disk space and network connectivity to the PostgreSQL server.

Why directory format (-F d) for pg_dump? The directory format is the only format that supports parallel dumps (-j). It stores each table as a separate file, allowing multiple parallel workers to write independently without contention. It also allows pg_restore to run in parallel with -j.

Why delete the source server before recreating it at a smaller size? Azure does not allow two PostgreSQL Flexible Servers with the same name to coexist in the same subscription, even transiently. The server name is tied to a private DNS entry (<name>.postgres.database.azure.com) that must be fully released before it can be reused. After deletion, a 20-minute wait is required for Azure to release the DNS reservation.

Why temporarily scale the SKU up to 16 cores? pg_dump and pg_restore are CPU and I/O intensive. Running them against a 4-core server creates a bottleneck at the server side — the server cannot process incoming data fast enough to keep parallel workers busy. Scaling to 16 cores for the duration of the operation reduces total time by 40–60%. The SKU is scaled back to the original after import completes.

Why are roles exported separately before pg_dump? pg_dump with --no-acl skips all permission restoration (leading to broken application access), but without it, pg_restore fails because the roles referenced in ACL entries don't exist yet on the fresh server. The solution is to export roles separately using pg_dumpall --globals-only, apply them before pg_restore, and then run pg_restore without --no-acl — so ACLs are correctly restored from the dump.


5. Step-by-Step Process Deep Dive

Step 0 — Parse Parameters

Constructs the PostgreSQL resource ID from the flat parameters and sets the DUMP_DIR path. This step always runs (not state-managed) since it is pure variable assignment with no side effects.

pg_resource_id = /subscriptions/<sub>/resourceGroups/<rg>/
                 providers/Microsoft.DBforPostgreSQL/
                 flexibleServers/<name>
DUMP_DIR       = <cwd>/_dumps/<postgresql_name>
PG_THREADS     = min(nproc, 16)
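
A bash sketch of these derivations (variable names follow the pseudocode above; the real script may differ in detail):

```shell
# Step 0 sketch: pure variable assignment, no side effects.
postgresql_name="my-pg-server"              # example value
DUMP_DIR="$(pwd)/_dumps/${postgresql_name}"
cores=$(nproc)
PG_THREADS=$(( cores < 16 ? cores : 16 ))   # cap dump/restore parallelism at 16
echo "$DUMP_DIR $PG_THREADS"
```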

Step 1 — Retrieve PostgreSQL Server Details & Scale Up SKU

Queries the Azure Resource Manager API for all server properties that need to be preserved and recreated:

Property                              Purpose
location                              Target server must be in the same region
sku.name / sku.tier                   Original compute SKU to scale back to after import
version                               PostgreSQL major version for the new server
storage.storageSizeGb                 Current size — determines target size
network.delegatedSubnetResourceId     Private networking — must match exactly
network.privateDnsZoneArmResourceId   Private DNS zone for the new server
backup.backupRetentionDays            Backup policy to preserve
tags                                  Resource tags to reapply
azure.extensions                      Extension whitelist — must be set before first connection
shared_preload_libraries              Preloaded libraries (timescaledb, pg_cron, etc.)
autovacuum, max_wal_size,
  checkpoint_timeout,
  maintenance_work_mem                Current values to restore after import

Target storage calculation:

pg_current_storage == 8192 GB  →  TARGET = 1024 GB  (87.5% reduction)
pg_current_storage == 2048 GB  →  TARGET =  512 GB  (75.0% reduction)
pg_current_storage == other    →  TARGET = 1024 GB  (default)
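
In bash, the selection reduces to a case statement; a sketch with an illustrative input value:

```shell
# Target-size selection (values from the mapping above).
pg_current_storage=2048                    # example: a 2TB source server
case "$pg_current_storage" in
  8192) TARGET_STORAGE_GB=1024 ;;          # 8TB  -> 1TB
  2048) TARGET_STORAGE_GB=512  ;;          # 2TB  -> 512GB
  *)    TARGET_STORAGE_GB=1024 ;;          # default
esac
echo "$TARGET_STORAGE_GB"
```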

SKU scale-up logic:

The script derives the 16-core equivalent of the original SKU using a sed regex that replaces the core count in the SKU name:

Standard_D4ds_v4   →  Standard_D16ds_v4
Standard_D8ads_v5  →  Standard_D16ads_v5
Standard_E4ads_v6  →  Standard_E16ads_v6

If the regex does not produce a different string (i.e., the SKU name does not match the expected pattern), the scale-up is skipped and the server continues with its current SKU.
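
An equivalent transform can be expressed as a one-line sed substitution (the script's actual regex may differ; to_16core is an illustrative helper name):

```shell
# Replace the core count in a D- or E-series SKU name with 16.
# \1 is the captured "Standard_D" / "Standard_E" prefix; "16" is literal.
to_16core() { printf '%s\n' "$1" | sed -E 's/^(Standard_[DE])[0-9]+/\116/'; }
to_16core Standard_D4ds_v4     # -> Standard_D16ds_v4
to_16core Standard_E4ads_v6    # -> Standard_E16ads_v6
```

If the input is already a 16-core SKU, the output is unchanged, which is how the script detects that no scale-up is needed.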

All retrieved values are persisted to the JSON state file so that if the script is resumed from a later step, the values are available without re-querying Azure.

Step 2 — Export Database to Local Machine

This step has three phases:

Phase 2a: Pre-optimization SQL

Runs _optimize_scripts/pre_optimization.sql against the source database before export. This script:

  • Drops TimescaleDB if installed (see Section 7)
  • Runs ANALYZE for export planner accuracy
  • Provides hooks for project-specific table truncation

Errors in this script are logged as warnings but do not stop the optimization — the intent is to reduce dump size, not gate the export.

Phase 2b: Role Export

Two-pass role extraction to ensure password hashes are captured regardless of Azure tier permissions:

Pass 1 — pg_dumpall --globals-only:

pg_dumpall -h <fqdn> -U system --globals-only | \
  <filter out system/azure/pg_* roles> \
  > _dumps/<name>/roles.sql

This captures role definitions (attributes, memberships, search_path settings) in the same SQL format that pg_dumpall produces, which is directly replayable via psql.

Pass 2 — pg_shadow direct query:

SELECT 'ALTER ROLE ' || quote_ident(usename) ||
       ' PASSWORD ' || quote_literal(passwd) || ';'
FROM pg_shadow
WHERE <exclude system roles> AND passwd IS NOT NULL;

pg_shadow is a view that exposes hashed passwords (SCRAM-SHA-256 or md5 depending on server configuration). The admin user (system) has pg_read_all_data membership on Azure, which grants read access to pg_shadow. This second pass appends ALTER ROLE ... PASSWORD '...' statements to roles.sql, overriding any password entries from Pass 1 with the authoritative values.
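
As a sketch, Pass 2 might be wrapped like this in the driver script (append_password_hashes is an illustrative function name, not the script's; the role-exclusion filter mirrors the one shown in Section 3.4):

```shell
# Append authoritative ALTER ROLE ... PASSWORD statements to roles.sql.
# -A/-t make psql emit raw unaligned rows only, so the output is
# directly replayable SQL.
append_password_hashes() {
  local fqdn="$1" outfile="$2"
  psql -h "$fqdn" -U system -d postgres -At -c "
    SELECT 'ALTER ROLE ' || quote_ident(usename) ||
           ' PASSWORD ' || quote_literal(passwd) || ';'
    FROM pg_shadow
    WHERE passwd IS NOT NULL
      AND usename NOT LIKE 'pg\_%'
      AND usename NOT IN ('system', 'azuresu', 'replication')
  " >> "$outfile"
}
```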

The combined roles.sql file contains:

  • CREATE ROLE — role existence
  • ALTER ROLE ... WITH ... PASSWORD '...' — attributes and hashed credential
  • GRANT role TO role — membership grants
  • ALTER ROLE ... SET search_path TO ... — per-role configuration

Phase 2c: pg_dump

pg_dump \
  -h <fqdn> -U system -d postgres \
  -F d                    # directory format (required for -j parallelism)
  -j <nproc, max 16>      # parallel workers
  --verbose               # progress output to log
  --exclude-schema=cron   # pg_cron schema is extension-owned; restored via CREATE EXTENSION
  -f _dumps/<name>/dbdump

The directory format writes one file per table (<oid>.dat.gz) plus a table of contents (toc.dat) and a restore script (restore.sql). This layout is what enables parallel restore later — each worker picks up independent table files from the toc.dat.

Step 3 — Delete Source Database Server ⚠️

This is the point of no return. Once this step completes successfully, the source server is permanently gone. The dump at _dumps/<name>/ is the only copy of the data.

az postgres flexible-server delete --ids <resource_id> --yes

After deletion, the script waits 20 minutes (sleep 1200). This is not optional — Azure's internal DNS and resource registration system requires this time to fully deregister the server name. Attempting to recreate the server before this window closes will fail with a name conflict error.

The state file records this step as complete. If the script is interrupted during the 20-minute wait and resumed, the --start-from 4 flag skips this step (server is already deleted) and proceeds to creation.

Step 4 — Create New Database Server

Creates the new server with the target storage size, then applies all preserved configuration:

az postgres flexible-server create \
  --name <same name>          # DNS name is now available after 20min wait
  --sku-name <16-core SKU>    # high-core SKU for fast import
  --tier <original tier>
  --storage-size <TARGET_GB>  # the downsized allocation
  --version <original version>
  --subnet <original subnet>          # same VNet connectivity
  --private-dns-zone <original zone>  # same private DNS
  --admin-user system
  --admin-password <password>

After creation, the following are restored in order:

  1. Resource tags — reapplied via az resource tag
  2. azure.extensions — the extension whitelist must be set before any extension is installed
  3. shared_preload_libraries — e.g., timescaledb,pg_cron; requires a server restart
  4. Server restart — applies extension parameters
  5. Import performance parameters — set for the duration of the import:

     Parameter             Import Value   Rationale
     autovacuum            off            Prevents autovacuum from competing with bulk INSERT during restore
     max_wal_size          32768 MB       Reduces checkpoint frequency, fewer interruptions during bulk load
     checkpoint_timeout    3600 s         Checkpoints every hour instead of every 5 minutes
     maintenance_work_mem  2048 MB        More memory per index build in CREATE INDEX during restore

  6. Server restart — applies performance parameters

Step 5 — Import Database from Local Machine

Phase 5a: Grant Admin Role

GRANT azure_pg_admin TO system;

Required because post_optimization.sql uses SET ROLE to temporarily become the schema owner when granting USAGE privileges. azure_pg_admin membership is what allows the admin user to perform this in Azure's managed PostgreSQL environment.

Phase 5b: Apply Roles

psql -h <fqdn> -U system -d postgres -f _dumps/<name>/roles.sql

This replays all CREATE ROLE, ALTER ROLE, GRANT (memberships), and ALTER ROLE SET statements from the source server. Any errors (e.g., role already exists from a partial previous run) are logged but do not stop the process — psql continues to the next statement.

At this point, the new server has the same role structure and hashed passwords as the source.

Phase 5c: pg_restore

pg_restore \
  -h <fqdn> -U system -d postgres \
  -j <nproc, max 16>   # parallel workers
  --no-owner           # all objects restored as 'system'; post_opt handles ownership
  --verbose            # progress output to log
  _dumps/<name>/dbdump

--no-acl is NOT passed. Because roles now exist (from Phase 5b), all GRANT statements embedded in the dump are executed as-is. This correctly restores cross-schema access grants (e.g., one role having SELECT access on another schema's tables) without any manual specification.

--no-owner IS passed. In Azure PostgreSQL, pg_restore cannot execute ALTER TABLE ... OWNER TO <role> for roles other than the current user, even if the current user is azure_pg_admin. All objects are therefore restored as owned by system, and ownership is corrected dynamically by post_optimization.sql.

|| true at the end means pg_restore exit code is always treated as success. pg_restore exits non-zero even when individual object errors are non-fatal (e.g., trying to create an index that already exists). The verbose log captures all individual errors for review.

Phase 5d: Post-optimization SQL

Runs _optimize_scripts/post_optimization.sql (see Section 7) with up to 3 automatic retries. Retry is necessary because TimescaleDB and pg_cron background workers, loaded via shared_preload_libraries, can crash asynchronously when they observe rapid catalog changes (ALTER TABLE OWNER TO on many objects in quick succession). The crash takes down the psql connection with:

FATAL: terminating connection because of crash of another server process

The post-optimization script is idempotent — re-running it produces the same result.
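
The retry mechanism can be sketched as a small wrapper (run_with_retries and the RETRY_DELAY knob are illustrative; the script's attempt count is 3, its delay value may differ):

```shell
# Retry a command up to 3 times, pausing between attempts so crashed
# background workers have time to restart before the next try.
run_with_retries() {
  local attempt=1 max=3
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1                      # give up after 3 failed attempts
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-30}"
  done
}
# Usage sketch:
#   run_with_retries psql -h "$fqdn" -U system -d postgres -f post_optimization.sql
```

This pattern only works because the wrapped script is idempotent; retrying a non-idempotent script would compound partial state.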

Phase 5e: Restore Original Parameters and Scale Down SKU

Performance parameters are restored to their pre-optimization values, then either:

  • SKU was scaled up in Step 1: az postgres flexible-server update --sku-name <original> — this triggers a brief restart automatically.
  • SKU was not scaled up: az postgres flexible-server restart — applies the restored parameters.

Step 6 — Cleanup

Removes the local dump directory:

rm -rf _dumps/<server_name>/

This step is intentionally the last. The dump is retained until all import verification is complete (including post-optimization SQL). If anything fails in Step 5, the dump is available for a manual retry.


6. Role & Permission Management

This is the most operationally complex part of the process. PostgreSQL's security model requires careful ordering of operations on a fresh server.

The Core Problem

pg_dump exports object-level permissions (ACLs) as GRANT statements. When pg_restore replays these, the referenced roles must already exist. On a fresh server, no application roles exist — only the admin user.

Additionally, pg_restore with --no-owner assigns all objects to the restoring user (system). Application roles cannot own their schemas until the ownership is explicitly reassigned after the restore.

The Solution: roles.sql

The roles.sql file, generated during export, is applied to the new server before pg_restore. It contains:

-- 1. Role creation (from pg_dumpall --globals-only)
CREATE ROLE ice_imtx;
CREATE ROLE ice_infrastructure;
-- ...

-- 2. Role attributes and hashed passwords (merged from pg_shadow query)
ALTER ROLE ice_imtx WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB
  LOGIN NOREPLICATION PASSWORD 'SCRAM-SHA-256$4096:<salt>:<hash>';
-- ...

-- 3. Role memberships (from pg_dumpall --globals-only)
GRANT ice_infrastructure TO ice_nontransaction;
GRANT ice_metadata TO ice_nontransaction;
-- ...

-- 4. Per-role configuration (from pg_dumpall --globals-only)
ALTER ROLE ice_imtx SET search_path TO 'ice_imtx', 'public';
ALTER ROLE ice_infrastructure SET search_path TO 'ice_infrastructure', 'ice_nontransaction', 'public';
-- ...

By applying roles.sql before pg_restore, the restore can:

  • Apply object-level ACLs (GRANT SELECT ON TABLE ... TO role) — roles exist now
  • Leave ownership as system (--no-owner) for post-optimization to fix

Dynamic Ownership Reassignment (post_optimization.sql)

After pg_restore, all schemas and their objects are owned by system. The post_optimization.sql script discovers all non-system schemas and, for each one, checks whether a role with the same name exists. If it does, that role becomes the schema's owner:

Schema: ice_imtx      →  Role: ice_imtx      ✓ (role exists)
Schema: ice_journey   →  Role: ice_journey   ✓ (role exists)
Schema: public        →  Skipped (excluded system schema)
Schema: _timescaledb  →  Skipped (extension-owned)

For each matching schema, the process is:

  1. Acquire USAGE from current schema owner (may require SET ROLE <owner> if the schema was preserved with its original owner in the dump)
  2. Grant membership of the target role to session_user (the admin login) — required for ALTER TABLE ... OWNER TO <role> in PostgreSQL
  3. Grant USAGE, CREATE on the schema to the target role
  4. call do_changeowner(role, schema) — iterates all tables, views, and functions in the schema and changes their owner
  5. ALTER SCHEMA ... OWNER TO <role> — transfers the schema namespace itself
  6. GRANT ALL ON ALL TABLES/SEQUENCES/FUNCTIONS IN SCHEMA ... TO <role> — ensures full access to current objects

This is fully generic — it works for any schema/role naming convention where the role name matches the schema name.
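
Steps 2, 3, 5, and 6 of the loop can be sketched as a single DO block driven from the shell (reassign_ownership is an illustrative wrapper; step 4's do_changeowner call is shown only as a comment since its body lives in post_optimization.sql):

```shell
# For every non-system schema with an identically named role, grant the
# admin membership of that role, then transfer schema ownership to it.
reassign_ownership() {
  local fqdn="$1"
  psql -h "$fqdn" -U system -d postgres <<'SQL'
DO $$
DECLARE s text;
BEGIN
  FOR s IN
    SELECT nspname FROM pg_namespace n
    WHERE nspname NOT IN ('public', 'pg_catalog', 'information_schema', 'cron')
      AND nspname NOT LIKE 'pg\_%'
      AND nspname NOT LIKE '\_timescaledb%'
      AND EXISTS (SELECT 1 FROM pg_roles r WHERE r.rolname = n.nspname)
  LOOP
    EXECUTE format('GRANT %I TO %I', s, session_user);                   -- step 2: needed for OWNER TO
    EXECUTE format('GRANT USAGE, CREATE ON SCHEMA %I TO %I', s, s);      -- step 3
    -- step 4 would be: CALL do_changeowner(s, s);
    EXECUTE format('ALTER SCHEMA %I OWNER TO %I', s, s);                 -- step 5
    EXECUTE format('GRANT ALL ON ALL TABLES IN SCHEMA %I TO %I', s, s);  -- step 6
  END LOOP;
END $$;
SQL
}
```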


7. Pre & Post Optimization SQL

pre_optimization.sql

Runs on the source server before pg_dump. Intended for cleanup operations that reduce dump size.

Built-in operations:

  • ANALYZE — updates planner statistics for better export performance
  • Conditional DROP EXTENSION timescaledb CASCADE — TimescaleDB's internal tables (_timescaledb_catalog, _timescaledb_internal, etc.) add complexity to the dump and its background workers cause ownership-change crashes during restore. Dropping it before export produces a clean dump of the underlying data tables. It is reinstalled in post_optimization.sql.

Customisation hook: Add TRUNCATE TABLE <schema>.<table> CASCADE; statements for high-volume transient tables identified in the pre-assessment (Section 3.2). Each truncated table reduces the dump size and shortens the total operation time proportionally.

-- Example additions for a project-specific pre_optimization.sql:
TRUNCATE TABLE myschema.application_log CASCADE;
TRUNCATE TABLE myschema.event_message_param CASCADE;
TRUNCATE TABLE myschema.audit_trail CASCADE;

post_optimization.sql

Runs on the new server after pg_restore. It is structured in sections that must execute in a specific order:

GRANT azure_pg_admin TO system           ← must be first
do_changeowner() function creation       ← must precede ownership block
Dynamic ownership reassignment           ← must precede TimescaleDB install
pg_sleep(5)                              ← stabilise background workers
pg_cron extension                        ← optional, conditional
TimescaleDB extension                    ← must be last DDL operation
ANALYZE VERBOSE                          ← final step

Why TimescaleDB must be last:

TimescaleDB registers background workers (via shared_preload_libraries) that connect to the database and watch for hypertable catalog changes. If TimescaleDB is installed before do_changeowner() runs, those background workers see ALTER TABLE ... OWNER TO operations on hypertables and crash asynchronously. The crash terminates the psql connection mid-script with a fatal error. Installing TimescaleDB after all ownership changes means the background workers start with a fully consistent catalog.

Why pg_cron is installed before TimescaleDB:

pg_cron's background worker connects at startup and looks for the cron schema. If the schema is missing, older versions can crash — which would terminate the connection before TimescaleDB is installed. Installing pg_cron (which creates the cron schema) ahead of TimescaleDB gives its worker time to stabilise before the final wave of DDL.


8. State Management & Resumability

State File Format

Every run maintains a JSON state file at _logs/state_<server_name>.json. It tracks completed steps and persists all retrieved server properties:

{
  "step_0": {"status": "complete", "completed_at": "2024-11-15 09:00:01"},
  "step_1": {"status": "complete", "completed_at": "2024-11-15 09:02:47"},
  "step_2": {"status": "complete", "completed_at": "2024-11-15 11:23:05"},
  "vars": {
    "pg_location": "eastus",
    "pg_sku_name": "Standard_D4ads_v5",
    "pg_sku_tier": "GeneralPurpose",
    "pg_version": "16",
    "pg_current_storage": "8192",
    "TARGET_STORAGE_GB": "1024",
    "SKU_SCALED_UP": "true",
    "pg_sku_16core": "Standard_D16ads_v5",
    "pg_subnet_id": "/subscriptions/.../subnets/...",
    ...
  }
}

The vars section allows full resumption from any step without re-querying Azure — critical for steps 4 and 5 where the source server no longer exists.
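
Reading values back out of the state file is straightforward with jq; a sketch (load_state_var is an illustrative helper, not necessarily the script's real mechanism):

```shell
# Extract a single preserved variable from the vars section of the
# state file. Requires jq on the executing machine.
load_state_var() {
  local state_file="$1" key="$2"
  jq -r --arg k "$key" '.vars[$k]' "$state_file"
}
# Usage sketch:
#   pg_sku_name=$(load_state_var _logs/state_myserver.json pg_sku_name)
```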

Resume Behaviour

When --start-from N is specified:

  1. Steps 0 through N-1 are marked complete in the state file
  2. Step N has its state entry deleted (so it runs fresh)
  3. Steps N+1 onward run normally (skipped if already complete)

Variables saved by step 1 are always loaded from the state file when step 1 is skipped, so all server properties are available regardless of entry point.
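The three resume rules above reduce to a few lines of jq bookkeeping. This is a hypothetical sketch (helper name and state path are illustrative), assuming the state file format shown earlier:

```shell
# Sketch of --start-from N: mark steps 0..N-1 complete, then delete
# step N's entry so it runs fresh. Later steps are left untouched.
STATE="${STATE:-_logs/state_qadb.json}"   # example path

start_from() {      # $1 = step number N
  local n=$1 i tmp
  for i in $(seq 0 $(( n - 1 ))); do
    tmp=$(jq --arg s "step_$i" '.[$s] = {status: "complete"}' "$STATE") \
      && printf '%s\n' "$tmp" > "$STATE"
  done
  tmp=$(jq --arg s "step_$n" 'del(.[$s])' "$STATE") \
    && printf '%s\n' "$tmp" > "$STATE"
}
```

Note that the vars section is never touched, which is what makes resumption after the source server is deleted possible.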

When to Use --start-from

| Failure Scenario | Resume Step | Notes |
|---|---|---|
| pre_optimization.sql error | --start-from 2 | Source server still exists; fix the SQL first |
| pg_dump failed midway | --start-from 2 | Re-runs the entire export; previous partial dump is overwritten |
| pg_dump succeeded, delete failed | --start-from 3 | Source server still exists |
| During 20-minute wait after delete | --start-from 4 | Server already deleted; skip to creation |
| Server creation failed | --start-from 4 | Retry server creation |
| pg_restore failed | --start-from 5 | Dump still at _dumps/<name>/; check for space/connectivity |
| post_optimization.sql failed | --start-from 5 | pg_restore re-runs (idempotent for most operations) |

9. Performance Tuning During Import

Import-Phase PostgreSQL Parameters

During Steps 4 and 5, the server is configured for maximum bulk-load throughput rather than normal OLTP operation:

autovacuum = off Autovacuum daemon is disabled during import. During a bulk load, autovacuum workers would repeatedly wake up to scan the freshly loaded tables, competing with the restore for I/O and CPU without any dead tuples worth reclaiming. Disabling it eliminates this contention. Re-enabled (to its original value) after import.

max_wal_size = 32768 MB Controls how large the WAL (Write-Ahead Log) can grow before a checkpoint is forced. The default (typically 1–2GB) forces frequent checkpoints during bulk import, each one flushing dirty buffers to disk and triggering extra full-page writes into the WAL. Setting it to 32GB makes checkpoints far less frequent, allowing the import to proceed with minimal interruption.

checkpoint_timeout = 3600 s Stretches the time-based checkpoint interval to one hour (the default is 5 minutes), further reducing checkpoint frequency during the import phase.

maintenance_work_mem = 2048 MB Each CREATE INDEX operation during pg_restore gets up to 2GB of working memory. Index creation is typically the longest phase of a pg_restore. More memory per build means faster sort operations and fewer disk spills. pg_restore -j N runs N index builds in parallel, so the total memory consumption is N × maintenance_work_mem — size this based on available RAM.
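Applied through the Azure CLI, the four settings look roughly like this. Resource names are placeholders, the maintenance_work_mem unit (kB) is an assumption, and AZ defaults to echo so the sketch is a dry run — set AZ=az to apply for real:

```shell
# Dry-run sketch: prints the az commands instead of executing them.
AZ="${AZ:-echo}"                 # set AZ=az to actually apply
RG="rg-qa"; SERVER="qadb"        # placeholder resource names

set_param() {   # $1 = parameter name, $2 = value (in the server's native unit)
  "$AZ" postgres flexible-server parameter set \
    --resource-group "$RG" --server-name "$SERVER" \
    --name "$1" --value "$2"
}

set_param autovacuum off
set_param max_wal_size 32768             # MB
set_param checkpoint_timeout 3600        # seconds
set_param maintenance_work_mem 2097152   # kB (= 2048 MB; unit assumption)
```

Remember to capture the original values first so they can be restored after the import completes.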

Parallel Workers

pg_dump and pg_restore both use -j <PG_THREADS> for parallelism. The thread count is auto-detected:

PG_THREADS = min(nproc, 16)

nproc returns the number of logical CPUs on the executing machine. Capped at 16 because PostgreSQL's parallel utility workers show diminishing returns beyond ~16 connections for typical database sizes, and higher concurrency increases lock contention on the target server.
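In shell the auto-detection reduces to a couple of lines (a sketch; PG_THREADS is the name the script uses, the getconf fallback is an addition for portability):

```shell
# Cap parallel dump/restore workers at 16, per the formula above.
CPUS=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
PG_THREADS=$(( CPUS < 16 ? CPUS : 16 ))
echo "Using $PG_THREADS parallel jobs"
```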


10. Failure Scenarios & Recovery

Scenario A: Failure Before Step 3 (Source Still Exists)

The source server is untouched. You can fix any issue and resume, or simply restart from scratch by deleting the state file:

rm _logs/state_<server_name>.json
./optimize_db_storage.sh <params>

Scenario B: Failure During Step 3 (Delete)

If the delete command itself fails, the source server may or may not still exist. Check:

az postgres flexible-server show \
  --subscription <sub> --resource-group <rg> --name <server> \
  --query "{name:name, state:state}" -o json

If the server still exists, resume from --start-from 3. If it no longer exists, resume from --start-from 4.

Scenario C: Failure After Step 3 (Source Deleted, Target Not Created)

This is the highest-risk scenario. The source server is gone but the dump is present. Do not panic — the dump is all that is needed. Resume:

./optimize_db_storage.sh <params> --start-from 4

The state file contains all server properties needed to recreate the server correctly without re-querying Azure.

Scenario D: Failure During Step 5 (Import)

The new server exists but the import is incomplete or failed. The dump at _dumps/<name>/ is still intact. Check whether roles were applied:

PGPASSWORD="<password>" psql -h <fqdn> -U system -d postgres \
  -c "SELECT rolname FROM pg_roles WHERE rolname NOT LIKE 'pg_%' AND rolname NOT LIKE 'azure%' ORDER BY rolname;"

Resume from --start-from 5. pg_restore with --no-owner is largely idempotent for CREATE TABLE / INSERT operations (duplicate key violations on already-restored tables appear in the log but do not abort the restore).

Scenario E: post_optimization.sql Failed After pg_restore

The data is fully restored but ownership and grants may be incomplete. Resume from --start-from 5. pg_restore will produce some errors for already-existing objects, but post_optimization.sql will run cleanly on the second attempt.

Inspecting the State File

# View all completed steps
jq 'to_entries[] | select(.value.status == "complete") | .key' _logs/state_<server_name>.json

# View a specific saved variable
jq '.vars.pg_sku_name' _logs/state_<server_name>.json

# View all saved variables
jq '.vars' _logs/state_<server_name>.json

11. Requirements & Prerequisites

Azure Permissions

The principal running this script (user or service principal) requires:

| Permission | Scope | Purpose |
|---|---|---|
| Microsoft.DBforPostgreSQL/flexibleServers/read | Server | Steps 1, 4 |
| Microsoft.DBforPostgreSQL/flexibleServers/write | Resource Group | Steps 1 (SKU update), 4 (create) |
| Microsoft.DBforPostgreSQL/flexibleServers/delete | Server | Step 3 |
| Microsoft.DBforPostgreSQL/flexibleServers/configurations/write | Server | Steps 4, 5 |
| Microsoft.DBforPostgreSQL/flexibleServers/restart/action | Server | Steps 4, 5 |
| Microsoft.Resources/tags/write | Server | Step 4 (tag restoration) |

A role with Contributor on the resource group satisfies all of these.

Local Machine Requirements

| Requirement | Notes |
|---|---|
| Azure CLI (az) | Must be authenticated (az login or service principal); subscription context set |
| psql | PostgreSQL client; version should be ≥ the server version |
| pg_dump | Same version constraints as psql |
| pg_restore | Same version constraints as psql |
| pg_dumpall | Used for role export; part of the standard PostgreSQL client package |
| jq | State file management; 1.5+ |
| Network connectivity | Direct connectivity to the PostgreSQL private endpoint |
| Disk space | ≥ dump size + 15% overhead in the script's working directory |

Important: If the PostgreSQL server is on a private endpoint (typical in production-like environments), the executing machine must be on the same VNet or connected via VPN/ExpressRoute. The script connects to <server_name>.postgres.database.azure.com, which resolves to the private IP in a private DNS zone.
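The disk-space requirement above is easy to pre-check before kicking off a multi-hour run. A minimal sketch (helper name and the example dump size are hypothetical):

```shell
# Required free space = dump size + 15% overhead (integer MB, rounded up).
needed_mb() {
  echo $(( ($1 * 115 + 99) / 100 ))
}

FREE_MB=$(df -Pm . | awk 'NR==2 {print $4}')
REQUIRED_MB=$(needed_mb 400000)          # e.g. a ~400 GB dump (hypothetical)
if [ "$FREE_MB" -lt "$REQUIRED_MB" ]; then
  echo "insufficient disk: ${FREE_MB}MB free, ${REQUIRED_MB}MB required" >&2
fi
```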

PostgreSQL Server Requirements

  • PostgreSQL version: 11–17 supported
  • Admin user: system (configurable via PG_ADMIN_USER constant in the script)
  • The admin user must have azure_pg_admin membership (default for the first admin user on Flexible Server)

12. Usage Reference

Main Script

./optimize_db_storage.sh \
  <subscription_id> \
  <resource_group> \
  <postgresql_name> \
  <pg_system_password> \
  [--start-from <step>]

The wrapper validates that the PostgreSQL server exists before invoking the main script:

./optimize_db_storage_wrapper.sh \
  <subscription_id> \
  <resource_group> \
  <postgresql_name> \
  <pg_system_password> \
  [--start-from <step>]

Examples

# Basic run
./optimize_db_storage_wrapper.sh \
  "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
  "rg-qa" \
  "qadb" \
  "MyP@ssw0rd!"

# Resume from import step after a failure
./optimize_db_storage_wrapper.sh \
  "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
  "rg-qa" \
  "qadb" \
  "MyP@ssw0rd!" \
  --start-from 5

# Customise parallelism (edit PG_THREADS constant at top of main script
# or export before running if you want fewer threads on a shared machine)

Step Reference

| Step | Name | Safe to Retry | Source Server |
|---|---|---|---|
| 0 | Parse parameters | | Exists |
| 1 | Retrieve PG details & scale up SKU | | Exists |
| 2 | Export database (local pg_dump) | | Exists |
| 3 | Delete source server | ✓ (idempotent) | Deleted |
| 4 | Create new database server | | Deleted |
| 5 | Import database (pg_restore + post SQL) | | Deleted |
| 6 | Cleanup dump directory | | Deleted |

Output Files

| Path | Description |
|---|---|
| _logs/storage_opt_<timestamp>.log | Full execution log (all commands and output) |
| _logs/state_<server_name>.json | Resumable state (removed on successful completion) |
| _dumps/<server_name>/roles.sql | Exported role definitions with password hashes |
| _dumps/<server_name>/dbdump/ | pg_dump directory-format output (removed at step 6) |

13. Troubleshooting Guide

pg_dump fails with "connection refused" or "no route to host"

The executing machine cannot reach the PostgreSQL private endpoint.

# Test connectivity
psql -h <server>.postgres.database.azure.com -U system -d postgres -c "SELECT 1;"

# Check if the private DNS resolves
nslookup <server>.postgres.database.azure.com

Ensure the machine is on the same VNet or peered VNet, or connected via VPN/ExpressRoute. Check Network Security Group rules for port 5432.

Role statements fail with "role already exists"

Expected and harmless. psql logs the error and continues. This happens on retries when some roles were already created in a previous attempt.

pg_restore reports errors on individual objects

A non-zero pg_restore exit code with individual object errors is normal. Common sources:

  • ERROR: relation "..." already exists — table was already restored in a previous partial run
  • ERROR: role "..." does not exist — a grant references a role that was filtered from roles.sql; review the filter patterns if the role is an application role

Review _logs/storage_opt_*.log for the full list of errors.
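A quick way to summarise the errors rather than reading the full log (a sketch; the pg_restore: error: prefix is the format emitted by PostgreSQL 12+ client tools):

```shell
# Summarise distinct pg_restore errors across all run logs, most frequent first.
grep -h "pg_restore: error:" _logs/storage_opt_*.log 2>/dev/null \
  | sed 's/.*pg_restore: error: //' \
  | sort | uniq -c | sort -rn
```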

post_optimization.sql fails with "crash of another server process"

TimescaleDB or pg_cron background workers crashed during the ownership-change DDL. The script automatically retries up to 3 times with a 30-second wait between attempts. If all 3 attempts fail, run manually after the server stabilises:

PGPASSWORD="<password>" psql \
  -h <server>.postgres.database.azure.com \
  -U system -d postgres \
  -f _optimize_scripts/post_optimization.sql

Server creation fails with "name already in use"

The 20-minute DNS release window after deletion has not elapsed, or a previous run deleted the server and the wait was interrupted. Wait and retry:

./optimize_db_storage.sh <params> --start-from 4
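If you would rather script the wait-and-retry than babysit it, a generic helper (not part of the script; RETRY_DELAY and the helper name are hypothetical) works:

```shell
# retry N CMD...: run CMD up to N times, sleeping RETRY_DELAY seconds between
# attempts; returns 0 on the first success, 1 if every attempt fails.
retry() {
  local n=$1 i; shift
  for i in $(seq 1 "$n"); do
    "$@" && return 0
    [ "$i" -lt "$n" ] && sleep "${RETRY_DELAY:-300}"
  done
  return 1
}

# Usage sketch (placeholder arguments, matching the command above):
# retry 3 ./optimize_db_storage.sh <params> --start-from 4
```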

Storage target is not reducing costs enough

The dynamic target storage calculation maps 8TB sources to a 1TB target and 2TB sources to 512GB, and falls back to 1TB for every other size. To set a custom target, modify the calculation block in Step 1 of optimize_db_storage.sh:

# Example: target 512GB for any server with < 1TB actual data
if [ "${pg_current_storage}" -eq 8192 ]; then
    TARGET_STORAGE_GB=512    # was 1024
elif ...

Note: The target size must be ≥ your actual data size plus a safety buffer (at minimum 20% headroom for indexes, WAL, autovacuum, and growth).
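As a quick sanity check, the 20% headroom rule can be computed directly (hypothetical helper name; integer GB, rounded up):

```shell
# Minimum safe target = actual data size + 20% headroom, rounded up to whole GB.
min_target_gb() {
  echo $(( ($1 * 120 + 99) / 100 ))
}

min_target_gb 400   # a 400 GB data footprint needs at least a 480 GB target
```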

Checking post-optimization role fidelity

After Step 5 completes, verify that all roles were recreated correctly:

# Compare roles on new server against pre-assessment output (Section 3.4)
PGPASSWORD="<password>" psql \
  -h <server>.postgres.database.azure.com \
  -U system -d postgres \
  -c "SELECT rolname, rolcanlogin, rolsuper FROM pg_roles
      WHERE rolname NOT LIKE 'pg_%' AND rolname NOT LIKE 'azure%'
      ORDER BY rolname;"

# Verify schema ownership
PGPASSWORD="<password>" psql \
  -h <server>.postgres.database.azure.com \
  -U system -d postgres \
  -c "SELECT n.nspname AS schema, pg_get_userbyid(n.nspowner) AS owner
      FROM pg_namespace n
      WHERE n.nspname NOT LIKE 'pg_%' AND n.nspname NOT IN ('information_schema', 'public')
      ORDER BY n.nspname;"

Azure PostgreSQL Storage Reduction & Optimization - Technical Deep Dive · April 2026


AI Disclosure: I leverage AI to help draft, structure, and refine these posts. However, all technical insights, architectural decisions, and final reviews are handled personally to ensure accuracy and real-world reliability.