PostgreSQL · Advanced

Streaming Replication and Failover

Build a real streaming-replication cluster in Docker: two Postgres nodes, live WAL shipping, measurable lag, failover by promoting the standby, and logical replication for selective table sync. The goal is to understand the mechanics before any orchestrator abstracts them away.

Most engineers operate replication through an orchestrator and never look at what happens underneath. This course strips the abstraction away. We build a two-node cluster from scratch, with a primary and a replica in Docker connected by streaming WAL, and measure what happens at each layer. We watch pg_stat_replication report the replica's lag in real time, insert on the primary and read the row back on the standby moments later, then cut the primary and promote the replica into a writable node. We follow that with logical replication: a publication on one table, a subscription on a separate cluster, and rows flowing across within seconds. Everything runs on a real Postgres test environment with real data; nothing is simulated.
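
To make "lag" concrete before touching a server: an LSN is printed as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit byte position in the WAL, so the distance between two LSNs is a byte count. The Python sketch below reproduces that arithmetic offline and splits the total into stages using the four LSN columns of pg_stat_replication. The helper names and the sample LSN values are hypothetical and chosen only for illustration; on a live server, pg_wal_lsn_diff() performs the same subtraction.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' into an absolute WAL byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(newer: str, older: str) -> int:
    """Bytes of WAL between two LSNs (newer minus older)."""
    return lsn_to_bytes(newer) - lsn_to_bytes(older)


def stage_lag(current: str, sent: str, write: str, flush: str, replay: str) -> dict:
    """Split replication lag into stages, in bytes.

    `current` is the primary's own WAL position (a hypothetical input here);
    the other four mirror the LSN columns of pg_stat_replication.
    """
    return {
        "generated_not_sent": lsn_diff(current, sent),
        "sent_not_written": lsn_diff(sent, write),
        "written_not_flushed": lsn_diff(write, flush),
        "flushed_not_replayed": lsn_diff(flush, replay),
        "total": lsn_diff(current, replay),
    }


# Made-up sample values, not output from a real server.
sample = stage_lag(
    current="0/5000000",
    sent="0/4FFF000",
    write="0/4FFE000",
    flush="0/4FFE000",
    replay="0/4F00000",
)
for stage, lag_bytes in sample.items():
    print(f"{stage}: {lag_bytes} bytes")
```

The four stage values always add up to the total, which makes it easy to see whether a lagging replica is waiting on the network, on disk, or on replay.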

What you'll build

Configure wal_level, max_wal_senders, and a replication slot from scratch
Build a standby with pg_basebackup -R and verify it enters streaming state
Read sent_lsn, write_lsn, flush_lsn, replay_lsn and explain what each measures
Observe and measure replication lag under write load
Promote a standby with pg_promote() and understand the split-brain risk
Create a publication and subscription for logical replication of a single table
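
As a hedged sketch of the standby-building step in the list above, the Python snippet below assembles a pg_basebackup command line without running it. The host name, role name, slot name, and data directory are assumptions made for illustration, not values taken from the course; the flags themselves (-h, -U, -D, -S, -X stream, -R, -P) are standard pg_basebackup options.

```python
import shlex


def basebackup_command(host: str, user: str, data_dir: str, slot: str) -> list:
    """Build the argv for cloning a primary into an empty standby data directory."""
    return [
        "pg_basebackup",
        "-h", host,        # primary to copy from
        "-U", user,        # role with the REPLICATION attribute
        "-D", data_dir,    # target directory; must be empty or not exist
        "-S", slot,        # existing replication slot to stream WAL through
        "-X", "stream",    # stream WAL alongside the base backup
        "-R",              # write standby.signal and primary_conninfo
        "-P",              # show progress
    ]


# Hypothetical names; substitute the ones from your own compose file.
cmd = basebackup_command(
    host="primary",
    user="replicator",
    data_dir="/var/lib/postgresql/data",
    slot="replica_slot",
)
print(shlex.join(cmd))
```

Once the standby starts from that directory, the state column of pg_stat_replication on the primary should read streaming. Building the command as a list rather than a single string keeps it safe to pass to subprocess.run without shell quoting problems.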

Contents

Two nodes, one goal
Two services in one compose file
WAL settings that make replication possible
Who can connect — and as what role
Mount the config into the container
Start both containers
Connect to the primary
A dedicated role for replication
Two views that show the replication state
Both views are empty — that is correct
A slot to protect the replica's WAL position
The slot appears with active = false
Two tables, two purposes
Fifty thousand orders, deterministic
Create the tables on the primary
Load the orders
Clone the primary with pg_basebackup
Seed the replica from the primary
Read from the replica
Confirming standby state from the inside
Watching the replica from the primary
Insert on the primary, read on the replica
The new row is on the replica
The replica refuses writes
A query to measure lag precisely
Generating a burst of writes
Catching the replica mid-stream
Lag drains when writes stop
Promoting the replica — and why order matters
Executing the failover
The standby is now writable
Logical replication — a different model
A separate cluster for the logical subscriber
Create the publication on the primary
Schema must exist on the subscriber
Connect the subscriber to the publisher
Data flows from publisher to subscriber
Three rows on the subscriber
Inspecting the subscription internals
A recovery checklist as live queries
Baseline before the accidental delete
Logical dump before the drill
The accidental delete
Restoring from the logical dump
Verifying the restore
Cleaning up the test environment
