
PostgreSQL · Advanced

Streaming Replication and Failover

Build a real streaming-replication cluster in Docker: two Postgres nodes, live WAL shipping, measurable lag, failover by promotion, and logical replication for selective table sync. The goal is to understand the mechanics before any orchestrator abstracts them away.

Most engineers operate replication through an orchestrator and never look at what happens underneath. This course strips that abstraction away. We build a two-node cluster from scratch (a primary and a replica in Docker, connected by streaming WAL) and measure each layer as it works. We watch pg_stat_replication report the replica's lag in real time, insert on the primary and immediately read the row on the standby, then stop the primary and promote the replica to a writable node. We follow that with logical replication: a publication on one table, a subscription on a separate cluster, and rows flowing across in seconds. We finish with a recovery drill: a logical dump, a deliberate "accidental" delete, and a restore from that dump. Everything runs on a real Postgres stand with real data; nothing is simulated.
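Replication positions in pg_stat_replication are LSNs printed as two hexadecimal halves, X/Y, which are the high and low 32 bits of a byte position in the WAL stream. The "insert on the primary, read on the standby" step can be sketched in a few lines of Python, assuming you captured an LSN on the primary right after the commit (for example with pg_current_wal_lsn()) and then read the standby's replay_lsn. As a rule of thumb, the row is readable on the standby once replay_lsn has reached that position. In SQL the same subtraction is pg_wal_lsn_diff(); the helper names below are hypothetical and not part of any library.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN in PostgreSQL's 'X/Y' text form to an absolute byte position."""
    high, low = lsn.split("/")
    # X is the high 32 bits, Y the low 32 bits, both hexadecimal.
    return (int(high, 16) << 32) | int(low, 16)


def visible_on_replica(target_lsn: str, replay_lsn: str) -> bool:
    """True once the standby has replayed WAL at least up to target_lsn."""
    return parse_lsn(replay_lsn) >= parse_lsn(target_lsn)


def bytes_behind(target_lsn: str, replay_lsn: str) -> int:
    """WAL bytes the standby still has to replay to reach target_lsn (0 if it has)."""
    return max(0, parse_lsn(target_lsn) - parse_lsn(replay_lsn))


# Hypothetical sample values, in the format Postgres prints them.
target = "0/3000148"  # captured on the primary right after the INSERT committed
print(visible_on_replica(target, "0/3000060"))  # False
print(bytes_behind(target, "0/3000060"))        # 232
print(visible_on_replica(target, "0/30001A0"))  # True
```

Polling replay_lsn against a captured target is a simple way to wait for a write to become visible on the standby instead of sleeping for a guessed interval.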

What you'll build

  • Configure wal_level, max_wal_senders, and a replication slot from scratch
  • Build a standby with pg_basebackup -R and verify it enters streaming state
  • Read sent_lsn, write_lsn, flush_lsn, and replay_lsn, and explain what each one measures
  • Observe and measure replication lag under write load
  • Promote a standby with pg_promote() and understand the split-brain risk
  • Create a publication and subscription for logical replication of a single table
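The four LSN columns in the list above mark successive stages of one standby's progress: sent_lsn is what the primary has sent, write_lsn what the standby has written to disk (not necessarily fsynced), flush_lsn what it has flushed durably, and replay_lsn what it has applied. The Python sketch below splits total lag into per-stage byte counts, assuming you already fetched one pg_stat_replication row plus the primary's pg_current_wal_lsn() as strings. The ReplicationRow and lag_breakdown names are hypothetical. Because the primary's position and the standby's reported positions are sampled at slightly different moments, each stage is clamped at zero.

```python
from dataclasses import dataclass


def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'X/Y' text form (high/low 32 bits, hex) to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class ReplicationRow:
    """Hypothetical container: one pg_stat_replication row plus the primary's LSN."""
    current_lsn: str  # pg_current_wal_lsn() on the primary
    sent_lsn: str
    write_lsn: str
    flush_lsn: str
    replay_lsn: str


def lag_breakdown(row: ReplicationRow) -> dict[str, int]:
    """Split the standby's lag into byte counts per pipeline stage."""
    current = parse_lsn(row.current_lsn)
    sent = parse_lsn(row.sent_lsn)
    write = parse_lsn(row.write_lsn)
    flush = parse_lsn(row.flush_lsn)
    replay = parse_lsn(row.replay_lsn)

    def gap(ahead: int, behind: int) -> int:
        # Positions are sampled at slightly different moments, so clamp at zero.
        return max(0, ahead - behind)

    return {
        "not_sent": gap(current, sent),              # generated, not yet sent
        "sent_not_written": gap(sent, write),        # sent, not yet written by standby
        "written_not_flushed": gap(write, flush),    # written, not yet fsynced
        "flushed_not_replayed": gap(flush, replay),  # durable, not yet applied
        "total": gap(current, replay),
    }


# Hypothetical sample values.
row = ReplicationRow(
    current_lsn="0/5000000",
    sent_lsn="0/4FFF000",
    write_lsn="0/4FF0000",
    flush_lsn="0/4FE0000",
    replay_lsn="0/4F00000",
)
lag = lag_breakdown(row)
print(lag["flushed_not_replayed"])  # 917504
print(lag["total"])                 # 1048576
```

When most of the total sits in flushed_not_replayed while the earlier stages stay small, the bottleneck is usually replay on the standby rather than the network or the standby's disk writes.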

Contents

  1. Two nodes, one goal
  2. Two services in one compose file
  3. WAL settings that make replication possible
  4. Who can connect — and as what role
  5. Mount the config into the container
  6. Start both containers
  7. Connect to the primary
  8. A dedicated role for replication
  9. Two views that show the replication state
  10. Both views are empty — that is correct
  11. A slot to protect the replica's WAL position
  12. The slot appears with active = false
  13. Two tables, two purposes
  14. Fifty thousand orders, deterministic
  15. Create the tables on the primary
  16. Load the orders
  17. Clone the primary with pg_basebackup
  18. Seed the replica from the primary
  19. Read from the replica
  20. Confirming standby state from the inside
  21. Watching the replica from the primary
  22. Insert on the primary, read on the replica
  23. The new row is on the replica
  24. The replica refuses writes
  25. A query to measure lag precisely
  26. Generating a burst of writes
  27. Catching the replica mid-stream
  28. Lag drains when writes stop
  29. Promoting the replica — and why order matters
  30. Executing the failover
  31. The standby is now writable
  32. Logical replication — a different model
  33. A separate cluster for the logical subscriber
  34. Create the publication on the primary
  35. Schema must exist on the subscriber
  36. Connect the subscriber to the publisher
  37. Data flows from publisher to subscriber
  38. Three rows on the subscriber
  39. Inspecting the subscription internals
  40. A recovery checklist as live queries
  41. Baseline before the accidental delete
  42. Logical dump before the drill
  43. The accidental delete
  44. Restoring from the logical dump
  45. Verifying the restore
  46. Cleaning up the stand