Learning path
SRE On-Call Path
Operate Postgres under fire: diagnose, mitigate, fail over and recover.
SRE, on-call · Advanced · 4 courses · 13 simulations
The path
Courses, then incidents
Work through the courses first, then practise the incidents. Each step links to its page.
- Course: Finding the Heavy Queries
- Course: Who Is Doing What
- Course: Streaming Replication and Failover
- Course: WAL Archiving and Point-in-Time Recovery
- ha-failover: Primary crash detection
- ha-failover: Manual replica promotion
- ha-failover: Failed failover due to lag
- ha-failover: Split-brain risk
- ha-failover: Old primary returns
- ha-failover: Read/write endpoint confusion
- ha-failover: Failover with slot cleanup
- ha-failover: Timeline divergence detected
- ha-failover: Backup before rejoin
- ha-failover: Post-failover validation
- connections-pooling: Connection storm after deploy
- replication-wal: Replica lag from a stopped replica
- compound-incidents: Checkout slow query and connection storm