Scoring & postmortems
When you think you have resolved an incident, run score:
pgpg[scenario]> score
score runs the deterministic, offline scorer. It inspects the live sandbox — your changes, the current schema, and how the database behaves under the scenario's original workload — and produces a scorecard out of 100 plus a short postmortem. No network, no model, no randomness.
The three dimensions
Your score breaks down into three independent dimensions:
- Detect — Did you find the real root cause? Treating a symptom (killing one slow query, bumping a timeout) scores low here. Identifying and addressing the actual mechanism scores high.
- Fix — Does your fix hold under the original workload? The scorer re-runs the workload that caused the incident. A fix that resolves the symptom momentarily but collapses again under load does not earn full Fix points.
- Trap — Did you avoid dangerous production traps? Many "fixes" that work in a quiet sandbox would cause an outage in production — taking an exclusive lock on a hot table, rewriting a large table under live traffic, dropping an index other queries depend on. Falling into these costs Trap points even if Detect and Fix look good.
A strong score means you found the cause, your fix survives the real workload, and you would have been safe doing it in production.
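As a rough illustration of how the three dimensions add up, here is a minimal sketch. The `Scorecard` class and `MAX_POINTS` table are hypothetical names, not part of pgpg. The per-dimension maximums (40 / 40 / 20) and the plain-sum total are assumptions read off the sample scorecard in this section, not a description of the scorer's internals.

```python
from dataclasses import dataclass

# Assumed maximums, taken from the sample scorecard (40 + 40 + 20 = 100).
MAX_POINTS = {"detect": 40, "fix": 40, "trap": 20}


@dataclass(frozen=True)
class Scorecard:
    """Hypothetical container for the three dimension scores."""
    detect: int
    fix: int
    trap: int

    def __post_init__(self):
        # Reject values outside each dimension's assumed range.
        for name, cap in MAX_POINTS.items():
            value = getattr(self, name)
            if not 0 <= value <= cap:
                raise ValueError(f"{name} must be between 0 and {cap}, got {value}")

    @property
    def total(self) -> int:
        # Assumption: the dimensions are independent, so the total is a plain sum.
        return self.detect + self.fix + self.trap


card = Scorecard(detect=38, fix=34, trap=15)
print(f"Total {card.total} / {sum(MAX_POINTS.values())}")  # Total 87 / 100
```

Under these assumptions, a perfect Detect and Fix with a poor Trap score still caps out well below 100, which matches the point that a fix must also be production-safe.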
A scorecard
Scenario: stage-04/02-name
Hints used: 1
Detect   38 / 40    Root cause identified: idle-in-transaction
                    session holding a lock on orders.
Fix      34 / 40    Workload replayed cleanly; no further blocking.
Trap     15 / 20    CREATE INDEX without CONCURRENTLY would have
                    locked a hot table in production.
------------------------------------------------
Total    87 / 100
Postmortem
The incident was a session left idle in transaction after a
failed deploy, holding a row lock that queued every write behind
it. You correctly terminated the blocker and added the missing
index. In production, build that index with CONCURRENTLY so the
build does not block writes on a table under live traffic.
The postmortem explains what happened, what you did, and — when you lost points — what the production-safe version of your fix would have been.
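To make the trap in the sample scorecard concrete, the sketch below flags a `CREATE INDEX` statement that omits `CONCURRENTLY`. It is only an illustration of the idea, not how the pgpg scorer detects traps. The function name `blocks_writes`, the index name `orders_customer_idx`, and the column `customer_id` are hypothetical; only the `orders` table comes from the example above.

```python
import re

# Matches CREATE [UNIQUE] INDEX and captures an optional CONCURRENTLY keyword.
CREATE_INDEX = re.compile(
    r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\s+(CONCURRENTLY\b)?",
    re.IGNORECASE,
)


def blocks_writes(statement: str) -> bool:
    """Return True if the statement builds an index without CONCURRENTLY."""
    match = CREATE_INDEX.search(statement)
    return match is not None and match.group(1) is None


# Hypothetical statements; index and column names are illustrative only.
risky = "CREATE INDEX orders_customer_idx ON orders (customer_id);"
safe = "CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);"

print(blocks_writes(risky))  # True
print(blocks_writes(safe))   # False
```

In PostgreSQL, a plain `CREATE INDEX` blocks writes to the table until the build finishes, while `CREATE INDEX CONCURRENTLY` lets writes continue. The concurrent form takes longer and cannot run inside a transaction block.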
Reproducible by design
Scoring is deterministic: given the same scenario and the same final database state, the scorer returns the identical Detect / Fix / Trap breakdown and total every time. There is no variance between runs, machines, or users, so scores are directly comparable, and re-running a scenario confirms a real improvement rather than noise.
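One way to reason about "the same inputs" is to reduce a scenario name and a final state to a canonical fingerprint, so that two equal states always compare equal. The sketch below is a hypothetical illustration, not the scorer's actual mechanism. The function `state_fingerprint` and the keys in the example state are invented for the example.

```python
import hashlib
import json


def state_fingerprint(scenario: str, final_state: dict) -> str:
    """Hash a scenario name and final state into a stable hex digest."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(
        {"scenario": scenario, "state": final_state},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical final states: same content, different key order.
a = state_fingerprint(
    "stage-04/02-name", {"indexes": ["orders_customer_idx"], "blockers": 0}
)
b = state_fingerprint(
    "stage-04/02-name", {"blockers": 0, "indexes": ["orders_customer_idx"]}
)
print(a == b)  # True
```

Because nothing in the function depends on time, randomness, or the machine it runs on, equal inputs always yield equal digests, which is the same property the scorer guarantees for scores.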
AI never scores
To be explicit: AI plays no part in scoring. Models generate hint wording and explanations only. Your grade comes entirely from the deterministic scorer examining the database — never from a model's opinion. That is what makes a pgpg score something you can trust and compare.