Playground (pgpg)

Scoring & postmortems

When you think you have resolved an incident, run score:

pgpg[scenario]> score

score runs the deterministic, offline scorer. It inspects the live sandbox — your changes, the current schema, and how the database behaves under the scenario's original workload — and produces a scorecard out of 100 plus a short postmortem. No network, no model, no randomness.

The three dimensions

Your score breaks down into three independent dimensions:

  • Detect — Did you find the real root cause? Treating a symptom (killing one slow query, bumping a timeout) scores low here. Identifying and addressing the actual mechanism scores high.
  • Fix — Does your fix hold under the original workload? The scorer re-runs the workload that caused the incident. A fix that resolves the symptom momentarily but collapses again under load does not earn full Fix points.
  • Trap — Did you avoid dangerous production traps? Many "fixes" that work in a quiet sandbox would cause an outage in production — taking an exclusive lock on a hot table, rewriting a large table under live traffic, dropping an index other queries depend on. Falling into these costs Trap points even if Detect and Fix look good.

A strong score means you found the cause, your fix survives the real workload, and you would have been safe doing it in production.

A scorecard

Scenario: stage-04/02-name
Hints used: 1

  Detect   38 / 40   Root cause identified: idle-in-transaction
                     session holding a lock on orders.
  Fix      34 / 40   Workload replayed cleanly; no further blocking.
  Trap     15 / 20   CREATE INDEX without CONCURRENTLY would have
                     locked a hot table in production.
  ------------------------------------------------
  Total    87 / 100

Postmortem
  The incident was a session left idle in transaction after a
  failed deploy, holding a row lock that queued every write behind
  it. You correctly terminated the blocker and added the missing
  index. In production, build that index with CONCURRENTLY to
  avoid an exclusive lock on a table under live write traffic.

The postmortem explains what happened, what you did, and — when you lost points — what the production-safe version of your fix would have been.

Reproducible by design

Scoring is deterministic: the same inputs always produce the same score, for everyone. Given the same scenario and the same final database state, the scorer returns the identical Detect / Fix / Trap breakdown and total every time. There is no variance between runs, machines, or users, which makes scores directly comparable and lets you re-run a scenario to confirm a real improvement rather than noise.

AI never scores

To be explicit: AI plays no part in scoring. Models generate hint wording and explanations only. Your grade comes entirely from the deterministic scorer examining the database — never from a model's opinion. That is what makes a pgpg score something you can trust and compare.