QA Team

The QA team (benchmark agent) runs the full test suites and benchmarks. This is distinct from the code-reviewer's small regression test, which stops early.

Regression Thresholds

Thresholds are configurable in project-config.md. The defaults, shared with the code-reviewer for consistency, are:

| Level     | Default threshold |
|-----------|-------------------|
| Watch     | > +2%             |
| Regressed | > +5%             |
| Critical  | > +10%            |
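As a rough illustration of how the default thresholds above could be applied to a single metric, the sketch below classifies a percent change against a baseline. It assumes a "higher is worse" metric (for example latency), so a positive change is a regression. The names `Level`, `Level.OK`, `DEFAULT_THRESHOLDS`, `percent_change`, and `classify` are hypothetical and do not come from the project.

```python
from enum import IntEnum


class Level(IntEnum):
    """Severity levels; OK is a hypothetical name for 'below Watch'."""
    OK = 0
    WATCH = 1
    REGRESSED = 2
    CRITICAL = 3


# Default thresholds from the table, as percent change over baseline.
# Assumption: a real setup would read overrides from project-config.md.
DEFAULT_THRESHOLDS = {
    Level.WATCH: 2.0,
    Level.REGRESSED: 5.0,
    Level.CRITICAL: 10.0,
}


def percent_change(baseline: float, current: float) -> float:
    """Percent change of current relative to baseline (positive = worse)."""
    return (current - baseline) * 100.0 / baseline


def classify(change_pct: float, thresholds=DEFAULT_THRESHOLDS) -> Level:
    """Return the most severe level whose threshold is strictly exceeded."""
    for level in (Level.CRITICAL, Level.REGRESSED, Level.WATCH):
        if change_pct > thresholds[level]:
            return level
    return Level.OK
```

Because the thresholds use a strict `>`, a change of exactly +5% still falls in the Watch range under this sketch.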

Verdicts

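| Verdict   | Condition                                                  |
|-----------|------------------------------------------------------------|
| PASS      | No metric in Regressed or Critical range                   |
| WATCH     | At least one metric in Watch range, none in Regressed range |
| REGRESSED | Any metric in Regressed range                              |
| CRITICAL  | Any metric in Critical range, or any new failure           |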

CRITICAL triggers HALT (see Rollback Protocol). Baselines are never silently updated; updating one always requires user approval.
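The verdict rules can be sketched as a single function that checks conditions from most to least severe. This is a minimal illustration, not the project's implementation: it assumes the default thresholds, assumes "higher is worse" metrics expressed as percent change over baseline, and assumes WATCH takes precedence over PASS when both conditions hold. The names `verdict`, `should_halt`, and `new_failures` are hypothetical.

```python
def verdict(changes_pct: dict, new_failures: int = 0) -> str:
    """Aggregate per-metric percent changes into one verdict.

    changes_pct maps a metric name to its percent change over baseline.
    new_failures is a hypothetical count of tests that newly fail.
    """
    # The worst (largest) change decides the verdict; no metrics means 0.0.
    worst = max(changes_pct.values(), default=0.0)

    # Check the most severe condition first so verdicts never overlap.
    if new_failures > 0 or worst > 10.0:
        return "CRITICAL"
    if worst > 5.0:
        return "REGRESSED"
    if worst > 2.0:
        return "WATCH"
    return "PASS"


def should_halt(result: str) -> bool:
    """Only CRITICAL triggers HALT, per the rule stated above."""
    return result == "CRITICAL"
```

For example, `verdict({"latency_ms": 3.0, "memory_mb": 6.5})` yields `"REGRESSED"`, while the same metrics with `new_failures=1` yield `"CRITICAL"` and would halt. Baseline updates are deliberately left out of the sketch, since they require user approval.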