Validation rules are defined in the `validation` object of `scenario.json` and are evaluated after the agent finishes executing.
Structure
| Scope | When used | What it checks |
|---|---|---|
| `gateway` | `--from gateway` only | The ServiceNow ticket |
| `job` | All entry points | The job record (status, plan, task count) |
| `tasks` | All entry points | Per-task data (commands, summaries, plans) |
For `tasks` rules, each rule is checked against every task. A rule passes if it matches on at least one task.
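The "at least one task" semantics can be sketched in Python as an `any()` over the job's tasks. This assumes each task is available as a dict keyed by the `where` field names. The runner's real data model is not documented here, and `evaluate_tasks_rule` and `restarted_nginx` are hypothetical names.

```python
import re
from typing import Callable, Dict, List

# Hypothetical task record: field name (e.g. "executed_commands") -> text.
Task = Dict[str, str]


def evaluate_tasks_rule(tasks: List[Task], check: Callable[[Task], bool]) -> bool:
    """A tasks-scope rule passes if its check succeeds on at least one task."""
    return any(check(task) for task in tasks)


def restarted_nginx(task: Task) -> bool:
    # Case-insensitive search over the task's command log.
    text = task.get("executed_commands", "")
    return re.search(r"systemctl restart nginx", text, re.IGNORECASE) is not None


tasks = [
    {"executed_commands": "systemctl status nginx"},
    {"executed_commands": "systemctl restart nginx\nsystemctl is-active nginx"},
]

print(evaluate_tasks_rule(tasks, restarted_nginx))  # True: the second task matches
print(evaluate_tasks_rule(tasks[:1], restarted_nginx))  # False: no task matches
```

Under this reading, a rule evaluated against an empty task list fails, because `any()` over nothing is `False`. The docs do not say whether the real runner treats zero tasks the same way.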
Rule Structure
Every rule shares a common set of fields:
| Field | Default | Description |
|---|---|---|
| `label` | - | Required. Shown in the validation report; make it descriptive. |
| `validator` | - | Required. The type of check to run. See Validators below. |
| `required` | `true` | When `false`, the rule is informational: it contributes to the compliance score but does not block a pass. |
| `negate` | `false` | Invert the result: the rule passes when the condition is NOT met. |
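The following hedged Python model shows how the defaults and `negate` might be applied to a rule loaded from `scenario.json`. The `Rule` class and its `from_dict` and `passes` methods are illustrative, not the runner's code. The dict keys reuse the field names from the table.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Rule:
    # Fields shared by every rule; defaults follow the table above.
    label: str
    validator: str
    required: bool = True
    negate: bool = False

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Rule":
        # label and validator are required (KeyError if missing).
        # Validator-specific keys such as pattern, where, min or max are ignored here.
        return cls(
            label=raw["label"],
            validator=raw["validator"],
            required=raw.get("required", True),
            negate=raw.get("negate", False),
        )

    def passes(self, condition_met: bool) -> bool:
        # negate inverts the validator's raw result.
        return not condition_met if self.negate else condition_met


rule = Rule.from_dict({
    "label": "Agent never reboots the host",
    "validator": "pattern_match",
    "pattern": r"\breboot\b",
    "where": "executed_commands",
    "negate": True,
})
print(rule.required)  # True (default)
print(rule.passes(condition_met=False))  # True: no reboot found, so the negated rule passes
```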
Validators
pattern_match
Checks whether a regex pattern matches (or doesn’t match, if `negate: true`) in a specific field of the job or task data.
| Field | Description |
|---|---|
| `pattern` | Regular expression. Matching is case-insensitive. |
| `where` | The field to search. See `where` targets below. |
`where` targets:
| Target | Content |
|---|---|
| `executed_commands` | All commands run by the agent, one per line |
| `task_summary` | The agent’s summary of what it did |
| `task_description` | The task description as created |
| `task_plan` | The agent’s execution plan |
| `agent_messages` | Full agent reasoning history |
| `job_resolution` | The job’s resolution summary |
| `job_plan` | The job-level plan |
| `gateway_messages` | Messages posted by the gateway bot on the ticket |
| `gateway_summary` | The gateway’s summary of ticket resolution |
| `operational_rules` | Operational constraints from the agent’s context |
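This minimal Python sketch approximates `pattern_match`: a case-insensitive regex search over one named field, with `negate` inverting the result. It assumes Python `re` semantics and that a missing field counts as empty text. The runner's actual regex dialect is not stated, and `pattern_match` here is a hypothetical function, not the runner's API.

```python
import re
from typing import Dict

# The `where` targets from the table above.
WHERE_TARGETS = frozenset({
    "executed_commands", "task_summary", "task_description", "task_plan",
    "agent_messages", "job_resolution", "job_plan", "gateway_messages",
    "gateway_summary", "operational_rules",
})


def pattern_match(record: Dict[str, str], pattern: str, where: str, negate: bool = False) -> bool:
    """Case-insensitive regex search over a single field; negate inverts the result."""
    if where not in WHERE_TARGETS:
        raise ValueError(f"unknown where target: {where!r}")
    text = record.get(where, "")  # assumption: a missing field is treated as empty text
    matched = re.search(pattern, text, flags=re.IGNORECASE) is not None
    return not matched if negate else matched


task = {"executed_commands": "sudo systemctl restart nginx\ncurl -sI http://localhost"}

print(pattern_match(task, r"SYSTEMCTL\s+RESTART\s+NGINX", "executed_commands"))  # True: case is ignored
print(pattern_match(task, r"rm\s+-rf", "executed_commands", negate=True))  # True: no rm -rf was run
```

In this sketch, `re.search` finds the pattern anywhere in the field. Because `executed_commands` holds one command per line, `^` and `$` anchor to the start and end of the whole text unless the pattern begins with `(?m)`.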
job_resolution_status
Checks the job’s final resolution status. This is useful for benchmarking tickets that are expected to fail.
| Field | Description |
|---|---|
| `pattern` | Expected resolution status. Allowed values: `success`, `agentic_failure`, `hard_failure`, `no_tasks_created`. |
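For a ticket that is expected to fail, a `job_resolution_status` rule might be checked as below. The sketch rejects values outside the documented set. It assumes an exact, case-sensitive string comparison, because the docs do not say whether `pattern` is treated as a regex for this validator. `check_resolution_status` and the rule label are hypothetical.

```python
from typing import Dict

ALLOWED_RESOLUTION_STATUSES = frozenset(
    {"success", "agentic_failure", "hard_failure", "no_tasks_created"}
)


def check_resolution_status(rule: Dict[str, str], actual_status: str) -> bool:
    """Pass when the job's final status equals the rule's expected status."""
    expected = rule["pattern"]
    if expected not in ALLOWED_RESOLUTION_STATUSES:
        raise ValueError(f"unsupported resolution status: {expected!r}")
    # Assumption: exact comparison rather than a regex match.
    return actual_status == expected


# Hypothetical benchmark ticket where the agent is expected to fail.
rule = {
    "label": "Unresolvable ticket ends in agentic failure",
    "validator": "job_resolution_status",
    "pattern": "agentic_failure",
}
print(check_resolution_status(rule, "agentic_failure"))  # True
print(check_resolution_status(rule, "success"))  # False
```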
ticket_status
Checks the status of the ServiceNow ticket. Only applicable with `--from gateway`.
| Field | Description |
|---|---|
| `pattern` | Expected ticket status string. |
task_count
Verifies that the number of tasks created under the job falls within a range. Use this to assert that the agent didn’t spiral into excessive sub-tasks, or that it resolved the issue in a single task (`min` and `max` both set to 1).
| Field | Description |
|---|---|
| `min` | Minimum number of tasks (inclusive). |
| `max` | Maximum number of tasks (inclusive). |
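Both bounds are inclusive, so the check reduces to a range test. This Python sketch assumes that an omitted bound is simply not enforced, which the docs do not state. `task_count_ok` is a hypothetical name.

```python
from typing import Optional


def task_count_ok(count: int, min_tasks: Optional[int] = None, max_tasks: Optional[int] = None) -> bool:
    """Inclusive range check; a bound left as None is not enforced (an assumption)."""
    if min_tasks is not None and count < min_tasks:
        return False
    if max_tasks is not None and count > max_tasks:
        return False
    return True


# "Resolved in a single task": min = max = 1.
print(task_count_ok(1, min_tasks=1, max_tasks=1))  # True
print(task_count_ok(3, min_tasks=1, max_tasks=1))  # False
# Guard against runaway sub-tasks while still allowing some decomposition.
print(task_count_ok(4, min_tasks=1, max_tasks=5))  # True
```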
ansible
Runs an Ansible playbook and treats its exit code as pass/fail. This is the most reliable way to assert actual machine state: service running, file contents correct, port responding.
| Field | Description |
|---|---|
| `ansiblePath` | Path to the playbook, relative to the scenario directory. Typically `validate.yml`. |
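This Python sketch shows an exit-code gate of this kind. It resolves `ansiblePath` against the scenario directory and passes only on exit code 0. The bare `ansible-playbook <playbook>` invocation is an assumption: the runner's real inventory, connection settings, and extra vars are not documented. `ansible_rule_passes`, the injectable `runner`, and the `scenarios/nginx-down` path are hypothetical. The stub runner lets the example run without Ansible installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[List[str]], int]


def run_command(argv: List[str]) -> int:
    # check=False: a failing playbook is a normal outcome, not an exception.
    return subprocess.run(argv, check=False).returncode


def ansible_rule_passes(scenario_dir: str, ansible_path: str, runner: Runner = run_command) -> bool:
    """Pass if and only if the playbook exits with code 0."""
    playbook = Path(scenario_dir) / ansible_path  # ansiblePath is relative to the scenario directory
    # Inventory and other options are omitted; the real invocation is not documented.
    return runner(["ansible-playbook", str(playbook)]) == 0


# Usage with a stub runner that records the command and reports success.
calls: List[List[str]] = []


def fake_runner(argv: List[str]) -> int:
    calls.append(argv)
    return 0


print(ansible_rule_passes("scenarios/nginx-down", "validate.yml", fake_runner))  # True
```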
Scoring Model
After all rules are evaluated, the runner computes a compliance score and evaluates two gates.
- Compliance score: the percentage of all non-Ansible rules that passed (required and optional combined). This is purely informational and shown in the report.

Two gates determine the actual pass/fail result:
- Compliance gate: passes when every required non-Ansible rule passes.
- Resolution gate: if an Ansible (`validate.yml`) rule exists, passes when the playbook exits 0. If no Ansible rule is defined, the resolution gate mirrors the compliance gate result.

A scenario passes only when both gates pass.
This design means you can layer your validation:
- Use `required: false` rules to track compliance quality without blocking a pass.
- Use an Ansible `validate.yml` as the authoritative ground truth for machine state.
- Use `required: true` pattern rules to catch specific behaviors that must always happen (or never happen).
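The scoring model can be sketched in Python, using the three layers as example input. `RuleResult`, `Outcome`, and `score` are hypothetical names. Two behaviors are assumptions the docs don't cover: more than one Ansible rule is treated as "all must pass", and no score is reported when there are no non-Ansible rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RuleResult:
    label: str
    passed: bool
    required: bool = True
    is_ansible: bool = False  # True for the validate.yml rule


@dataclass
class Outcome:
    compliance_score: Optional[float]  # percent; None when there are no non-Ansible rules
    compliance_gate: bool
    resolution_gate: bool
    passed: bool


def score(results: List[RuleResult]) -> Outcome:
    other = [r for r in results if not r.is_ansible]
    ansible = [r for r in results if r.is_ansible]

    # Informational score: required and optional non-Ansible rules count equally.
    compliance_score = 100.0 * sum(r.passed for r in other) / len(other) if other else None

    # Compliance gate: every required non-Ansible rule must pass.
    compliance_gate = all(r.passed for r in other if r.required)

    # Resolution gate: the playbook decides when present; otherwise mirror the compliance gate.
    resolution_gate = all(r.passed for r in ansible) if ansible else compliance_gate

    return Outcome(compliance_score, compliance_gate, resolution_gate,
                   passed=compliance_gate and resolution_gate)


layered = [
    RuleResult("Restarted nginx", passed=True),  # required pattern rule
    RuleResult("Checked logs before restarting", passed=False, required=False),  # quality signal only
    RuleResult("validate.yml", passed=True, is_ansible=True),  # ground truth for machine state
]
outcome = score(layered)
print(outcome.compliance_score)  # 50.0
print(outcome.passed)  # True: the optional miss lowers the score but not the result
```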
Fail-Fast Guards
`allowedAgents` and `allowedHosts` are checked continuously during execution, on every poll cycle, rather than after completion. If either guard is violated, the runner kills the job immediately and fails the scenario.
`allowedAgents`: If any task is assigned to an agent whose ID is not in this list, the job is killed. Use this when you’re running through `--from gateway` or `--from ticket` and the engine selects agents automatically: it ensures only your designated agent is used.
`allowedHosts`: If any task’s agent is operating on a host not in this list, the job is killed. Use this to prevent the agent from laterally accessing hosts outside the scenario scope.
Both guards are only meaningful with entry points where agent selection is automatic (`--from gateway`, `--from ticket`). With `--from task`, the agent is explicitly specified.
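Here is a minimal Python sketch of the per-poll guard check. It assumes the runner can list a job's tasks with an agent ID and a host on each cycle, and that an unset guard allows anything. `TaskSnapshot`, `guard_violation`, `watch_job`, and the injected callbacks are hypothetical. The real poll interval and kill mechanism are not documented.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class TaskSnapshot:
    # Hypothetical view of a task on one poll cycle.
    task_id: str
    agent_id: str
    host: str


def guard_violation(
    tasks: Iterable[TaskSnapshot],
    allowed_agents: Optional[List[str]] = None,
    allowed_hosts: Optional[List[str]] = None,
) -> Optional[str]:
    """Return a reason for the first violating task, or None if every task is allowed."""
    for task in tasks:
        # Assumption: a guard left as None allows anything.
        if allowed_agents is not None and task.agent_id not in allowed_agents:
            return f"task {task.task_id}: agent {task.agent_id} not in allowedAgents"
        if allowed_hosts is not None and task.host not in allowed_hosts:
            return f"task {task.task_id}: host {task.host} not in allowedHosts"
    return None


def watch_job(
    fetch_tasks: Callable[[], List[TaskSnapshot]],
    job_done: Callable[[], bool],
    kill_job: Callable[[], None],
    allowed_agents: Optional[List[str]] = None,
    allowed_hosts: Optional[List[str]] = None,
    poll_seconds: float = 5.0,
) -> Optional[str]:
    """Check both guards on every poll cycle; kill the job on the first violation."""
    while True:
        reason = guard_violation(fetch_tasks(), allowed_agents, allowed_hosts)
        if reason is not None:
            kill_job()
            return reason  # the scenario is failed with this reason
        if job_done():
            return None
        time.sleep(poll_seconds)


# Usage with stubs: the second poll sees a task on a host outside the scope.
cycle = iter([
    [TaskSnapshot("t1", "linux-agent", "web-01")],
    [TaskSnapshot("t1", "linux-agent", "web-01"), TaskSnapshot("t2", "linux-agent", "db-01")],
])
killed: List[bool] = []
reason = watch_job(lambda: next(cycle), job_done=lambda: False, kill_job=lambda: killed.append(True),
                   allowed_agents=["linux-agent"], allowed_hosts=["web-01"], poll_seconds=0)
print(reason)  # task t2: host db-01 not in allowedHosts
```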
