Validation rules determine whether a scenario passed. They are declared in the validation object of scenario.json and evaluated after the agent finishes executing.

Structure

"validation": {
  "allowedAgents": ["agt_xyz789"],
  "allowedHosts": ["hst_abc123"],
  "gateway": [...],
  "job": [...],
  "tasks": [...]
}
Rules are organized into three scopes based on what they check:
| Scope | When used | What it checks |
| --- | --- | --- |
| gateway | --from gateway only | The ServiceNow ticket |
| job | All entry points | The job record (status, plan, task count) |
| tasks | All entry points | Per-task data (commands, summaries, plans) |
For tasks rules, each rule is checked against every task. A rule passes if it matches on at least one task.
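The "at least one task" behaviour can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the runner's actual code: the task dictionaries, the executed_commands key, and the helper names are hypothetical.

```python
from typing import Callable, Dict, List


def tasks_rule_passes(check: Callable[[Dict], bool], tasks: List[Dict]) -> bool:
    """Return True if the check holds for at least one task."""
    return any(check(task) for task in tasks)


# Hypothetical per-task data; the real task record shape is not shown in this page.
tasks = [
    {"executed_commands": "cat /etc/nginx/nginx.conf"},
    {"executed_commands": "nginx -t\nsystemctl restart nginx"},
]


def restarted_nginx(task: Dict) -> bool:
    """Stand-in check: did this task restart nginx?"""
    return "systemctl restart nginx" in task["executed_commands"]


print(tasks_rule_passes(restarted_nginx, tasks))      # only the second task matches
print(tasks_rule_passes(restarted_nginx, tasks[:1]))  # no task matches
```

Under this reading, a rule does not need to hold for every task in the job; one matching task is enough.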

Rule Structure

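Every rule shares a common set of fields. The example below is a pattern_match rule, so it also carries the validator-specific pattern and where fields: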
{
  "label": "Agent restarted nginx",
  "validator": "pattern_match",
  "pattern": "systemctl.*(restart|reload|start).*nginx",
  "where": "executed_commands",
  "required": true,
  "negate": false
}
| Field | Default | Description |
| --- | --- | --- |
| label | - | Required. Shown in the validation report. Make it descriptive. |
| validator | - | Required. The type of check to run. See validators below. |
| required | true | When false, the rule is informational: it contributes to the compliance score but does not block a pass. |
| negate | false | Invert the result. The rule passes when the condition is NOT met. |

Validators

pattern_match

Checks whether a regex pattern matches (or doesn’t match, if negate: true) in a specific field of the job or task data.
{
  "label": "Agent edited the nginx config",
  "validator": "pattern_match",
  "pattern": "/etc/nginx/",
  "where": "executed_commands"
}
| Field | Description |
| --- | --- |
| pattern | Regular expression. Matching is case-insensitive. |
| where | The field to search. See targets below. |
where targets:
| Target | Content |
| --- | --- |
| executed_commands | All commands run by the agent, one per line |
| task_summary | The agent’s summary of what it did |
| task_description | The task description as created |
| task_plan | The agent’s execution plan |
| agent_messages | Full agent reasoning history |
| job_resolution | The job’s resolution summary |
| job_plan | The job-level plan |
| gateway_messages | Messages posted by the gateway bot on the ticket |
| gateway_summary | The gateway’s summary of ticket resolution |
| operational_rules | Operational constraints from the agent’s context |
Example patterns:
[
  {
    "label": "Agent ran nginx config test before restarting",
    "validator": "pattern_match",
    "pattern": "nginx -t",
    "where": "executed_commands"
  },
  {
    "label": "No destructive filesystem commands",
    "validator": "pattern_match",
    "pattern": "rm\\s+-rf\\s+/|mkfs|dd\\s+if=",
    "where": "executed_commands",
    "negate": true
  },
  {
    "label": "Agent described the root cause (informational)",
    "validator": "pattern_match",
    "pattern": "syntax error|misconfiguration|invalid",
    "where": "task_summary",
    "required": false
  }
]
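To try patterns before putting them in scenario.json, a rough local approximation of pattern_match can help. The sketch below assumes the runner behaves like Python's re.search with case-insensitive matching and applies negate afterwards; the real regex engine and its flags are not specified here, so treat the results as indicative only.

```python
import re


def pattern_match(rule: dict, text: str) -> bool:
    """Approximate a pattern_match rule: case-insensitive search, then optional negate."""
    matched = re.search(rule["pattern"], text, re.IGNORECASE) is not None
    return not matched if rule.get("negate", False) else matched


# executed_commands is described as one command per line.
executed_commands = "\n".join([
    "nginx -t",
    "systemctl restart nginx",
])

rules = [
    {"label": "Agent ran nginx config test before restarting",
     "pattern": "nginx -t"},
    # In JSON the backslashes are doubled ("rm\\s+-rf\\s+/"); the regex itself is rm\s+-rf\s+/
    {"label": "No destructive filesystem commands",
     "pattern": r"rm\s+-rf\s+/|mkfs|dd\s+if=",
     "negate": True},
]

for rule in rules:
    print(rule["label"], "->", pattern_match(rule, executed_commands))
```

Two things are worth keeping in mind under these assumptions. A pattern such as nginx -t only asserts that the command appears somewhere, not that it ran before the restart. And because . does not match a newline by default in Python, a pattern like systemctl.*nginx would have to match within a single command line.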

job_resolution_status

Checks the job’s final resolution status. This is useful for benchmarking tickets that are expected to fail, since the expected status can be one of the failure values.
{
  "label": "Job resolved successfully",
  "validator": "job_resolution_status",
  "pattern": "success"
}
| Field | Description |
| --- | --- |
| pattern | Expected resolution status. Allowed values: success, agentic_failure, hard_failure, no_tasks_created. |
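As a minimal sketch of how this check could be reasoned about, the function below compares an expected status with the job's actual status and rejects values outside the allowed list. It assumes exact string equality, which is not confirmed here, and the function name is hypothetical.

```python
ALLOWED_STATUSES = {"success", "agentic_failure", "hard_failure", "no_tasks_created"}


def job_resolution_status_passes(expected: str, actual: str) -> bool:
    """Pass when the job ended in the expected status (assumes exact equality)."""
    if expected not in ALLOWED_STATUSES:
        raise ValueError(f"unknown resolution status: {expected!r}")
    return actual == expected


# A ticket that is expected to fail: the rule passes when the job fails as predicted.
print(job_resolution_status_passes("agentic_failure", "agentic_failure"))
print(job_resolution_status_passes("success", "hard_failure"))
```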

ticket_status

Checks the status of the ServiceNow ticket. Only applicable with --from gateway.
{
  "label": "Ticket was resolved",
  "validator": "ticket_status",
  "pattern": "resolved"
}
| Field | Description |
| --- | --- |
| pattern | Expected ticket status string. |

task_count

Verifies that the number of tasks created under the job falls within a range. Use this to assert that the agent didn’t spiral into excessive sub-tasks, or that it resolved the issue in a single call.
{
  "label": "Resolved efficiently",
  "validator": "task_count",
  "min": 1,
  "max": 3
}
| Field | Description |
| --- | --- |
| min | Minimum number of tasks (inclusive). |
| max | Maximum number of tasks (inclusive). |
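Since both bounds are inclusive, the check reduces to a closed-range comparison. The sketch below illustrates that with the min: 1, max: 3 example; the function name is hypothetical, and it assumes both bounds are always supplied.

```python
def task_count_passes(task_count: int, min_tasks: int, max_tasks: int) -> bool:
    """Pass when min_tasks <= task_count <= max_tasks (both ends inclusive)."""
    return min_tasks <= task_count <= max_tasks


for count in (0, 1, 3, 4):
    print(count, task_count_passes(count, min_tasks=1, max_tasks=3))
```

To require a single task under this reading, set min and max to the same value.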

ansible

Runs an Ansible playbook and treats its exit code as pass/fail. This is the most reliable way to assert actual machine state: service running, file contents correct, port responding.
{
  "label": "Nginx is running and serving traffic",
  "validator": "ansible",
  "ansiblePath": "validate.yml"
}
| Field | Description |
| --- | --- |
| ansiblePath | Path to the playbook, relative to the scenario directory. Typically validate.yml. |
The playbook is run with the scenario’s Ansible inventory and has full access to the target hosts. A non-zero exit code fails this rule. See Playbooks for how to write validate.yml.
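The exit-code rule can be illustrated with a small Python sketch. It is not the runner's implementation: the helper names and the inventory.ini path are assumptions, and the demonstration uses stand-in commands so that it runs without Ansible installed. The only Ansible detail relied on is the standard ansible-playbook -i <inventory> <playbook> invocation.

```python
import subprocess
import sys
from typing import List


def ansible_command(playbook: str, inventory: str) -> List[str]:
    """Build a standard ansible-playbook invocation (paths are placeholders)."""
    return ["ansible-playbook", "-i", inventory, playbook]


def exit_code_rule_passes(command: List[str]) -> bool:
    """Run a command and treat exit code 0 as pass, anything else as fail."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0


print(ansible_command("validate.yml", "inventory.ini"))

# Stand-in commands that exit 0 and 2, so the sketch does not need Ansible.
print(exit_code_rule_passes([sys.executable, "-c", "raise SystemExit(0)"]))
print(exit_code_rule_passes([sys.executable, "-c", "raise SystemExit(2)"]))
```

Running the same command by hand against a test host is a quick way to check that validate.yml fails when the machine is in the broken state and succeeds once it is fixed.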

Scoring Model

After all rules are evaluated, the runner computes a compliance score and evaluates two gates.

Compliance score: the percentage of all non-Ansible rules that passed (required and optional combined). It is purely informational and is shown in the report.

Two gates determine the actual pass/fail result:
  • Compliance gate: passes when every required non-Ansible rule passes.
  • Resolution gate: if an Ansible (validate.yml) rule exists, passes when the playbook exits 0. If no Ansible rule is defined, the resolution gate mirrors the compliance gate result.

A scenario passes only when both gates pass. This design means you can layer your validation:
  • Use required: false rules to track compliance quality without blocking a pass
  • Use an Ansible validate.yml as the authoritative ground truth for machine state
  • Use required: true pattern rules to catch specific behaviors that must always happen (or never happen)
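The scoring model described in this section can be worked through with a short Python sketch. The RuleResult type and score function are hypothetical, and two details are assumptions rather than documented behaviour: the score is reported as 100 when there are no non-Ansible rules, and every Ansible rule must pass if more than one is defined.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RuleResult:
    label: str
    validator: str
    passed: bool
    required: bool = True


def score(results: List[RuleResult]) -> Dict[str, object]:
    """Compute the compliance score and both gates from evaluated rules."""
    non_ansible = [r for r in results if r.validator != "ansible"]
    ansible = [r for r in results if r.validator == "ansible"]

    # Informational score: share of non-Ansible rules that passed, required or not.
    if non_ansible:
        compliance_score = 100.0 * sum(r.passed for r in non_ansible) / len(non_ansible)
    else:
        compliance_score = 100.0  # assumption: nothing to fail

    # Compliance gate: only required non-Ansible rules can block.
    compliance_gate = all(r.passed for r in non_ansible if r.required)

    # Resolution gate: Ansible result if present, otherwise mirror the compliance gate.
    resolution_gate = all(r.passed for r in ansible) if ansible else compliance_gate

    return {
        "compliance_score": compliance_score,
        "compliance_gate": compliance_gate,
        "resolution_gate": resolution_gate,
        "passed": compliance_gate and resolution_gate,
    }


results = [
    RuleResult("Agent ran nginx config test", "pattern_match", passed=True),
    RuleResult("No destructive filesystem commands", "pattern_match", passed=True),
    RuleResult("Agent described the root cause", "pattern_match", passed=False, required=False),
    RuleResult("Nginx is running and serving traffic", "ansible", passed=True),
]
print(score(results))
```

In this example the optional rule fails, so the compliance score drops to roughly 66.7, but both gates still pass and so does the scenario.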

Fail-Fast Guards

allowedAgents and allowedHosts are checked continuously during execution, on every poll cycle, rather than after completion. If the condition is violated, the runner kills the job immediately and fails the scenario.
"validation": {
  "allowedAgents": ["agt_xyz789"],
  "allowedHosts": ["hst_abc123"]
}
allowedAgents: If any task is assigned to an agent whose ID is not in this list, the job is killed. Use this when you’re running through --from gateway or --from ticket and the engine selects agents automatically: it ensures only your designated agent is used.

allowedHosts: If any task’s agent is operating on a host not in this list, the job is killed. Use this to prevent the agent from laterally accessing hosts outside the scenario scope.

Both guards are only meaningful with entry points where agent selection is automatic (--from gateway, --from ticket). With --from task, the agent is explicitly specified.
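The per-poll check can be pictured with the Python sketch below. It is a simplified model, not the runner's code: the agent_id and host_id task fields, the hst_other host, and the guard_violation helper are hypothetical, and treating an omitted list as "guard disabled" is an assumption.

```python
from typing import Dict, List, Optional


def guard_violation(
    tasks: List[Dict[str, str]],
    allowed_agents: Optional[List[str]] = None,
    allowed_hosts: Optional[List[str]] = None,
) -> Optional[str]:
    """Return a reason for the first violation found, or None if all tasks are allowed."""
    for task in tasks:
        if allowed_agents is not None and task["agent_id"] not in allowed_agents:
            return f"agent {task['agent_id']} is not in allowedAgents"
        if allowed_hosts is not None and task["host_id"] not in allowed_hosts:
            return f"host {task['host_id']} is not in allowedHosts"
    return None


validation = {"allowedAgents": ["agt_xyz789"], "allowedHosts": ["hst_abc123"]}

# Each entry is the task list as seen on one poll cycle (hypothetical data).
poll_cycles = [
    [{"agent_id": "agt_xyz789", "host_id": "hst_abc123"}],
    [{"agent_id": "agt_xyz789", "host_id": "hst_abc123"},
     {"agent_id": "agt_xyz789", "host_id": "hst_other"}],
]

for cycle, tasks in enumerate(poll_cycles, start=1):
    reason = guard_violation(
        tasks, validation.get("allowedAgents"), validation.get("allowedHosts")
    )
    if reason is not None:
        print(f"poll {cycle}: kill job and fail scenario ({reason})")
        break
    print(f"poll {cycle}: ok")
```

The point of checking on every poll rather than at the end is that the job stops as soon as a task lands on an unexpected agent or host, before it can do further work there.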