Validation - 2501.ai

Validation rules determine whether a scenario passed. They are declared in the validation object of scenario.json. Most rules are evaluated after the agent finishes executing; the fail-fast guards (allowedAgents, allowedHosts) are checked during execution.

Structure

"validation": {
  "allowedAgents": ["agt_xyz789"],
  "allowedHosts": ["hst_abc123"],
  "gateway": [...],
  "job": [...],
  "tasks": [...]
}

Rules are organized into three scopes based on what they check:

Scope	When used	What it checks
`gateway`	`--from gateway` only	The ServiceNow ticket
`job`	All entry points	The job record (status, plan, task count)
`tasks`	All entry points	Per-task data (commands, summaries, plans)

For tasks rules, each rule is checked against every task. A rule passes if it matches on at least one task.
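The any-task rule can be sketched in Python. This is an illustration, not the runner's code: the task shape (a mapping from where target to text) and the helper names are hypothetical, and negate is left out of the sketch.

```python
import re
from typing import Callable, Iterable, Mapping

# Hypothetical task shape: a mapping from target name (e.g. "executed_commands") to its text.
Task = Mapping[str, str]


def tasks_rule_passes(tasks: Iterable[Task], check: Callable[[Task], bool]) -> bool:
    """A tasks-scope rule passes if `check` succeeds on at least one task."""
    return any(check(task) for task in tasks)


def restarted_nginx(task: Task) -> bool:
    # Same regex as the "Agent restarted nginx" rule, applied to one task's commands.
    commands = task.get("executed_commands", "")
    return re.search(r"systemctl.*(restart|reload|start).*nginx", commands, re.IGNORECASE) is not None


tasks = [
    {"executed_commands": "cat /etc/nginx/nginx.conf"},          # does not match
    {"executed_commands": "nginx -t\nsystemctl restart nginx"},  # matches
]
assert tasks_rule_passes(tasks, restarted_nginx)  # one matching task is enough
```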

Rule Structure

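Every rule shares a common set of fields. Validator-specific fields, such as pattern and where in the example below, are described under each validator.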

{
  "label": "Agent restarted nginx",
  "validator": "pattern_match",
  "pattern": "systemctl.*(restart|reload|start).*nginx",
  "where": "executed_commands",
  "required": true,
  "negate": false
}

Field	Default	Description
`label`	-	Required. Shown in the validation report. Make it descriptive.
`validator`	-	Required. The type of check to run. See validators below.
`required`	`true`	When `false`, the rule is informational: it contributes to the compliance score but does not block a pass.
`negate`	`false`	Invert the result. The rule passes when the condition is NOT met.

Validators

`pattern_match`

Checks whether a regex pattern matches (or doesn’t match, if negate: true) in a specific field of the job or task data.

{
  "label": "Agent edited the nginx config",
  "validator": "pattern_match",
  "pattern": "/etc/nginx/",
  "where": "executed_commands"
}

Field	Description
`pattern`	Regular expression. Matching is case-insensitive.
`where`	The field to search. See targets below.

where targets:

Target	Content
`executed_commands`	All commands run by the agent, one per line
`task_summary`	The agent’s summary of what it did
`task_description`	The task description as created
`task_plan`	The agent’s execution plan
`agent_messages`	Full agent reasoning history
`job_resolution`	The job’s resolution summary
`job_plan`	The job-level plan
`gateway_messages`	Messages posted by the gateway bot on the ticket
`gateway_summary`	The gateway’s summary of ticket resolution
`operational_rules`	Operational constraints from the agent’s context
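A minimal sketch of how a pattern_match rule might be evaluated, assuming the runner uses a Python-compatible regex flavor and treats a missing target as empty text (both assumptions; the function name is hypothetical):

```python
import re


def pattern_match(rule: dict, fields: dict) -> bool:
    """Evaluate one pattern_match rule against a mapping of where-target -> text.

    Assumptions: Python `re` syntax, and a missing target is treated as empty text.
    """
    text = fields.get(rule["where"], "")
    matched = re.search(rule["pattern"], text, re.IGNORECASE) is not None
    # negate inverts the outcome: the rule passes when the pattern is NOT found.
    return not matched if rule.get("negate", False) else matched


# executed_commands holds every command the agent ran, one per line.
fields = {"executed_commands": "\n".join(["nginx -t", "SYSTEMCTL reload nginx"])}

assert pattern_match({"pattern": "systemctl.*(restart|reload|start).*nginx",
                      "where": "executed_commands"}, fields)   # case-insensitive hit
assert pattern_match({"pattern": r"rm\s+-rf\s+/", "where": "executed_commands",
                      "negate": True}, fields)                  # absent, so the negated rule passes
assert not pattern_match({"pattern": "syntax error", "where": "task_summary"}, fields)
```

In this sketch `.` does not match a newline, so a pattern such as `systemctl.*nginx` must match within a single command line of executed_commands.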

Example patterns:

[
  {
    "label": "Agent ran nginx config test before restarting",
    "validator": "pattern_match",
    "pattern": "nginx -t",
    "where": "executed_commands"
  },
  {
    "label": "No destructive filesystem commands",
    "validator": "pattern_match",
    "pattern": "rm\\s+-rf\\s+/|mkfs|dd\\s+if=",
    "where": "executed_commands",
    "negate": true
  },
  {
    "label": "Agent described the root cause (informational)",
    "validator": "pattern_match",
    "pattern": "syntax error|misconfiguration|invalid",
    "where": "task_summary",
    "required": false
  }
]
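Patterns live inside JSON strings, so every regex backslash must be doubled. A quick way to check what a pattern actually becomes, assuming scenario.json is decoded by a standard JSON parser and the regex flavor is Python-compatible, is to decode the rule and try it against sample commands (the sample commands are made up):

```python
import json
import re

# The "No destructive filesystem commands" rule, exactly as written in scenario.json.
RULE_JSON = r"""
{
  "label": "No destructive filesystem commands",
  "validator": "pattern_match",
  "pattern": "rm\\s+-rf\\s+/|mkfs|dd\\s+if=",
  "where": "executed_commands",
  "negate": true
}
"""

rule = json.loads(RULE_JSON)
# JSON decoding turns each "\\s" into the regex token \s.
print(rule["pattern"])  # rm\s+-rf\s+/|mkfs|dd\s+if=

destructive = re.compile(rule["pattern"], re.IGNORECASE)
samples = {
    "rm -rf /var/cache/nginx": True,   # any absolute path after rm -rf is flagged
    "rm -rf ./tmp": False,             # relative path: not flagged
    "dd if=/dev/zero of=/dev/sda": True,
    "systemctl restart nginx": False,
}
for cmd, expected in samples.items():
    assert bool(destructive.search(cmd)) is expected, cmd
```

Note that `rm\s+-rf\s+/` flags any recursive delete of an absolute path, not only `rm -rf /`. If cleanup under a path such as /var is legitimate in your scenario, a tighter pattern (for example `rm\\s+-rf\\s+/(\\s|$)` in JSON) may fit better.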

`job_resolution_status`

Checks the job’s final resolution status. This is also useful for benchmarking tickets that are expected to fail: set pattern to the expected failure status.

{
  "label": "Job resolved successfully",
  "validator": "job_resolution_status",
  "pattern": "success"
}

Field	Description
`pattern`	Expected resolution status. Allowed values: `success`, `agentic_failure`, `hard_failure`, `no_tasks_created`.
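A sketch of this check, assuming an exact, case-sensitive comparison between pattern and the job's final status (the runner may compare differently; the function name is hypothetical). It also shows a hypothetical expected-failure rule:

```python
ALLOWED_RESOLUTION_STATUSES = frozenset(
    {"success", "agentic_failure", "hard_failure", "no_tasks_created"}
)


def job_resolution_status_passes(rule: dict, actual_status: str) -> bool:
    """Sketch: exact comparison of the job's final status with the rule's pattern."""
    expected = rule["pattern"]
    if expected not in ALLOWED_RESOLUTION_STATUSES:
        # Catch typos in scenario.json early instead of silently failing the rule.
        raise ValueError(f"unknown resolution status: {expected!r}")
    return actual_status == expected


# Hypothetical benchmark for a ticket the agent is expected NOT to fix.
expect_failure = {"label": "Agent gave up cleanly", "validator": "job_resolution_status",
                  "pattern": "agentic_failure"}
assert job_resolution_status_passes(expect_failure, "agentic_failure")
assert not job_resolution_status_passes(expect_failure, "success")
```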

`ticket_status`

Checks the status of the ServiceNow ticket. Only applicable with --from gateway.

{
  "label": "Ticket was resolved",
  "validator": "ticket_status",
  "pattern": "resolved"
}

Field	Description
`pattern`	Expected ticket status string.

`task_count`

Verifies that the number of tasks created under the job falls within a range. Use this to assert that the agent did not spiral into excessive sub-tasks, or that it resolved the issue in a single task (min and max both set to 1).

{
  "label": "Resolved efficiently",
  "validator": "task_count",
  "min": 1,
  "max": 3
}

Field	Description
`min`	Minimum number of tasks (inclusive).
`max`	Maximum number of tasks (inclusive).
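A sketch of the range check. Both bounds are inclusive, as documented; treating an omitted min or max as unbounded is an assumption, and the function name is hypothetical:

```python
def task_count_passes(rule: dict, num_tasks: int) -> bool:
    """Both bounds are inclusive; a missing bound is treated as unbounded (an assumption)."""
    low = rule.get("min")
    high = rule.get("max")
    if low is not None and num_tasks < low:
        return False
    if high is not None and num_tasks > high:
        return False
    return True


efficient = {"label": "Resolved efficiently", "validator": "task_count", "min": 1, "max": 3}
assert [task_count_passes(efficient, n) for n in range(5)] == [False, True, True, True, False]

# Asserting the issue was handled in a single task: pin both bounds to 1.
single = {"label": "Resolved in one task", "validator": "task_count", "min": 1, "max": 1}
assert task_count_passes(single, 1) and not task_count_passes(single, 2)
```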

`ansible`

Runs an Ansible playbook and treats its exit code as pass/fail. This is the most reliable way to assert actual machine state: service running, file contents correct, port responding.

{
  "label": "Nginx is running and serving traffic",
  "validator": "ansible",
  "ansiblePath": "validate.yml"
}

Field	Description
`ansiblePath`	Path to the playbook, relative to the scenario directory. Typically `validate.yml`.

The playbook is run with the scenario’s Ansible inventory and has full access to the target hosts. A non-zero exit code fails this rule. See Playbooks for how to write validate.yml.
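The exit-code contract can be sketched with subprocess. The inventory filename (inventory.ini) and the function name are hypothetical, and the real runner's invocation may differ; -i is the standard ansible-playbook option for selecting an inventory. The demo substitutes the Python interpreter for ansible-playbook so the pass/fail mapping can be checked without Ansible installed:

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def ansible_rule_passes(scenario_dir: str, ansible_path: str, inventory: str,
                        runner: Sequence[str] = ("ansible-playbook",)) -> bool:
    """Run the playbook and map its exit code to pass (0) or fail (anything else)."""
    playbook = Path(scenario_dir) / ansible_path  # ansiblePath is relative to the scenario dir
    cmd = [*runner, "-i", inventory, str(playbook)]
    return subprocess.run(cmd, check=False).returncode == 0


# Stand-ins for ansible-playbook: arguments after "-c <code>" are passed to the script, not parsed.
ok = (sys.executable, "-c", "import sys; sys.exit(0)")
failing = (sys.executable, "-c", "import sys; sys.exit(2)")
assert ansible_rule_passes(".", "validate.yml", "inventory.ini", runner=ok)
assert not ansible_rule_passes(".", "validate.yml", "inventory.ini", runner=failing)
```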

Scoring Model

After all rules are evaluated, the runner computes one score and two pass/fail gates.

Compliance score: the percentage of all non-Ansible rules that passed (required and optional combined). It is purely informational and shown in the report.
Compliance gate: passes when every required non-Ansible rule passes.
Resolution gate: if an Ansible (validate.yml) rule exists, passes when the playbook exits 0. If no Ansible rule is defined, the resolution gate mirrors the compliance gate result.

A scenario passes only when both gates pass. This design means you can layer your validation:

Use required: false rules to track compliance quality without blocking a pass
Use an Ansible validate.yml as the authoritative ground truth for machine state
Use required: true pattern rules to catch specific behaviors that must always happen (or never happen)
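The scoring model reduces to a few lines. In this sketch the RuleResult type and evaluate function are hypothetical; requiring every Ansible rule to pass when more than one is defined is an assumption, and so is reporting 100% when no non-Ansible rules exist:

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    label: str
    passed: bool
    required: bool = True
    is_ansible: bool = False


def evaluate(results: list[RuleResult]) -> dict:
    checks = [r for r in results if not r.is_ansible]
    playbooks = [r for r in results if r.is_ansible]

    # Informational only: required and optional non-Ansible rules count equally.
    # Reporting 100% when there are no such rules is an assumption of this sketch.
    score = 100.0 * sum(r.passed for r in checks) / len(checks) if checks else 100.0

    compliance_gate = all(r.passed for r in checks if r.required)
    # With no Ansible rule, the resolution gate mirrors the compliance gate.
    resolution_gate = all(r.passed for r in playbooks) if playbooks else compliance_gate

    return {"compliance_score": score, "compliance_gate": compliance_gate,
            "resolution_gate": resolution_gate,
            "passed": compliance_gate and resolution_gate}


report = evaluate([
    RuleResult("Agent ran nginx config test before restarting", True),
    RuleResult("Agent described the root cause (informational)", False, required=False),
    RuleResult("Nginx is running and serving traffic", True, is_ansible=True),
])
# The failed optional rule lowers the score but does not block the pass.
assert report["compliance_score"] == 50.0 and report["passed"]
```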

Fail-Fast Guards

allowedAgents and allowedHosts are checked continuously during execution, on every poll cycle, rather than after completion. If the condition is violated, the runner kills the job immediately and fails the scenario.

"validation": {
  "allowedAgents": ["agt_xyz789"],
  "allowedHosts": ["hst_abc123"]
}

allowedAgents: if any task is assigned to an agent whose ID is not in this list, the job is killed. Use this when running with --from gateway or --from ticket, where the engine selects agents automatically, to ensure that only your designated agent is used.
allowedHosts: if any task’s agent is operating on a host not in this list, the job is killed. Use this to prevent the agent from moving laterally to hosts outside the scenario scope.

Both guards are only meaningful with entry points where agent selection is automatic (--from gateway, --from ticket). With --from task, the agent is specified explicitly.
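A sketch of the guard logic and poll loop. Only the semantics come from the description above; the task fields agent_id and host_id, the callables for polling and killing the job, and treating an absent list (None) as no restriction are all hypothetical:

```python
import time
from typing import Callable, Iterable, Mapping, Optional, Sequence

Task = Mapping[str, str]  # hypothetical: {"agent_id": ..., "host_id": ...}


def guard_violation(tasks: Iterable[Task], allowed_agents: Optional[Sequence[str]],
                    allowed_hosts: Optional[Sequence[str]]) -> Optional[str]:
    """Return a description of the first violation, or None. A None list means no guard."""
    for task in tasks:
        if allowed_agents is not None and task["agent_id"] not in allowed_agents:
            return f"agent {task['agent_id']} not in allowedAgents"
        if allowed_hosts is not None and task["host_id"] not in allowed_hosts:
            return f"host {task['host_id']} not in allowedHosts"
    return None


def watch_job(poll_tasks: Callable[[], list], job_done: Callable[[], bool],
              kill_job: Callable[[], None], allowed_agents, allowed_hosts,
              interval: float = 5.0) -> Optional[str]:
    """Check the guards on every poll cycle; kill the job on the first violation."""
    while True:
        done = job_done()  # read before polling so the final task list is still checked
        violation = guard_violation(poll_tasks(), allowed_agents, allowed_hosts)
        if violation is not None:
            kill_job()
            return violation
        if done:
            return None
        time.sleep(interval)


killed = []
snapshots = iter([
    [{"agent_id": "agt_xyz789", "host_id": "hst_abc123"}],
    [{"agent_id": "agt_xyz789", "host_id": "hst_abc123"},
     {"agent_id": "agt_xyz789", "host_id": "hst_other"}],  # hypothetical out-of-scope host
])
result = watch_job(lambda: next(snapshots), lambda: False, lambda: killed.append(True),
                   ["agt_xyz789"], ["hst_abc123"], interval=0)
assert result == "host hst_other not in allowedHosts" and killed == [True]
```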
