Run Scenarios

Execution Flow

Understanding what the runner does on each invocation helps when writing scenarios and debugging failures. Every call to `2501-runner run` goes through these phases:

1. Pre-flight

Before any scenario executes, the runner validates the full environment:

Connects to the database and verifies that `ORG_ID`, `TENANT_ID`, and `USER_ID` exist
If `--from gateway`: tests ServiceNow API connectivity
If `--from ticket`: tests engine API connectivity
Verifies that `ansible-playbook` is available in `PATH`

Use `--check` to run only this step without executing any scenarios.

2. Scenario Discovery

The runner scans the scenarios directory, loads every `scenario.json` file, and resolves the `-s` argument against the available scenario keys and tags.
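To see which keys `-s` can resolve against, you can list the scenario directories yourself. The following is a minimal sketch. It assumes that a scenario's key is the path of the directory containing its `scenario.json`, relative to the scenarios root (as `nginx/001-broken-config` suggests). The helper name `list_scenario_keys` is hypothetical, and the helper does not inspect tags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print candidate scenario keys under a scenarios root.
# ASSUMPTION: a key is the directory holding scenario.json, relative to the root.
list_scenario_keys() {
  local root="${1:-/etc/2501/runner/scenarios}"
  root="${root%/}"   # drop a trailing slash so the prefix strip below matches
  find "$root" -type f -name scenario.json | while IFS= read -r file; do
    dir=$(dirname "$file")
    # strip "<root>/" to get e.g. nginx/001-broken-config
    printf '%s\n' "${dir#"$root"/}"
  done | sort
}
```

For example, `list_scenario_keys /etc/2501/runner/scenarios`, or pass the same path you give to `-p`.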

3. Per-Scenario Loop

For each matched scenario (and for each iteration when `-i` is greater than 1), the runner works through the following stages.

Provision

Resolve host and agent records from the database by their IDs, and flush agent memory.

Prepare

Run `restore.yml` silently, with errors suppressed, to clear stale state left by any previous failed run
Run `prepare.yml` against the Ansible inventory; abort the scenario if it fails

Execute

Dispatch the scenario through the selected entry point (see Entry Points below)
Poll for job or task completion, checking `allowedAgents` and `allowedHosts` on every poll cycle

Validate

Evaluate all validation rules in order
Run the Ansible `validate.yml` playbook, if the scenario declares one
Compute compliance score and pass/fail result

Restore

Run restore.yml to reset the host to baseline
Restore failures are non-fatal: a warning is logged and execution continues
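The playbook handling in the Prepare, Validate, and Restore stages can be sketched in shell. This is a minimal illustration, not the runner's implementation:

- It omits Provision, Execute, rule evaluation, and scoring.
- The function name `run_playbooks` and the `ANSIBLE_PLAYBOOK` override variable are hypothetical.
- It assumes the playbooks sit in the scenario directory.
- It treats `validate.yml` as declared when that file exists.

```shell
#!/usr/bin/env bash
# Sketch of the Prepare / Validate / Restore playbook handling described above.
# NOT the runner's implementation: Provision and Execute are omitted, and
# validate.yml counts as "declared" when the file exists (an assumption).
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

run_playbooks() {
  local dir="$1" inventory="$2" result=0

  # Prepare, step 1: silent, error-suppressed restore to clear stale state
  "$ANSIBLE_PLAYBOOK" -i "$inventory" "$dir/restore.yml" >/dev/null 2>&1 || true

  # Prepare, step 2: abort this scenario if prepare.yml fails
  if ! "$ANSIBLE_PLAYBOOK" -i "$inventory" "$dir/prepare.yml"; then
    echo "prepare failed, aborting scenario: $dir" >&2
    return 1
  fi

  # (Execute would dispatch the scenario and poll for completion here.)

  # Validate: only the Ansible playbook; rule evaluation and scoring are omitted
  if [ -f "$dir/validate.yml" ]; then
    "$ANSIBLE_PLAYBOOK" -i "$inventory" "$dir/validate.yml" || result=1
  fi

  # Restore: failures are non-fatal, so only log a warning
  if ! "$ANSIBLE_PLAYBOOK" -i "$inventory" "$dir/restore.yml"; then
    echo "warning: restore failed for $dir" >&2
  fi

  return "$result"
}
```

For example, `run_playbooks /etc/2501/runner/scenarios/nginx/001-broken-config inventory.ini`, where `inventory.ini` is a placeholder inventory path.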

4. Report

After all scenarios complete, the runner:

Prints a summary table (pass/fail, duration, token usage, validation details per rule)
Persists a ScenarioReport to the database
Exits with code `1` if any scenario failed and `--fail-on-error` is set; otherwise exits `0`

Environment Setup

Environment variables are loaded automatically from `/etc/2501/env.runner`. Use `--env-file` to load a different file.
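This page does not specify the file's syntax. The sketch below assumes the common `KEY=value` format and uses placeholder values; the variables themselves are described in the tables that follow.

```shell
# /etc/2501/env.runner: illustrative placeholder values only.
# ASSUMPTION: plain KEY=value lines (values here also happen to be valid shell).

# Required
DATABASE_URL=postgresql://runner:change-me@db.example.internal:5432/2501
ORG_ID=your-org-id
TENANT_ID=your-tenant-id
USER_ID=your-user-id

# Only needed for --from gateway
SERVICENOW_API_URL=https://your-instance.service-now.com
SERVICENOW_USERNAME=runner-bot
SERVICENOW_PASSWORD=change-me
SERVICENOW_ASSIGNMENT_GROUP_ID=your-assignment-group-id
SERVICENOW_CALLER_ID=your-caller-id

# Only needed for --from ticket
ENGINE_API_URL=https://engine.example.internal
ENGINE_API_KEY=your-engine-api-key
```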

Required

Variable	Description
`DATABASE_URL`	PostgreSQL connection string for your 2501 deployment
`ORG_ID`	Your organization ID
`TENANT_ID`	Your tenant ID
`USER_ID`	The user ID under which scenarios run

ServiceNow (`--from gateway`)

Variable	Description
`SERVICENOW_API_URL`	ServiceNow instance URL
`SERVICENOW_USERNAME`	ServiceNow username
`SERVICENOW_PASSWORD`	ServiceNow password
`SERVICENOW_ASSIGNMENT_GROUP_ID`	Assignment group for created tickets
`SERVICENOW_CALLER_ID`	Caller ID for created tickets

Engine API (`--from ticket`)

Variable	Description
`ENGINE_API_URL`	Base URL of the 2501 engine API
`ENGINE_API_KEY`	API key for engine authentication
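To see at a glance which variables each entry point needs, a quick local check can help before you reach for `--check`. This is a minimal sketch: `preflight_local` and the `ANSIBLE_PLAYBOOK` override are hypothetical names. The check only confirms that the variables are non-empty and that the playbook command is on `PATH`. It does not replace `--check`, which also verifies the IDs in the database and tests API connectivity.

```shell
#!/usr/bin/env bash
# Hypothetical local check: are the variables for an entry point set, and is the
# playbook command on PATH? Does NOT verify database IDs or API reachability.
preflight_local() {
  local from="${1:-gateway}" vars var value missing=0
  vars="DATABASE_URL ORG_ID TENANT_ID USER_ID"
  case "$from" in
    gateway)
      vars="$vars SERVICENOW_API_URL SERVICENOW_USERNAME SERVICENOW_PASSWORD"
      vars="$vars SERVICENOW_ASSIGNMENT_GROUP_ID SERVICENOW_CALLER_ID" ;;
    ticket) vars="$vars ENGINE_API_URL ENGINE_API_KEY" ;;
    task) ;;  # no extra variables are listed for --from task
    *) echo "unknown entry point: $from" >&2; return 2 ;;
  esac
  for var in $vars; do
    eval "value=\${$var:-}"   # indirect lookup of the variable named by $var
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  if ! command -v "${ANSIBLE_PLAYBOOK:-ansible-playbook}" >/dev/null 2>&1; then
    echo "ansible-playbook not found in PATH" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, run `preflight_local ticket` after loading your env file into the shell.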

The `run` Command

2501-runner run -s <scenarios> [options]

Options

Option	Default	Description
`-s, --scenarios <list>`	required	Scenario keys or tags, comma-separated. Mix freely.
`--from <type>`	`gateway`	Entry point: `gateway`, `task`, or `ticket`
`-g, --gateway <type>`	-	Gateway type. Required with `--from gateway`. Must be `servicenow`.
`-i, --iter <n>`	`1`	Number of iterations per scenario
`-p, --scenarios-path <path>`	`/etc/2501/runner/scenarios`	Path to the scenarios root directory
`--env-file <path>`	`/etc/2501/env.runner`	Path to the env file
`--check`	`false`	Run pre-flight checks only: do not execute any scenarios
`--fail-on-error`	`false`	Exit with code `1` if any scenario fails
`-v` / `-vv`	-	`-v`: show only failed checks. `-vv`: show all checks with full debug output.
`--main-engine <engine>`	-	Override the main LLM engine for all agents in all scenarios
`--secondary-engine <engine>`	-	Override the secondary LLM engine for all agents

Selecting Scenarios

The `-s` flag accepts scenario keys or tags, comma-separated. To run all your scenarios at once, give them a common tag (for example, `all`) and pass that tag.

# Single scenario by key
2501-runner run -s nginx/001-broken-config

# Multiple scenarios by key
2501-runner run -s nginx/001-broken-config,nginx/002-high-load

# All scenarios with the "nginx" tag
2501-runner run -s nginx

# Mix: all "nginx" scenarios plus a specific "disk" scenario
2501-runner run -s nginx,disk/001-cleanup

# Run everything (requires your scenarios to have the "all" tag)
2501-runner run -s all

Entry Points

The --from flag controls how each scenario is dispatched. Each entry point exercises a different layer of the stack.

`--from gateway` (default)

Creates a ticket in ServiceNow. The gateway bot processes it, creates a job, and the agent resolves it. The bot then marks the ticket resolved. This is the most complete end-to-end path: it exercises the full integration between your ticketing system and your 2501 deployment.

2501-runner run -s nginx/001-broken-config --from gateway --gateway servicenow

Because the engine selects agents automatically, use `allowedAgents` in validation if you need to assert which agent was chosen. If an agent outside that list is used, the runner kills the job immediately. Requires `--gateway servicenow` and the `SERVICENOW_*` environment variables.

`--from task`

Creates a task directly for a single agent, bypassing the job router entirely. This is the fastest and most direct path.

2501-runner run -s nginx/001-broken-config --from task

Requirements:

The scenario must define exactly one agent and one host
The agent must be referenced by `agent_id`

Use this when you want to benchmark a specific agent’s response to an instruction without involving the gateway or job orchestration layer.

`--from ticket`

POSTs directly to the 2501 engine’s internal ticket endpoint. The engine creates a job and routes it to agents internally.

2501-runner run -s nginx/001-broken-config --from ticket

This exercises the engine’s internal ticket-to-job flow without going through an external gateway. Requires the `ENGINE_API_URL` and `ENGINE_API_KEY` environment variables.

Preflight Check

Run --check to validate your full configuration before executing any scenarios. Useful after environment changes or before a large batch run.

2501-runner run -s nginx/001-broken-config --check

The runner validates database connectivity, org/tenant/user IDs, gateway or engine API reachability, specialty keys, and Ansible availability. No scenarios are executed.

Verbosity

# Show details for failed checks only
2501-runner run -s nginx -v

# Show all checks with full debug output
2501-runner run -s nginx -vv

Iterating

Run each scenario multiple times to check for consistency and surface flaky behavior:

2501-runner run -s nginx/001-broken-config -i 5

The full prepare → execute → validate → restore cycle runs for each iteration.

Overriding Engines

Override the LLM engines for all agents across all scenarios in a run. Useful for comparing how different models perform on the same scenario set.

2501-runner run -s nginx --main-engine claude-opus-4-6 --secondary-engine claude-opus-4-6
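To compare several engines on the same scenario set, you can loop over them. The sketch below uses the hypothetical names `compare_engines` and `RUNNER` (an override variable). On each run it sets the main and secondary engine to the same value.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: one run per engine over the same scenario set, with
# both the main and secondary engine set to that engine.
RUNNER="${RUNNER:-2501-runner}"

compare_engines() {
  local scenarios="$1" engine status=0
  shift
  for engine in "$@"; do
    echo "=== engine: $engine ==="
    # Each run prints its own summary table; record any non-zero exit
    "$RUNNER" run -s "$scenarios" --main-engine "$engine" --secondary-engine "$engine" || status=1
  done
  return "$status"
}
```

For example, `compare_engines nginx claude-opus-4-6 <other-engine>`, where `<other-engine>` stands for any other engine identifier your deployment accepts.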

CI Integration

Use --fail-on-error to exit non-zero when any scenario fails:

2501-runner run -s nginx --fail-on-error

Combine it with multiple iterations for regression detection:

2501-runner run -s regression-suite -i 3 --fail-on-error
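In a pipeline, you may want to gate the suite on the pre-flight check. The sketch below uses the hypothetical names `ci_run` and `RUNNER` (an override variable). It also assumes that `run --check` exits non-zero when a check fails; this page does not state that.

```shell
#!/usr/bin/env bash
# Hypothetical CI step: gate on pre-flight, then run the suite.
# ASSUMPTION: `run --check` exits non-zero when a pre-flight check fails.
RUNNER="${RUNNER:-2501-runner}"

ci_run() {
  local scenarios="$1" iterations="${2:-1}"
  # Fail fast on configuration problems before any scenario executes
  if ! "$RUNNER" run -s "$scenarios" --check; then
    echo "pre-flight failed" >&2
    return 2
  fi
  # --fail-on-error makes the run exit 1 if any scenario fails
  "$RUNNER" run -s "$scenarios" -i "$iterations" --fail-on-error
}
```

For example, `ci_run regression-suite 3` runs the command above behind a pre-flight gate. Returning `2` for pre-flight failures is an arbitrary choice. It lets a pipeline tell configuration problems apart from failed scenarios.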

The `validate` Command

Re-runs the validation rules for a scenario against an existing job or task, without re-executing the scenario. Use this when iterating on validation rules and you don’t want to wait for another full agent run.

2501-runner validate -s nginx/001-broken-config --job-id <job-id>
2501-runner validate -s nginx/001-broken-config --task-id <task-id>

Option	Description
`-s, --scenario <key>`	Required. The scenario whose rules to apply.
`-j, --job-id <id>`	Job to validate against.
`-t, --task-id <id>`	Task to validate against.
`-g, --gateway <type>`	Gateway type, if the original run used a gateway.
`--skip-ansible`	Skip Ansible-based rules. Useful when the host is no longer reachable.
`-v, --verbose`	Show detailed output.
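When iterating on rules, you may want to re-check several earlier jobs in one go. The sketch below uses the hypothetical names `revalidate_jobs` and `RUNNER` (an override variable). It also assumes that `validate` exits non-zero when validation fails; this page does not state that.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-apply one scenario's rules to several past jobs.
# ASSUMPTION: validate exits non-zero when validation fails.
RUNNER="${RUNNER:-2501-runner}"

revalidate_jobs() {
  local scenario="$1" job failed=0
  shift
  for job in "$@"; do
    # --skip-ansible skips Ansible-based rules, e.g. when hosts are unreachable
    if ! "$RUNNER" validate -s "$scenario" --job-id "$job" --skip-ansible; then
      echo "validation failed or errored for job $job" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, `revalidate_jobs nginx/001-broken-config <job-id-1> <job-id-2>`.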

The `restore` Command

Re-runs the restore.yml playbook for a scenario. Use this to manually reset a host that was left in a dirty state after a failed or interrupted run.

2501-runner restore -s nginx/001-broken-config
