Execution Flow

Understanding what the runner does on each invocation helps when writing scenarios and debugging failures. Every 2501-runner run call goes through these phases:

1. Pre-flight

Before any scenario executes, the runner validates the full environment:
  • Connects to the database and verifies that ORG_ID, TENANT_ID, and USER_ID exist
  • If --from gateway: tests ServiceNow API connectivity
  • If --from ticket: tests engine API connectivity
  • Verifies that ansible-playbook is available in PATH

Use --check to run only this step without executing anything.
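A script can catch the most common pre-flight failures locally before the runner is called. The sketch below is hypothetical: required_vars, missing_vars, and has_ansible are illustrative names, and the variable lists are copied from Environment Setup. It only checks that variables are set and that ansible-playbook is on PATH. It does not test database or API connectivity, so it complements --check rather than replacing it.

```shell
# Hypothetical local pre-check, not part of 2501-runner.
# It checks only that variables are set and that ansible-playbook is on PATH;
# database and API connectivity are still verified by `2501-runner run --check`.

# Print the variable names documented for a given --from entry point.
required_vars() {
  echo "DATABASE_URL ORG_ID TENANT_ID USER_ID"
  case "$1" in
    gateway)
      echo "SERVICENOW_API_URL SERVICENOW_USERNAME SERVICENOW_PASSWORD"
      echo "SERVICENOW_ASSIGNMENT_GROUP_ID SERVICENOW_CALLER_ID"
      ;;
    ticket)
      echo "ENGINE_API_URL ENGINE_API_KEY"
      ;;
  esac
}

# Print each unset or empty variable; return 1 if any are missing.
missing_vars() {
  status=0
  for name in $(required_vars "$1"); do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Succeed only if ansible-playbook can be found on PATH.
has_ansible() {
  command -v ansible-playbook >/dev/null 2>&1
}
```

For example, `missing_vars ticket && has_ansible` before a --from ticket run prints any variables that still need to be set.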

2. Scenario Discovery

The runner scans the scenarios directory, loads all scenario.json files, and resolves the -s argument against available keys and tags.
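The discovery step can be approximated with find. This sketch rests on an assumption this page does not confirm: that a scenario's key is the path of the directory containing its scenario.json, relative to the scenarios root, which matches keys such as nginx/001-broken-config. The function name list_scenario_keys is hypothetical, and tag resolution (which depends on the contents of scenario.json) is not shown.

```shell
# Hypothetical sketch of scenario discovery.
# ASSUMPTION: a scenario key is the directory holding scenario.json,
# relative to the scenarios root (e.g. nginx/001-broken-config).
list_scenario_keys() {
  root="${1:-/etc/2501/runner/scenarios}"
  root="${root%/}"  # drop a trailing slash so the prefix strip below matches
  find "$root" -type f -name scenario.json |
    sed -e "s|^$root/||" -e 's|/scenario\.json$||' |
    sort
}
```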

3. Per-Scenario Loop

For each matched scenario (and each iteration when -i > 1), the runner performs these steps:
Provision
  • Resolve host and agent records from the database by their IDs
  • Flush agent memory
Prepare
  • Run a silent, error-suppressed restore.yml first (to clear stale state from any previous failed run)
  • Run prepare.yml with the Ansible inventory; abort the scenario if it fails
Execute
  • Dispatch the scenario through the selected entry point (see Entry Points below)
  • Poll for job or task completion; check allowedAgents/allowedHosts on every poll cycle
Validate
  • Evaluate all validation rules in order
  • Run the Ansible validate.yml playbook if declared
  • Compute compliance score and pass/fail result
Restore
  • Run restore.yml to reset the host to baseline
  • Restore failures are non-fatal: a warning is logged and execution continues
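The error-handling rules of the Prepare and Restore steps can be modeled in a few lines of shell. This is a hypothetical model, not the runner's code: PLAYBOOK stands in for ansible-playbook, and INVENTORY and SCENARIO_DIR are illustrative variable names. Execute and Validate are omitted.

```shell
# Hypothetical model of the Prepare/Restore error handling described above.
# PLAYBOOK (default: ansible-playbook), INVENTORY and SCENARIO_DIR are
# illustrative names, not the runner's internals.

prepare_scenario() {
  # Silent, error-suppressed restore to clear stale state from a failed run.
  "${PLAYBOOK:-ansible-playbook}" -i "$INVENTORY" "$SCENARIO_DIR/restore.yml" >/dev/null 2>&1 || true
  # prepare.yml must succeed; otherwise the scenario is aborted.
  if ! "${PLAYBOOK:-ansible-playbook}" -i "$INVENTORY" "$SCENARIO_DIR/prepare.yml"; then
    echo "prepare failed: aborting scenario" >&2
    return 1
  fi
}

restore_scenario() {
  # Restore failures are non-fatal: log a warning and carry on.
  if ! "${PLAYBOOK:-ansible-playbook}" -i "$INVENTORY" "$SCENARIO_DIR/restore.yml"; then
    echo "warning: restore failed, continuing" >&2
  fi
  return 0
}
```

The asymmetry is the point: a failed prepare.yml stops the scenario, while a failed restore.yml never does.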

4. Report

After all scenarios complete, the runner:
  • Prints a summary table (pass/fail, duration, token usage, validation details per rule)
  • Persists a ScenarioReport to the database
  • Exits 0 if all passed, or 1 if any failed and --fail-on-error is set

Environment Setup

Environment variables are loaded automatically from /etc/2501/env.runner. Override the path with --env-file.

Required

Variable      Description
DATABASE_URL  PostgreSQL connection string for your 2501 deployment
ORG_ID        Your organization ID
TENANT_ID     Your tenant ID
USER_ID       The user ID under which scenarios run

ServiceNow (--from gateway)

Variable                        Description
SERVICENOW_API_URL              ServiceNow instance URL
SERVICENOW_USERNAME             ServiceNow username
SERVICENOW_PASSWORD             ServiceNow password
SERVICENOW_ASSIGNMENT_GROUP_ID  Assignment group for created tickets
SERVICENOW_CALLER_ID            Caller ID for created tickets

Engine API (--from ticket)

Variable        Description
ENGINE_API_URL  Base URL of the 2501 engine API
ENGINE_API_KEY  API key for engine authentication
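A sample env file can be written and loaded from the shell for local checks. Assumption: the file uses plain KEY=value lines, which this page does not specify, and every value below is a placeholder. write_sample_env and load_env are hypothetical names. The runner reads the file itself (from /etc/2501/env.runner or --env-file), so load_env only matters for scripts that need the same variables.

```shell
# Hypothetical sketch. ASSUMPTION: the env file uses plain KEY=value lines.
# All values are placeholders.
write_sample_env() {
  cat > "$1" <<'EOF'
# Required
DATABASE_URL=postgresql://user:password@db.example.com:5432/runner
ORG_ID=your-org-id
TENANT_ID=your-tenant-id
USER_ID=your-user-id
# Engine API (--from ticket)
ENGINE_API_URL=https://engine.example.com
ENGINE_API_KEY=your-api-key
EOF
}

# Load the file into the current shell; `set -a` exports every assignment.
load_env() {
  set -a
  . "$1"
  set +a
}
```

Such a file could then be passed to the runner with --env-file.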

The run Command

2501-runner run -s <scenarios> [options]

Options

Option                       Default                     Description
-s, --scenarios <list>       required                    Scenario keys or tags, comma-separated. Mix freely.
--from <type>                gateway                     Entry point: gateway | task | ticket
-g, --gateway <type>         -                           Gateway type. Required with --from gateway. Must be servicenow.
-i, --iter <n>               1                           Number of iterations per scenario
-p, --scenarios-path <path>  /etc/2501/runner/scenarios  Path to the scenarios root directory
--env-file <path>            /etc/2501/env.runner        Path to the env file
--check                      false                       Run pre-flight checks only; do not execute any scenarios
--fail-on-error              false                       Exit with code 1 if any scenario fails
-v / -vv                     -                           -v: show details for failed checks only. -vv: show all checks with full debug output.
--main-engine <engine>       -                           Override the main LLM engine for all agents in all scenarios
--secondary-engine <engine>  -                           Override the secondary LLM engine for all agents

Selecting Scenarios

The -s flag accepts scenario keys or tags, comma-separated. To run all your scenarios at once, tag them with a common tag (e.g. all) and use that.
# Single scenario by key
2501-runner run -s nginx/001-broken-config

# Multiple scenarios by key
2501-runner run -s nginx/001-broken-config,nginx/002-high-load

# All scenarios with the "nginx" tag
2501-runner run -s nginx

# Mix: all "nginx" scenarios plus a specific "disk" scenario
2501-runner run -s nginx,disk/001-cleanup

# Run everything (if you've tagged your scenarios with "all")
2501-runner run -s all  # requires scenarios to have the "all" tag

Entry Points

The --from flag controls how each scenario is dispatched. Each entry point exercises a different layer of the stack.

--from gateway (default)

Creates a ticket in ServiceNow. The gateway bot processes it, creates a job, and the agent resolves it. The bot then marks the ticket resolved. This is the most complete end-to-end path: it exercises the full integration between your ticketing system and your 2501 deployment.
2501-runner run -s nginx/001-broken-config --from gateway --gateway servicenow
Because the engine selects agents automatically, use allowedAgents in validation if you need to assert which agent was chosen. If an agent outside allowedAgents is used, the runner kills the job immediately. Requires: --gateway servicenow and the SERVICENOW_* env vars.

--from task

Creates a task directly for a single agent, bypassing the job router entirely. This is the fastest and most direct path.
2501-runner run -s nginx/001-broken-config --from task
Requirements:
  • The scenario must define exactly one agent and one host
  • The agent must be referenced by agent_id
Use this when you want to benchmark a specific agent’s response to an instruction without involving the gateway or job orchestration layer.

--from ticket

POSTs directly to the 2501 engine’s internal ticket endpoint. The engine creates a job and routes it to agents internally.
2501-runner run -s nginx/001-broken-config --from ticket
This exercises the engine’s internal ticket-to-job flow without going through an external gateway. Requires: ENGINE_API_URL and ENGINE_API_KEY env vars.
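When running one scenario through several entry points, a small helper keeps the flag combinations straight, including the rule that --from gateway also needs --gateway servicenow. The helper names entry_point_args and run_entry_points are hypothetical, and RUNNER defaults to 2501-runner so the functions can be exercised with a stub. Pass only entry points the scenario supports: --from task needs exactly one agent and one host.

```shell
# Hypothetical helper: print the entry-point flags for `2501-runner run`.
entry_point_args() {
  case "$1" in
    gateway) echo "--from gateway --gateway servicenow" ;;
    task)    echo "--from task" ;;
    ticket)  echo "--from ticket" ;;
    *)       echo "unknown entry point: $1" >&2; return 1 ;;
  esac
}

# Run one scenario through each listed entry point; return 1 if any run fails.
run_entry_points() {
  scenario="$1"; shift
  status=0
  for ep in "$@"; do
    args=$(entry_point_args "$ep") || { status=1; continue; }
    # $args is left unquoted on purpose so it splits into separate flags.
    "${RUNNER:-2501-runner}" run -s "$scenario" $args || status=1
  done
  return "$status"
}
```

For example, `run_entry_points nginx/001-broken-config gateway ticket` runs the scenario twice, once per path.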

Preflight Check

Run --check to validate your full configuration before executing any scenarios. Useful after environment changes or before a large batch run.
2501-runner run -s nginx/001-broken-config --check
The runner validates database connectivity, org/tenant/user IDs, gateway or engine API reachability, specialty keys, and Ansible availability. No scenarios are executed.

Verbosity

# Show details for failed checks only
2501-runner run -s nginx -v

# Show all checks with full debug output
2501-runner run -s nginx -vv

Iterating

Run each scenario multiple times to check for consistency and surface flaky behavior:
2501-runner run -s nginx/001-broken-config -i 5
The full prepare → execute → validate → restore cycle runs for each iteration.

Overriding Engines

Override the LLM engines for all agents across all scenarios in a run. Useful for comparing how different models perform on the same scenario set.
2501-runner run -s nginx --main-engine claude-opus-4-6 --secondary-engine claude-opus-4-6
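To compare several models on the same scenario set, loop over engine names and run the set once per engine. compare_engines is a hypothetical helper; only claude-opus-4-6 appears on this page, so supply whatever other engine names your deployment provides. Setting both --main-engine and --secondary-engine to the same value mirrors the example above.

```shell
# Hypothetical helper: run one scenario set once per engine.
# RUNNER defaults to 2501-runner; engine names are passed as arguments.
compare_engines() {
  scenarios="$1"; shift
  status=0
  for engine in "$@"; do
    echo "=== engine: $engine ==="
    "${RUNNER:-2501-runner}" run -s "$scenarios" \
      --main-engine "$engine" --secondary-engine "$engine" || status=1
  done
  return "$status"
}
```

For example, `compare_engines nginx claude-opus-4-6 <another-engine>` prints one summary table per engine, separated by the banner lines.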

CI Integration

Use --fail-on-error to exit non-zero when any scenario fails:
2501-runner run -s nginx --fail-on-error
Multiple iterations for regression detection:
2501-runner run -s regression-suite -i 3 --fail-on-error
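In a CI job, the call can be wrapped so the exit code is passed through and a short message lands in the log. This is a hypothetical sketch: ci_run_suite, SUITE, and ITERATIONS are illustrative names, and only the run flags come from this page. Per the Report phase, the runner exits 1 on failure only when --fail-on-error is set, which is why the wrapper always passes it.

```shell
# Hypothetical CI step: run a suite and propagate the runner's exit code.
# SUITE and ITERATIONS are illustrative settings with placeholder defaults.
ci_run_suite() {
  suite="${SUITE:-regression-suite}"
  iterations="${ITERATIONS:-3}"
  rc=0
  "${RUNNER:-2501-runner}" run -s "$suite" -i "$iterations" --fail-on-error || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "scenario suite '$suite' failed (exit $rc)" >&2
  fi
  return "$rc"
}
```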

The validate Command

Re-runs the validation rules for a scenario against an existing job or task, without re-executing the scenario. Use this when iterating on validation rules and you don’t want to wait for another full agent run.
2501-runner validate -s nginx/001-broken-config --job-id <job-id>
2501-runner validate -s nginx/001-broken-config --task-id <task-id>
Option                Description
-s, --scenario <key>  Required. The scenario whose rules to apply.
-j, --job-id <id>     Job to validate against.
-t, --task-id <id>    Task to validate against.
-g, --gateway <type>  Gateway type, if the original run used a gateway.
--skip-ansible        Skip Ansible-based rules. Useful when the host is no longer reachable.
-v, --verbose         Show detailed output.
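When tuning validation rules against several past runs, a loop over their job IDs saves retyping. revalidate_jobs is a hypothetical helper, and it assumes validate exits non-zero when a rule fails, which this page does not state. Setting SKIP_ANSIBLE to any non-empty value adds --skip-ansible.

```shell
# Hypothetical helper: re-apply one scenario's rules to several existing jobs.
# ASSUMPTION: `validate` exits non-zero when a rule fails.
revalidate_jobs() {
  scenario="$1"; shift
  status=0
  for job_id in "$@"; do
    # ${SKIP_ANSIBLE:+--skip-ansible} expands to the flag only when SKIP_ANSIBLE is non-empty.
    "${RUNNER:-2501-runner}" validate -s "$scenario" --job-id "$job_id" \
      ${SKIP_ANSIBLE:+--skip-ansible} || status=1
  done
  return "$status"
}
```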

The restore Command

Re-runs the restore.yml playbook for a scenario. Use this to manually reset a host that was left in a dirty state after a failed or interrupted run.
2501-runner restore -s nginx/001-broken-config
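Because restore failures during a run are only logged as warnings, a wrapper can call restore explicitly as a safety net once the run finishes. This is a hypothetical sketch: run_then_restore is an illustrative name, it takes a single scenario key, and it assumes restore.yml is safe to run a second time on a host that is already at baseline.

```shell
# Hypothetical wrapper: run a scenario, then force an explicit restore.
# Returns the exit code of the run, not of the restore.
run_then_restore() {
  scenario="$1"
  rc=0
  "${RUNNER:-2501-runner}" run -s "$scenario" || rc=$?
  "${RUNNER:-2501-runner}" restore -s "$scenario" || echo "warning: manual restore failed" >&2
  return "$rc"
}
```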