Execution Flow
Understanding what the runner does on each invocation helps when writing scenarios and debugging failures. Every 2501-runner run call goes through these phases:
1. Pre-flight
Before any scenario executes, the runner validates the full environment:
- Connects to the database and verifies that ORG_ID, TENANT_ID, and USER_ID exist
- If --from gateway: tests ServiceNow API connectivity
- If --from ticket: tests engine API connectivity
- Verifies that ansible-playbook is available in PATH

Use --check to run only this step without executing anything.
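Which checks apply depends on the entry point. The following is a minimal Python sketch of that selection, not the runner's actual implementation: the function name `preflight_checks` and the check labels are hypothetical, and only the PATH lookup (via the standard-library `shutil.which`) is really performed.

```python
import shutil


def preflight_checks(entry_point, which=shutil.which):
    """Return (name, ok) pairs for the checks that apply to an entry point.

    Connectivity checks are listed with ok=None because probing them
    needs real credentials; only the PATH lookup is executed here.
    """
    checks = [("database: ORG_ID, TENANT_ID, USER_ID exist", None)]
    if entry_point == "gateway":
        checks.append(("ServiceNow API connectivity", None))
    elif entry_point == "ticket":
        checks.append(("engine API connectivity", None))
    # ansible-playbook must be resolvable on PATH for every entry point.
    found = which("ansible-playbook") is not None
    checks.append(("ansible-playbook in PATH", found))
    return checks
```

The `which` parameter exists only so the lookup can be stubbed out when experimenting without Ansible installed.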
2. Scenario Discovery
The runner scans the scenarios directory, loads all scenario.json files, and resolves the -s argument against available keys and tags.
3. Per-Scenario Loop
For each matched scenario (and each iteration when -i > 1):
Provision
Resolve host and agent records from the database by their IDs and flush agent memory.
Prepare
- Run a silent, error-suppressed restore.yml first (to clear stale state from any previous failed run)
- Run prepare.yml with the Ansible inventory; abort the scenario if it fails
- Dispatch the scenario through the selected entry point (see Entry Points below)
- Poll for job or task completion; check allowedAgents/allowedHosts on every poll cycle
- Evaluate all validation rules in order
- Run the Ansible validate.yml playbook if declared
- Compute compliance score and pass/fail result
- Run restore.yml to reset the host to baseline
- Restore failures are non-fatal: a warning is logged and execution continues
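The ordering and failure handling of these steps can be sketched in Python. This is an illustration under stated assumptions, not the runner's code: `run_scenario` and the step names in the `steps` dict are hypothetical, and the sketch assumes that a failed prepare step skips the final restore, which the text above does not specify.

```python
import logging

log = logging.getLogger("runner-sketch")


def run_scenario(steps):
    """Run one scenario iteration and return True if it passed.

    `steps` maps hypothetical step names to zero-argument callables:
    restore, prepare, dispatch_and_poll, validate.
    """
    # Pre-run restore: silent and error-suppressed, clears stale state.
    try:
        steps["restore"]()
    except Exception:
        pass

    # A failing prepare aborts the scenario before anything is dispatched.
    try:
        steps["prepare"]()
    except Exception as exc:
        log.error("prepare failed, scenario aborted: %s", exc)
        return False

    steps["dispatch_and_poll"]()
    passed = bool(steps["validate"]())

    # Post-run restore: a failure is logged as a warning, never fatal.
    try:
        steps["restore"]()
    except Exception as exc:
        log.warning("restore failed (non-fatal): %s", exc)
    return passed
```

Passing the steps in as callables keeps the control flow visible without committing to how each step talks to Ansible or the database.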
4. Report
After all scenarios complete, the runner:
- Prints a summary table (pass/fail, duration, token usage, validation details per rule)
- Persists a ScenarioReport to the database
- Exits 0 if all passed, or 1 if any failed and --fail-on-error is set
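The exit-code rule is easy to misread, so here it is as a small Python sketch. The function name `exit_code` is hypothetical; the logic follows the rule as stated: a failure only produces a non-zero exit when --fail-on-error is set.

```python
def exit_code(results, fail_on_error=False):
    """Map per-scenario results (True = passed) to a process exit code."""
    any_failed = not all(results)
    # Without --fail-on-error, failures are reported but the exit code stays 0.
    return 1 if (any_failed and fail_on_error) else 0
```

In other words, a run with failed scenarios still exits 0 unless the flag is passed, which matters when wiring the runner into a pipeline.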
Environment Setup
Environment variables are loaded automatically from /etc/2501/env.runner. Override the path with --env-file.
Required
| Variable | Description |
|---|---|
| DATABASE_URL | PostgreSQL connection string for your 2501 deployment |
| ORG_ID | Your organization ID |
| TENANT_ID | Your tenant ID |
| USER_ID | The user ID under which scenarios run |
ServiceNow (--from gateway)
| Variable | Description |
|---|---|
| SERVICENOW_API_URL | ServiceNow instance URL |
| SERVICENOW_USERNAME | ServiceNow username |
| SERVICENOW_PASSWORD | ServiceNow password |
| SERVICENOW_ASSIGNMENT_GROUP_ID | Assignment group for created tickets |
| SERVICENOW_CALLER_ID | Caller ID for created tickets |
Engine API (--from ticket)
| Variable | Description |
|---|---|
| ENGINE_API_URL | Base URL of the 2501 engine API |
| ENGINE_API_KEY | API key for engine authentication |
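To check an environment before a run, the three tables above can be turned into a lookup. This is a minimal Python sketch: `missing_variables` is a hypothetical helper, and it assumes that --from task needs no variables beyond the required set, since none are listed for it.

```python
REQUIRED = ["DATABASE_URL", "ORG_ID", "TENANT_ID", "USER_ID"]

# Extra variables per entry point, taken from the tables above.
PER_ENTRY_POINT = {
    "gateway": [
        "SERVICENOW_API_URL",
        "SERVICENOW_USERNAME",
        "SERVICENOW_PASSWORD",
        "SERVICENOW_ASSIGNMENT_GROUP_ID",
        "SERVICENOW_CALLER_ID",
    ],
    "ticket": ["ENGINE_API_URL", "ENGINE_API_KEY"],
    "task": [],  # assumption: nothing beyond the required set
}


def missing_variables(env, entry_point="gateway"):
    """Return the names that are unset or empty in `env` for this entry point."""
    needed = REQUIRED + PER_ENTRY_POINT[entry_point]
    return [name for name in needed if not env.get(name)]
```

Calling `missing_variables(os.environ, "ticket")` after importing `os` would list anything still unset for the ticket entry point.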
The run Command
Options
| Option | Default | Description |
|---|---|---|
| -s, --scenarios <list> | required | Scenario keys or tags, comma-separated. Mix freely. |
| --from <type> | gateway | Entry point: gateway, task, or ticket |
| -g, --gateway <type> | - | Gateway type. Required with --from gateway. Must be servicenow. |
| -i, --iter <n> | 1 | Number of iterations per scenario |
| -p, --scenarios-path <path> | /etc/2501/runner/scenarios | Path to the scenarios root directory |
| --env-file <path> | /etc/2501/env.runner | Path to the env file |
| --check | false | Run pre-flight checks only: do not execute any scenarios |
| --fail-on-error | false | Exit with code 1 if any scenario fails |
| -v / -vv | - | -v: show only failed checks. -vv: show all checks with full debug output. |
| --main-engine <engine> | - | Override the main LLM engine for all agents in all scenarios |
| --secondary-engine <engine> | - | Override the secondary LLM engine for all agents |
Selecting Scenarios
The -s flag accepts scenario keys or tags, comma-separated. To run all your scenarios at once, tag them with a common tag (e.g. all) and use that.
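Mixing keys and tags in one selector can be pictured with a short Python sketch. It is illustrative only: `select_scenarios` is a hypothetical function, and the `key` and `tags` field names are assumptions about what a loaded scenario looks like, not the documented scenario.json schema.

```python
def select_scenarios(scenarios, selector):
    """Return the keys of scenarios matched by a comma-separated selector.

    A scenario matches if its key, or any of its tags, appears in the
    selector. Each scenario is returned once, in discovery order.
    """
    wanted = {part.strip() for part in selector.split(",") if part.strip()}
    selected = []
    for scenario in scenarios:
        tags = set(scenario.get("tags", []))
        if scenario["key"] in wanted or wanted & tags:
            selected.append(scenario["key"])
    return selected
```

Because matching is per scenario, a scenario named by both its key and one of its tags is still selected only once in this sketch.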
Entry Points
The --from flag controls how each scenario is dispatched. Each entry point exercises a different layer of the stack.
--from gateway (default)
Creates a ticket in ServiceNow. The gateway bot processes it, creates a job, and the agent resolves it. The bot then marks the ticket resolved.
This is the most complete end-to-end path: it exercises the full integration between your ticketing system and your 2501 deployment.
Use allowedAgents in validation if you need to assert which agent was chosen. If an unauthorized agent is used, the runner kills the job immediately.
Requires: --gateway servicenow and the SERVICENOW_* env vars.
--from task
Creates a task directly for a single agent, bypassing the job router entirely. This is the fastest and most direct path.
- The scenario must define exactly one agent and one host
- The agent must be referenced by agent_id
--from ticket
POSTs directly to the 2501 engine’s internal ticket endpoint. The engine creates a job and routes it to agents internally.
Requires: the ENGINE_API_URL and ENGINE_API_KEY env vars.
Preflight Check
Run --check to validate your full configuration before executing any scenarios. Useful after environment changes or before a large batch run.
Verbosity
Use -v to show only failed checks, or -vv to show all checks with full debug output.
Iterating
Use -i to run each scenario multiple times to check for consistency and surface flaky behavior.
Overriding Engines
Use --main-engine and --secondary-engine to override the LLM engines for all agents across all scenarios in a run. Useful for comparing how different models perform on the same scenario set.
CI Integration
Use --fail-on-error to exit non-zero when any scenario fails.
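For a CI job driven from Python, the command line can be assembled from the options documented above. This is a minimal sketch: `build_run_command` is a hypothetical helper, and it uses only the flags listed in the Options table.

```python
def build_run_command(scenarios, entry_point="gateway", iterations=1,
                      fail_on_error=True):
    """Build the argv list for a `2501-runner run` invocation."""
    argv = ["2501-runner", "run", "-s", ",".join(scenarios),
            "--from", entry_point]
    if entry_point == "gateway":
        # --gateway is required with --from gateway and must be servicenow.
        argv += ["--gateway", "servicenow"]
    if iterations > 1:
        argv += ["-i", str(iterations)]
    if fail_on_error:
        argv.append("--fail-on-error")
    return argv
```

The returned list can be passed to `subprocess.run`, and the resulting `returncode` handed to `sys.exit` so that the CI job fails when any scenario fails.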
The validate Command
Re-runs the validation rules for a scenario against an existing job or task, without re-executing the scenario. Use this when iterating on validation rules and you don’t want to wait for another full agent run.
| Option | Description |
|---|---|
| -s, --scenario <key> | Required. The scenario whose rules to apply. |
| -j, --job-id <id> | Job to validate against. |
| -t, --task-id <id> | Task to validate against. |
| -g, --gateway <type> | Gateway type, if the original run used a gateway. |
| --skip-ansible | Skip Ansible-based rules. Useful when the host is no longer reachable. |
| -v, --verbose | Show detailed output. |
The restore Command
Re-runs the restore.yml playbook for a scenario. Use this to manually reset a host that was left in a dirty state after a failed or interrupted run.

