Execution Flow
Understanding what the runner does on each invocation helps when writing scenarios and debugging failures. Every 2501-runner run call goes through these phases:
1. Pre-flight
Before any scenario executes, the runner validates the full environment:
- Connects to the database and verifies that ORG_ID, TENANT_ID, and USER_ID exist
- If --from gateway: tests ServiceNow API connectivity
- If --from ticket: tests engine API connectivity
- Verifies that ansible-playbook is available in PATH
Use --check to run only this step without executing anything.
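Two of the conditions above (required variables present, ansible-playbook in PATH) can be approximated locally before invoking the runner. The following is a minimal sketch, not the runner's own implementation; missing_vars and in_path are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical helpers, not part of 2501-runner.

# Prints the name of every variable in the argument list that is
# unset or empty in the current shell.
missing_vars() {
  for name in "$@"; do
    # Indirect lookup via eval; the names are trusted literals here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Returns 0 when the given command can be resolved in PATH.
in_path() {
  command -v "$1" >/dev/null 2>&1
}
```

For example, `missing_vars DATABASE_URL ORG_ID TENANT_ID USER_ID` lists whichever of the four are not set, and `in_path ansible-playbook || echo "ansible-playbook not found"` reports a missing Ansible install. This only checks presence; the runner's own pre-flight also verifies the IDs against the database.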
2. Scenario Discovery
The runner scans the scenarios directory, loads all scenario.json files, and resolves the -s argument against available keys and tags.
3. Per-Scenario Loop
For each matched scenario (and each iteration when -i > 1):
Provision
Resolve host and agent records from the database by their IDs, and flush agent memory.
Prepare
- Run a silent, error-suppressed restore.yml first (to clear stale state from any previous failed run)
- Run prepare.yml with the Ansible inventory; abort the scenario if it fails
Execute
- Dispatch the scenario through the selected entry point (see Entry Points below)
- Poll for job or task completion; check allowedAgents/allowedHosts on every poll cycle
Validate
- Evaluate all validation rules in order
- Run the Ansible validate.yml playbook if declared
- Compute compliance score and pass/fail result
Restore
- Run restore.yml to reset the host to baseline
- Restore failures are non-fatal: a warning is logged and execution continues
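The failure handling in this loop (suppressed first restore, fatal prepare, non-fatal final restore) can be sketched as follows. This is an illustration of the ordering described above, not the runner's code: run_scenario is a hypothetical function, and the four phases are passed in as command names. Two details are assumptions this page does not confirm: that an aborted scenario skips the final restore, and that validation is skipped when execution fails.

```shell
# Illustrative only. Usage: run_scenario PREPARE EXECUTE VALIDATE RESTORE
# where each argument names a command or shell function.
run_scenario() {
  prepare=$1 execute=$2 validate=$3 restore=$4

  # Silent, error-suppressed restore to clear stale state.
  "$restore" >/dev/null 2>&1 || true

  # A prepare failure aborts the scenario before anything is dispatched.
  # Assumption: no final restore runs in this case.
  if ! "$prepare"; then
    echo "prepare failed: scenario aborted"
    return 1
  fi

  # Assumption: validation only runs when execution completed.
  if "$execute" && "$validate"; then
    result=0
  else
    result=1
  fi

  # Restore failures are non-fatal: warn and carry on.
  "$restore" || echo "warning: restore failed"

  return "$result"
}
```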
4. Report
After all scenarios complete, the runner:
- Prints a summary table (pass/fail, duration, token usage, validation details per rule)
- Persists a ScenarioReport to the database
- Exits with 0 if all scenarios passed, or with 1 if any failed and --fail-on-error is set
Environment Setup
Environment variables are loaded automatically from /etc/2501/env.runner. Override the path with --env-file.
Required
| Variable | Description |
|---|---|
| DATABASE_URL | PostgreSQL connection string for your 2501 deployment |
| ORG_ID | Your organization ID |
| TENANT_ID | Your tenant ID |
| USER_ID | The user ID under which scenarios run |
ServiceNow (--from gateway)
| Variable | Description |
|---|---|
| SERVICENOW_API_URL | ServiceNow instance URL |
| SERVICENOW_USERNAME | ServiceNow username |
| SERVICENOW_PASSWORD | ServiceNow password |
| SERVICENOW_ASSIGNMENT_GROUP_ID | Assignment group for created tickets |
| SERVICENOW_CALLER_ID | Caller ID for created tickets |
Engine API (--from ticket)
| Variable | Description |
|---|---|
| ENGINE_API_URL | Base URL of the 2501 engine API |
| ENGINE_API_KEY | API key for engine authentication |
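Putting the three tables together, an env file might look like the sketch below. It assumes the file uses plain KEY="value" lines, which this page does not specify, and every value is a placeholder to be replaced with your own.

```shell
# Example env file contents (placeholder values only).
# Assumption: one KEY="value" assignment per line.

# Required for every entry point
DATABASE_URL="postgresql://<user>:<password>@<host>:5432/<database>"
ORG_ID="<org-id>"
TENANT_ID="<tenant-id>"
USER_ID="<user-id>"

# Only needed with --from gateway
SERVICENOW_API_URL="https://<instance>.service-now.com"
SERVICENOW_USERNAME="<username>"
SERVICENOW_PASSWORD="<password>"
SERVICENOW_ASSIGNMENT_GROUP_ID="<assignment-group-id>"
SERVICENOW_CALLER_ID="<caller-id>"

# Only needed with --from ticket
ENGINE_API_URL="https://<engine-host>"
ENGINE_API_KEY="<engine-api-key>"
```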
The run Command
2501-runner run -s <scenarios> [options]
Options
| Option | Default | Description |
|---|---|---|
| -s, --scenarios <list> | required | Scenario keys or tags, comma-separated. Mix freely. |
| --from <type> | gateway | Entry point: gateway, task, or ticket |
| -g, --gateway <type> | - | Gateway type. Required with --from gateway. Must be servicenow. |
| -i, --iter <n> | 1 | Number of iterations per scenario |
| -p, --scenarios-path <path> | /etc/2501/runner/scenarios | Path to the scenarios root directory |
| --env-file <path> | /etc/2501/env.runner | Path to the env file |
| --check | false | Run pre-flight checks only: do not execute any scenarios |
| --fail-on-error | false | Exit with code 1 if any scenario fails |
| -v, -vv | - | -v: show only failed checks. -vv: show all checks with full debug output. |
| --main-engine <engine> | - | Override the main LLM engine for all agents in all scenarios |
| --secondary-engine <engine> | - | Override the secondary LLM engine for all agents |
Selecting Scenarios
The -s flag accepts scenario keys or tags, comma-separated. To run all your scenarios at once, tag them with a common tag (e.g. all) and use that.
# Single scenario by key
2501-runner run -s nginx/001-broken-config
# Multiple scenarios by key
2501-runner run -s nginx/001-broken-config,nginx/002-high-load
# All scenarios with the "nginx" tag
2501-runner run -s nginx
# Mix: all "nginx" scenarios plus a specific "disk" scenario
2501-runner run -s nginx,disk/001-cleanup
# Run everything (requires your scenarios to carry the "all" tag)
2501-runner run -s all
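When the selection is assembled by a script rather than typed by hand, the comma-separated value for -s can be built from a list of keys and tags. A minimal sketch; join_selectors is a hypothetical helper, not a runner feature.

```shell
# Hypothetical helper: joins its arguments with commas, producing a
# value suitable for the -s flag.
join_selectors() {
  # "$*" joins arguments with the first character of IFS; the subshell
  # keeps the IFS change from leaking into the caller.
  ( IFS=,; printf '%s\n' "$*" )
}
```

For example, `2501-runner run -s "$(join_selectors nginx disk/001-cleanup)"` is equivalent to the mixed key-and-tag example above.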
Entry Points
The --from flag controls how each scenario is dispatched. Each entry point exercises a different layer of the stack.
--from gateway (default)
Creates a ticket in ServiceNow. The gateway bot processes it, creates a job, and the agent resolves it. The bot then marks the ticket resolved.
This is the most complete end-to-end path: it exercises the full integration between your ticketing system and your 2501 deployment.
2501-runner run -s nginx/001-broken-config --from gateway --gateway servicenow
Because the engine selects agents automatically, use allowedAgents in validation if you need to assert which agent was chosen. If an unauthorized agent is used, the runner kills the job immediately.
Requires: --gateway servicenow and the SERVICENOW_* env vars.
--from task
Creates a task directly for a single agent, bypassing the job router entirely. This is the fastest and most direct path.
2501-runner run -s nginx/001-broken-config --from task
Requirements:
- The scenario must define exactly one agent and one host
- The agent must be referenced by agent_id
Use this when you want to benchmark a specific agent’s response to an instruction without involving the gateway or job orchestration layer.
--from ticket
POSTs directly to the 2501 engine’s internal ticket endpoint. The engine creates a job and routes it to agents internally.
2501-runner run -s nginx/001-broken-config --from ticket
This exercises the engine’s internal ticket-to-job flow without going through an external gateway.
Requires: ENGINE_API_URL and ENGINE_API_KEY env vars.
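The per-entry-point requirements can be summarized in one lookup. The sketch below restates the variable names from the Environment Setup tables; required_vars_for is a hypothetical helper, and it assumes --from task needs nothing beyond the four variables listed as required.

```shell
# Hypothetical helper: prints, one per line, the environment variables
# needed for a given --from value, according to the tables on this page.
required_vars_for() {
  base="DATABASE_URL ORG_ID TENANT_ID USER_ID"
  case "$1" in
    gateway)
      extra="SERVICENOW_API_URL SERVICENOW_USERNAME SERVICENOW_PASSWORD SERVICENOW_ASSIGNMENT_GROUP_ID SERVICENOW_CALLER_ID"
      ;;
    ticket)
      extra="ENGINE_API_URL ENGINE_API_KEY"
      ;;
    task)
      # Assumption: no additional variables beyond the required set.
      extra=""
      ;;
    *)
      echo "unknown entry point: $1" >&2
      return 2
      ;;
  esac
  # Word splitting on the unquoted lists is intentional.
  printf '%s\n' $base $extra
}
```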
Pre-flight Check
Run --check to validate your full configuration before executing any scenarios. Useful after environment changes or before a large batch run.
2501-runner run -s nginx/001-broken-config --check
The runner validates database connectivity, org/tenant/user IDs, gateway or engine API reachability, specialty keys, and Ansible availability. No scenarios are executed.
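Before a large batch, the check can gate the real run. The wrapper below is a sketch under one important assumption that this page does not state: that a failed --check produces a non-zero exit status. checked_run and the RUNNER variable are hypothetical names; RUNNER exists only so the wrapper can be pointed at a stand-in command for dry runs.

```shell
# Hypothetical wrapper: runs pre-flight first, then the scenarios.
# Assumption: `run --check` exits non-zero when a check fails.
checked_run() {
  runner=${RUNNER:-2501-runner}
  # Pre-flight only: no scenarios are executed by this call.
  "$runner" run "$@" --check || {
    echo "pre-flight failed; skipping run" >&2
    return 1
  }
  # Same arguments, this time executing the scenarios.
  "$runner" run "$@"
}
```

For example, `checked_run -s nginx --from ticket` validates the configuration and then runs the nginx scenarios only if validation succeeded.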
Verbosity
# Show details for failed checks only
2501-runner run -s nginx -v
# Show all checks with full debug output
2501-runner run -s nginx -vv
Iterating
Run each scenario multiple times to check for consistency and surface flaky behavior:
2501-runner run -s nginx/001-broken-config -i 5
The full prepare → execute → validate → restore cycle runs for each iteration.
Overriding Engines
Override the LLM engines for all agents across all scenarios in a run. Useful for comparing how different models perform on the same scenario set.
2501-runner run -s nginx --main-engine claude-opus-4-6 --secondary-engine claude-opus-4-6
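To compare several models on the same scenario set, the override can be wrapped in a loop. A minimal sketch: compare_engines and the RUNNER variable are hypothetical names, and the engine identifiers you pass must be ones your deployment accepts (only claude-opus-4-6 appears on this page).

```shell
# Hypothetical helper. Usage: compare_engines <scenarios> <engine>...
# Runs the same selection once per main engine.
compare_engines() {
  runner=${RUNNER:-2501-runner}
  scenarios=$1
  shift
  for engine in "$@"; do
    echo "== main engine: $engine =="
    # Keep going if one run exits non-zero so every engine is exercised.
    "$runner" run -s "$scenarios" --main-engine "$engine" ||
      echo "run with $engine exited non-zero"
  done
}
```

Each run prints its own summary table, so the per-engine pass/fail, duration, and token usage can be compared side by side.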
CI Integration
Use --fail-on-error to exit non-zero when any scenario fails:
2501-runner run -s nginx --fail-on-error
Multiple iterations for regression detection:
2501-runner run -s regression-suite -i 3 --fail-on-error
The validate Command
Re-runs the validation rules for a scenario against an existing job or task, without re-executing the scenario. Use this when iterating on validation rules and you don’t want to wait for another full agent run.
2501-runner validate -s nginx/001-broken-config --job-id <job-id>
2501-runner validate -s nginx/001-broken-config --task-id <task-id>
| Option | Description |
|---|---|
| -s, --scenario <key> | Required. The scenario whose rules to apply. |
| -j, --job-id <id> | Job to validate against. |
| -t, --task-id <id> | Task to validate against. |
| -g, --gateway <type> | Gateway type, if the original run used a gateway. |
| --skip-ansible | Skip Ansible-based rules. Useful when the host is no longer reachable. |
| -v, --verbose | Show detailed output. |
The restore Command
Re-runs the restore.yml playbook for a scenario. Use this to manually reset a host that was left in a dirty state after a failed or interrupted run.
2501-runner restore -s nginx/001-broken-config
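After an interrupted batch, several hosts may need resetting. The sketch below calls restore once per scenario key and reports which ones failed. restore_all and the RUNNER variable are hypothetical names, and the sketch assumes the restore command exits non-zero when the playbook fails, which this page does not state.

```shell
# Hypothetical helper. Usage: restore_all <scenario-key>...
# Assumption: `restore` exits non-zero when restore.yml fails.
restore_all() {
  runner=${RUNNER:-2501-runner}
  failed=0
  for key in "$@"; do
    if ! "$runner" restore -s "$key"; then
      echo "restore failed for $key" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeeds only when every restore succeeded.
  [ "$failed" -eq 0 ]
}
```

For example: `restore_all nginx/001-broken-config nginx/002-high-load`.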