2501 runner is the Benchmark driver. It dispatches scenarios as if they were real tickets, scores the results against your validation rules, and writes a report to the database. It is typically used only in sandbox environments.

Subcommands

2501 runner start <args>         # run scenarios
2501 runner validate <args>      # lint scenarios, or re-score an existing run
2501 runner flush <args>         # delete scenario run data
2501 runner chaos <args>         # resilience testing — kills the engine mid-task
2501 runner sandbox <args>       # VM lifecycle (lima / incus): prepare, restore, create, delete, purge-vms

start (formerly run)

2501 runner start -s nginx/001-broken-config
2501 runner start -s nginx                 # all scenarios tagged nginx
2501 runner start -s nginx,disk -i 5       # 5 iterations across two tags
2501 runner start -s nginx --gateway servicenow   # use a real ServiceNow instance instead of the runner gateway
2501 runner start -s nginx --mode lima --parallel --parallel-ram-cap 16
Common flag                                       Meaning
-s, --scenarios <list>                            Scenario keys or tags
-m, --mode <host|incus|lima>                      Pre-provisioned hosts vs. ephemeral VMs
-g, --gateway <runner|servicenow>                 Where to submit the ticket
-i, --iter <n>                                    Number of iterations per scenario
--main-engine, --secondary-engine, --specialty    Per-run overrides
--parallel                                        Concurrent runs (VM modes only)
--fail-on-error                                   Exit non-zero if any scenario fails
--log-file <path>                                 Mirror output (ANSI-stripped) to a file
For the full flag and env-var reference, see Benchmark → start.
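In CI, the flags listed above are often combined into a single gate. The following is a minimal sketch, not an official script: the RUNNER variable and the run_benchmark function are hypothetical names introduced here, and only flags from the table above are used.

```shell
#!/usr/bin/env bash
# Sketch of a CI gate around `2501 runner start`.
# RUNNER and run_benchmark are hypothetical names, not part of the CLI.
RUNNER="${RUNNER:-2501}"

# run_benchmark <scenarios> <iterations> <log-file>
# Returns the runner's exit status; with --fail-on-error that is
# non-zero if any scenario fails.
run_benchmark() {
  local scenarios="$1" iterations="$2" log_file="$3"
  "$RUNNER" runner start \
    -s "$scenarios" \
    -i "$iterations" \
    --fail-on-error \
    --log-file "$log_file"
}
```

For example, run_benchmark nginx,disk 5 bench.log || exit 1 would fail a pipeline step when a scenario fails. Setting RUNNER=echo prints the command instead of executing it, which is handy for checking the wrapper itself.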

validate

# Lint scenarios without running them
2501 runner validate --scenarios
2501 runner validate --scenarios nginx

# Re-score an existing run without re-executing the scenario
2501 runner validate --runs --job-id <job-id>
2501 runner validate --runs --benchmark-id <bench-id>
Use validate --runs while iterating on validation rules; it is much faster than re-running the agent.
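A rule-editing loop might chain the two validate forms: lint first, then re-score a known job. This is a sketch under stated assumptions: RUNNER and rescore_job are hypothetical names, and it assumes validate exits non-zero when linting finds a problem, which this page does not confirm.

```shell
#!/usr/bin/env bash
# Sketch: lint scenarios, then re-score an existing job.
# RUNNER and rescore_job are hypothetical names, not part of the CLI.
RUNNER="${RUNNER:-2501}"

# rescore_job <scenario-or-tag> <job-id>
rescore_job() {
  local scenarios="$1" job_id="$2"
  # Assumption: validate exits non-zero when linting finds a problem,
  # so && skips the re-score in that case.
  "$RUNNER" runner validate --scenarios "$scenarios" &&
    "$RUNNER" runner validate --runs --job-id "$job_id"
}
```

Calling rescore_job nginx <job-id> after each rule change keeps the feedback loop short, because the scenario itself is never re-executed.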

flush

2501 runner flush --older-than 7d --preview
2501 runner flush --scenario nginx/001-broken-config
2501 runner flush --deprecated      # delete records for scenario keys no longer on disk
2501 runner flush --all             # nuclear; requires typing yes
flush removes ScenarioReport rows plus their associated Benchmark / Job / Task / Ticket records.
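Because a flush deletes related records across several tables, a preview-then-confirm wrapper can be a useful guard. The sketch below is illustrative only: RUNNER and flush_older_than are hypothetical names, and it assumes that --preview deletes nothing and that the same selector without --preview performs the deletion.

```shell
#!/usr/bin/env bash
# Sketch: preview a flush, then delete only after confirmation.
# RUNNER and flush_older_than are hypothetical names, not part of the CLI.
RUNNER="${RUNNER:-2501}"

# flush_older_than <age>, e.g. flush_older_than 7d
flush_older_than() {
  local age="$1" answer
  # Show what would be removed (assumes --preview deletes nothing).
  "$RUNNER" runner flush --older-than "$age" --preview || return 1
  printf 'Delete these records? [y/N] ' >&2
  read -r answer || answer=""
  if [ "$answer" = "y" ]; then
    "$RUNNER" runner flush --older-than "$age"
  else
    echo "Aborted; nothing deleted." >&2
  fi
}
```

Anything other than a literal y leaves the data untouched, so the default answer is the safe one.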

chaos

chaos drives resilience testing: it runs a scenario, kills the engine at random points during execution, and verifies that the system recovers. It is used in CI to catch regressions in restart / resume behavior.

sandbox

VM management for --mode lima or --mode incus:
2501 runner sandbox prepare -s nginx/001-broken-config -m lima
# … SSH in, inspect, iterate …
2501 runner sandbox restore -s nginx/001-broken-config -m lima

2501 runner sandbox create --template debian-docker -m lima -n my-target
2501 runner sandbox delete -s my-target -m lima

2501 runner sandbox purge-vms -m lima  # clean up stale clones
See Benchmark → VM Sandbox for the full sandbox surface.
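The prepare / inspect / restore sequence shown above can be wrapped so that restore always runs, even when the inspection step fails. This is a sketch, not part of the CLI: RUNNER and with_sandbox are hypothetical names, and it assumes restore is safe to call after any prepare.

```shell
#!/usr/bin/env bash
# Sketch: prepare a sandbox VM, run a command, then always restore.
# RUNNER and with_sandbox are hypothetical names, not part of the CLI.
RUNNER="${RUNNER:-2501}"

# with_sandbox <scenario> <mode> <command> [args...]
with_sandbox() {
  local scenario="$1" mode="$2" status=0
  shift 2
  "$RUNNER" runner sandbox prepare -s "$scenario" -m "$mode" || return 1
  # Run the caller's inspection command and remember its exit status.
  "$@" || status=$?
  # Restore even if the command failed.
  "$RUNNER" runner sandbox restore -s "$scenario" -m "$mode" || return 1
  return "$status"
}
```

For example, with_sandbox nginx/001-broken-config lima ./inspect.sh would run a hypothetical inspect.sh between prepare and restore and return that script's exit status.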