runner

`2501 runner` is the Benchmark driver. It dispatches scenarios as if they were real tickets, scores the results against your validation rules, and writes a report to the database. It is typically used only in sandbox environments.

Subcommands

runner start <args>         # run scenarios
runner validate <args>      # lint scenarios, or re-score an existing run
runner flush <args>         # delete scenario run data
runner chaos <args>         # resilience testing — kills the engine mid-task
runner sandbox <args>       # VM lifecycle (lima / incus): prepare, restore, create, delete, purge-vms

`start` (was `run`)

runner start -s nginx/001-broken-config
runner start -s nginx                 # all scenarios tagged nginx
runner start -s nginx,disk -i 5       # 5 iterations across two tags
runner start -s nginx --gateway servicenow   # use a real ServiceNow instead of the runner gateway
runner start -s nginx --mode lima --parallel --parallel-ram-cap 16

| Common flag | Meaning |
| --- | --- |
| `-s, --scenarios <list>` | Scenario keys or tags |
| `-m, --mode <host\|incus\|lima>` | Pre-provisioned hosts vs ephemeral VMs |
| `-g, --gateway <runner\|servicenow>` | Where to submit the ticket |
| `-i, --iter <n>` | Number of iterations per scenario |
| `--main-engine`, `--secondary-engine`, `--specialty` | Per-run overrides |
| `--parallel` | Concurrent runs (VM modes only) |
| `--fail-on-error` | Exit non-zero if any scenario fails |
| `--log-file <path>` | Mirror output (ANSI-stripped) to a file |

For the full flag and env-var reference, see Benchmark → start.
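As a rough illustration of how the flags above combine in a CI job, the sketch below wraps `runner start` with `--fail-on-error` and `--log-file` so that a failing scenario fails the job and the output is kept. The names `RUNNER_CMD`, `DRY_RUN`, and `run_benchmark` are hypothetical helpers for this example and are not part of the CLI; with `DRY_RUN=1` (the default here) the command is only printed.

```shell
#!/bin/sh
# Hypothetical wrapper around `runner start`; uses only flags listed above.
# RUNNER_CMD and DRY_RUN are illustrative variables, not CLI settings.
RUNNER_CMD="${RUNNER_CMD:-runner}"
DRY_RUN="${DRY_RUN:-1}"

# $1 = scenario keys or tags, $2 = iterations, $3 = log file path
run_benchmark() {
    scenarios="$1"
    iterations="${2:-1}"
    log_file="${3:-runner.log}"

    # RUNNER_CMD is left unquoted on purpose so a value like "2501 runner"
    # splits into two words.
    set -- $RUNNER_CMD start -s "$scenarios" -i "$iterations" \
        --fail-on-error --log-file "$log_file"

    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of executing it.
        echo "$*"
    else
        # The exit status of the CLI becomes the function's exit status.
        "$@"
    fi
}
```

Calling `run_benchmark nginx,disk 5 bench.log` with `DRY_RUN=0` would run five iterations across both tags; the CI step then fails whenever the CLI exits non-zero.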

`validate`

# Lint scenarios without running them
2501 runner validate --scenarios
2501 runner validate --scenarios nginx

# Re-score an existing run without re-executing the scenario
2501 runner validate --runs --job-id <job-id>
2501 runner validate --runs --benchmark-id <bench-id>

Use `validate --runs` while iterating on validation rules: it is much faster than re-running the agent.
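One way to script that iteration loop is sketched below: lint the scenarios for a tag, and only if the lint passes, re-score an existing job. The helper names (`run_or_print`, `rescore_job`) and the `RUNNER_CMD` and `DRY_RUN` variables are assumptions made for this example; only the `validate` flags come from the commands shown above.

```shell
#!/bin/sh
# Hypothetical helpers; only the `validate` flags come from the CLI examples.
RUNNER_CMD="${RUNNER_CMD:-2501 runner}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run_or_print() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = scenario tag to lint, $2 = id of an existing job to re-score
rescore_job() {
    tag="$1"
    job_id="$2"
    # Lint first; skip the re-score if linting fails.
    run_or_print $RUNNER_CMD validate --scenarios "$tag" || return 1
    run_or_print $RUNNER_CMD validate --runs --job-id "$job_id"
}
```

After editing a validation rule, `rescore_job nginx <job-id>` repeats both steps without dispatching the scenario again.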

`flush`

runner flush --older-than 7d --preview
runner flush --scenario nginx/001-broken-config
runner flush --deprecated      # delete records for scenario keys no longer on disk
runner flush --all             # nuclear; requires typing yes

Removes ScenarioReport rows plus their associated Benchmark / Job / Task / Ticket records.
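Because a flush also removes the related Benchmark, Job, Task, and Ticket records, it can be worth previewing before deleting. The sketch below always runs the `--preview` form first and repeats the command without `--preview` only when explicitly confirmed. It assumes that omitting `--preview` performs the deletion, which the examples above suggest but do not state; `flush_with_preview`, `run_or_print`, `RUNNER_CMD`, and `DRY_RUN` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers; `--older-than` and `--preview` come from the examples.
RUNNER_CMD="${RUNNER_CMD:-runner}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run_or_print() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = age threshold such as 7d, $2 = "yes" to delete after the preview
flush_with_preview() {
    age="$1"
    confirmed="${2:-no}"
    # Always show what would be removed first.
    run_or_print $RUNNER_CMD flush --older-than "$age" --preview || return 1
    if [ "$confirmed" = "yes" ]; then
        # Assumption: the same command without --preview deletes the rows.
        run_or_print $RUNNER_CMD flush --older-than "$age"
    fi
}
```

`flush_with_preview 7d` only previews; `flush_with_preview 7d yes` previews and then issues the delete.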

`chaos`

Drives resilience testing: runs a scenario but kills the engine at random points during execution and verifies the system recovers. Used in CI to catch regressions in restart / resume behavior.

`sandbox`

VM management for --mode lima or --mode incus:

runner sandbox prepare -s nginx/001-broken-config -m lima
# … SSH in, inspect, iterate …
runner sandbox restore -s nginx/001-broken-config -m lima

runner sandbox create --template debian-docker -m lima -n my-target
runner sandbox delete -s my-target -m lima

runner sandbox purge-vms -m lima  # clean up stale clones

See Benchmark → VM Sandbox for the full sandbox surface.
