# runner

> Drive Benchmark scenarios from the CLI

`2501 runner` is the [Benchmark](/0.8/benchmark/overview) driver. It dispatches scenarios as if they were real tickets, scores the results against your validation rules, and writes a report to the database. It is typically used only in **sandbox environments**.

## Subcommands

```bash
2501 runner start <args>         # run scenarios
2501 runner validate <args>      # lint scenarios, or re-score an existing run
2501 runner flush <args>         # delete scenario run data
2501 runner chaos <args>         # resilience testing — kills the engine mid-task
2501 runner sandbox <args>       # VM lifecycle (lima / incus): prepare, restore, create, delete, purge-vms
```

## `start` (was `run`)

```bash
2501 runner start -s nginx/001-broken-config
2501 runner start -s nginx                 # all scenarios tagged nginx
2501 runner start -s nginx,disk -i 5       # 5 iterations across two tags
2501 runner start -s nginx --gateway servicenow   # use a real ServiceNow instead of the runner gateway
2501 runner start -s nginx --mode lima --parallel --parallel-ram-cap 16
```

| Common flag                                          | Meaning                                 |
| ---------------------------------------------------- | --------------------------------------- |
| `-s, --scenarios <list>`                             | Scenario keys or tags                   |
| `-m, --mode <host\|incus\|lima>`                     | Pre-provisioned hosts vs ephemeral VMs  |
| `-g, --gateway <runner\|servicenow>`                 | Where to submit the ticket              |
| `-i, --iter <n>`                                     | Number of iterations per scenario       |
| `--main-engine`, `--secondary-engine`, `--specialty` | Per-run overrides                       |
| `--parallel`                                         | Concurrent runs (VM modes only)         |
| `--fail-on-error`                                    | Exit non-zero if any scenario fails     |
| `--log-file <path>`                                  | Mirror output (ANSI-stripped) to a file |

For the full flag and env-var reference, see [Benchmark → start](/0.8/benchmark/start).
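In CI, `--fail-on-error` and `--log-file` pair naturally: the job fails when a scenario fails, and the log survives as an artifact. The following is a minimal sketch of such a wrapper, not part of 2501 itself. The function name `run_benchmark` and the `RUNNER_CMD` variable are hypothetical; `RUNNER_CMD` exists only so the command line can be previewed with `echo` instead of being executed.

```bash
# Hypothetical CI wrapper around `2501 runner start`.
# Only flags listed in the table above are used.
# RUNNER_CMD is an illustrative override (not a 2501 setting):
# set it to "echo 2501" to print the command instead of running it.
run_benchmark() {
  local scenarios="$1"               # scenario keys or tags, comma-separated
  local iterations="${2:-1}"         # iterations per scenario
  local log_file="${3:-runner.log}"  # where to mirror the output

  # RUNNER_CMD is left unquoted on purpose so "echo 2501" splits into words.
  ${RUNNER_CMD:-2501} runner start \
    -s "$scenarios" \
    -i "$iterations" \
    --fail-on-error \
    --log-file "$log_file"
}
```

A CI step might then call `run_benchmark nginx,disk 5 ci-run.log` and rely on the exit status to fail the job.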

## `validate`

```bash
# Lint scenarios without running them
2501 runner validate --scenarios
2501 runner validate --scenarios nginx

# Re-score an existing run without re-executing the scenario
2501 runner validate --runs --job-id <job-id>
2501 runner validate --runs --benchmark-id <bench-id>
```

Use `validate --runs` while iterating on validation rules: re-scoring an existing run is much faster than re-running the agent.
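When several earlier runs need re-scoring after a rule change, the `--job-id` form can be looped. This is a sketch under assumptions: `rescore_jobs` and `RUNNER_CMD` are hypothetical names, and the early return assumes `validate` exits non-zero on failure, which this page does not state.

```bash
# Hypothetical helper: re-score several existing jobs after editing validation rules.
# RUNNER_CMD is an illustrative override (not a 2501 setting); set it to
# "echo 2501" to print each command instead of running it.
rescore_jobs() {
  local job_id
  for job_id in "$@"; do
    echo "re-scoring job ${job_id}"
    # Stop at the first job whose command returns a non-zero status.
    ${RUNNER_CMD:-2501} runner validate --runs --job-id "$job_id" || return 1
  done
}
```

Called as `rescore_jobs <job-id> <job-id>`, it re-scores each job in turn without re-executing any scenario.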

## `flush`

```bash
2501 runner flush --older-than 7d --preview
2501 runner flush --scenario nginx/001-broken-config
2501 runner flush --deprecated      # delete records for scenario keys no longer on disk
2501 runner flush --all             # deletes all scenario run data; requires typing yes
```

`flush` removes `ScenarioReport` rows along with their associated Benchmark, Job, Task, and Ticket records.
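Because a flush cascades across several record types, a preview-then-confirm pattern is a reasonable safeguard. The sketch below assumes that `--preview` only lists what would be deleted and that the same command without `--preview` performs the deletion; both are inferred from the examples above rather than documented here. `flush_old_runs` and `RUNNER_CMD` are hypothetical names.

```bash
# Hypothetical helper: preview a flush, then ask before deleting.
# RUNNER_CMD is an illustrative override (not a 2501 setting); set it to
# "echo 2501" to print the commands instead of running them.
flush_old_runs() {
  local age="${1:-7d}"   # same format as --older-than, e.g. 7d
  local answer=""

  # Assumption: --preview shows the affected records without deleting them.
  ${RUNNER_CMD:-2501} runner flush --older-than "$age" --preview || return 1

  printf 'Delete these records? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) ${RUNNER_CMD:-2501} runner flush --older-than "$age" ;;
    *)   echo "aborted" ;;
  esac
}
```

Anything other than `y` or `Y` at the prompt, including an empty answer, leaves the data untouched.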

## `chaos`

`chaos` drives resilience testing: it runs a scenario, kills the engine at random points during execution, and verifies that the system recovers. It is used in CI to catch regressions in restart and resume behavior.

## `sandbox`

`sandbox` manages VMs for `--mode lima` or `--mode incus`:

```bash
2501 runner sandbox prepare -s nginx/001-broken-config -m lima
# … SSH in, inspect, iterate …
2501 runner sandbox restore -s nginx/001-broken-config -m lima

2501 runner sandbox create --template debian-docker -m lima -n my-target
2501 runner sandbox delete -s my-target -m lima

2501 runner sandbox purge-vms -m lima  # clean up stale clones
```

See [Benchmark → VM Sandbox](/0.8/benchmark/sandbox) for the full sandbox surface.
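The prepare, inspect, restore cycle shown above can be wrapped so that `restore` always runs, even when the inspection step fails. This is a minimal sketch: `with_sandbox` and `RUNNER_CMD` are hypothetical names, and it assumes `restore` is safe to call after any `prepare`.

```bash
# Hypothetical helper: prepare a sandbox, run an inspection command, always restore.
# RUNNER_CMD is an illustrative override (not a 2501 setting); set it to
# "echo 2501" to print the runner commands instead of running them.
with_sandbox() {
  local scenario="$1"
  local mode="$2"        # lima or incus
  local status=0
  shift 2                # the remaining arguments are the inspection command

  ${RUNNER_CMD:-2501} runner sandbox prepare -s "$scenario" -m "$mode" || return 1

  # Run the caller's command and remember its exit status.
  "$@"
  status=$?

  # Restore regardless of how the inspection step ended.
  ${RUNNER_CMD:-2501} runner sandbox restore -s "$scenario" -m "$mode"
  return "$status"
}
```

For example, `with_sandbox nginx/001-broken-config lima bash` opens an interactive shell for the inspection step and restores the VM once that shell exits.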