2501-runner is a benchmarking tool that lets you define infrastructure scenarios, run them against 2501 agents in a controlled sandbox, and measure how well those agents perform.
Each scenario simulates a realistic IT problem: a broken service, a misconfigured daemon, a disk filling up. The runner sets up the problem, dispatches the task to an agent, waits for it to act, and scores the result against validation rules you define.
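To make the scoring step concrete: as the next section describes, each run is scored on compliance and on task validation, and the scenario passes only when both pass. The sketch below models just that rule. The class and field names (`ScenarioResult`, `compliance_passed`, `task_passed`) are hypothetical and are not part of the runner's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioResult:
    """Hypothetical container for the two independent scores of one run."""

    compliance_passed: bool  # did the agent follow the required process?
    task_passed: bool        # is the target host actually in the desired state?

    @property
    def passed(self) -> bool:
        # A scenario passes only when BOTH scores pass.
        return self.compliance_passed and self.task_passed


# An agent that fixed the problem but skipped a required check still fails.
fixed_but_sloppy = ScenarioResult(compliance_passed=False, task_passed=True)
# An agent that followed every step but left the service down also fails.
careful_but_broken = ScenarioResult(compliance_passed=True, task_passed=False)
```

Keeping the two flags separate, rather than collapsing them into one number, preserves the distinction between "fixed it the wrong way" and "did everything right but did not fix it".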
What Gets Validated
Every scenario run produces two independent scores.
Compliance answers: did the agent follow the right process? These are behavioral checks: did the agent test the nginx config before restarting the service, did it check disk usage before deleting files, did it avoid destructive commands? Compliance rules let you encode your operational standards and verify that the agent respects them.
Task validation answers: did the agent actually fix the problem? This is ground-truth verification against the actual state of the target host after the agent finishes: is the service running, is the port responding, is the file in the right place?
A scenario passes only when both scores pass. This separation is intentional: an agent can fix the problem in a way that violates your processes, or follow all the right steps but leave the system broken.
Prerequisites
Before using the runner, make sure the following are in place on the machine where it’s installed:
- A running 2501 instance reachable from the runner machine. The runner connects to the database directly (`DATABASE_URL`) and optionally to a gateway (ServiceNow) or the engine API.
- Ansible installed and available in PATH. The runner uses `ansible-playbook` to run scenario playbooks against target hosts.
- SSH access from the runner machine to your sandbox hosts.
- Scenario files in the scenarios directory (default: `/etc/2501/runner/scenarios`). See Scenario Structure or Examples to get started.
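Some of the prerequisites above can be checked locally before a first run. The following is a minimal sketch, not part of the runner itself: `check_prerequisites` is a hypothetical helper, and it assumes `DATABASE_URL` is supplied as an environment variable. It does not test SSH access to sandbox hosts or whether the 2501 instance is actually reachable.

```python
import os
import shutil

# Default scenarios directory named in the prerequisites list.
DEFAULT_SCENARIOS_DIR = "/etc/2501/runner/scenarios"


def check_prerequisites(env=None, scenarios_dir=DEFAULT_SCENARIOS_DIR):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper. Assumes DATABASE_URL is an environment variable.
    """
    env = os.environ if env is None else env
    problems = []

    # The runner connects to the database directly, so a URL must be configured.
    if not env.get("DATABASE_URL"):
        problems.append("DATABASE_URL is not set")

    # The runner shells out to ansible-playbook, so it must be on PATH.
    if shutil.which("ansible-playbook", path=env.get("PATH")) is None:
        problems.append("ansible-playbook not found in PATH")

    # Scenario files are expected under the scenarios directory.
    if not os.path.isdir(scenarios_dir):
        problems.append(f"scenarios directory missing: {scenarios_dir}")

    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current process environment and the default directory; printing each returned string gives a quick checklist of what still needs to be set up.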
Quick Start
Next Steps
- Scenario Structure: directory layout and the `scenario.json` format
- Hosts & Agents: referencing your sandbox hosts and agents
- Validation: rules, validators, and the scoring model
- Playbooks: Ansible playbooks and inventory
- Examples: complete worked examples (disk full, nginx, Kubernetes)
- Run Scenarios: CLI reference, entry points, and environment setup

