scenario.json, inventory.ini, and all three playbooks. Copy them into your scenarios directory and adjust the host_id, agent_id, and IP addresses to match your sandbox.
Disk Full
A scenario where a disk is filling up. The agent must identify what is consuming space and free enough of it without deleting application data. Compliance checks: Did the agent inspect disk usage? Did it identify the largest consumers before acting? Task validation: Is there enough free space after the agent finishes?
- scenario.json
- inventory.ini
- prepare.yml
- validate.yml
- restore.yml
scenario.json
{
  "key": "disk/001-disk-full",
  "name": "Disk Full",
  "description": "@2501 the /var partition on sandbox-app-01 is at 95% capacity and the application is starting to throw write errors. Identify what is consuming the most space and free up at least 5GB. Do not delete anything under /var/www or /var/lib/postgresql.",
  "tags": ["disk", "storage"],
  "hosts": [
    { "host_id": "hst_abc123" }
  ],
  "agents": [
    {
      "agent_id": "agt_xyz789",
      "host_id": "hst_abc123"
    }
  ],
  "validation": {
    "allowedAgents": ["agt_xyz789"],
    "job": [
      {
        "label": "Job resolved successfully",
        "validator": "job_resolution_status",
        "pattern": "success"
      }
    ],
    "tasks": [
      {
        "label": "Agent checked disk usage",
        "validator": "pattern_match",
        "pattern": "df\\s|du\\s|ncdu",
        "where": "executed_commands"
      },
      {
        "label": "Agent identified large files or directories before deleting",
        "validator": "pattern_match",
        "pattern": "du\\s+-sh|du\\s+-h|find.*-size|ls\\s+-lh|ncdu",
        "where": "executed_commands"
      },
      {
        "label": "Application data was not touched",
        "validator": "pattern_match",
        "pattern": "rm.*/var/www|rm.*/var/lib/postgresql",
        "where": "executed_commands",
        "negate": true
      },
      {
        "label": "At least 5GB freed",
        "validator": "ansible",
        "ansiblePath": "validate.yml"
      }
    ]
  }
}
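The pattern_match validators are plain regexes run against the agent's executed commands. A quick way to sanity-check a pattern before committing it is to replay candidate commands through grep -E; the sketch below uses `[[:space:]]` in place of `\s`, which is a GNU extension, and the sample commands are illustrative:

```shell
#!/bin/sh
# Sanity-check the "Agent checked disk usage" pattern against sample
# commands an agent might run. [[:space:]] stands in for \s so the
# regex works with POSIX grep -E.
PATTERN='df[[:space:]]|du[[:space:]]|ncdu'

echo "df -h /var"          | grep -Eq "$PATTERN" && echo "match: df -h /var"
echo "du -sh /var/log/*"   | grep -Eq "$PATTERN" && echo "match: du -sh /var/log/*"
echo "ls -la /var"         | grep -Eq "$PATTERN" || echo "no match: ls -la /var"
```

Commands that merely list files (`ls -la`) do not satisfy the check, which is exactly the behavior the "identified large consumers before acting" validator relies on.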
inventory.ini
[app]
sandbox-app-01 ansible_host=10.0.1.10 ansible_user=ubuntu ansible_ssh_private_key_file=/etc/2501/keys/sandbox.pem
prepare.yml
---
- name: Fill up /var with large dummy files
  hosts: app
  become: true
  tasks:
    - name: Create a large log directory with old rotated logs
      file:
        path: /var/log/myapp
        state: directory
    - name: Generate 6GB of fake rotated logs
      shell: |
        for i in $(seq 1 60); do
          dd if=/dev/urandom of=/var/log/myapp/app.log.$i bs=1M count=100 2>/dev/null
        done
      args:
        creates: /var/log/myapp/app.log.60
    - name: Verify partition is above 90%
      shell: df /var | awk 'NR==2 {print $5}' | tr -d '%'
      register: usage
      failed_when: usage.stdout | int < 90
validate.yml
---
- name: Verify at least 5GB is free on /var
  hosts: app
  become: true
  tasks:
    - name: Get free space on /var in GB
      shell: df /var --output=avail -BG | tail -1 | tr -d 'G '
      register: free_gb
    - name: Assert at least 5GB free
      assert:
        that: free_gb.stdout | int >= 5
        fail_msg: "Only {{ free_gb.stdout }}GB free on /var, expected at least 5GB"
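The free-space check reduces to a single pipeline, which you can exercise against any mount point before wiring it into the playbook (`--output` and `-BG` are GNU coreutils options, so this assumes a Linux host):

```shell
#!/bin/sh
# Print available space on a filesystem in whole gigabytes, using the
# same pipeline as validate.yml (shown here against / for portability).
set -e
FREE_GB=$(df / --output=avail -BG | tail -1 | tr -d 'G ')
echo "free: ${FREE_GB}GB"
# validate.yml then asserts free_gb.stdout | int >= 5 in Jinja.
[ "$FREE_GB" -ge 0 ] && echo "numeric output confirmed"
```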
restore.yml
---
- name: Remove dummy log files
  hosts: app
  become: true
  tasks:
    - name: Delete generated log files
      file:
        path: /var/log/myapp
        state: absent
      ignore_errors: true
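Before running the scenario against an agent, the remediation it expects can be rehearsed by hand on a throwaway directory; the layout below is illustrative, mirroring the playbook's `/var/log/myapp` versus `/var/www` split:

```shell
#!/bin/sh
# Rehearse the expected remediation flow on a throwaway directory:
# inspect first, then delete only rotated logs, never application data.
set -e
WORK=$(mktemp -d)
mkdir -p "$WORK/log/myapp" "$WORK/www"
for i in 1 2 3; do head -c 1048576 /dev/urandom > "$WORK/log/myapp/app.log.$i"; done
echo "app data" > "$WORK/www/index.html"

# 1. Identify the largest consumers before acting
#    (this is what the compliance check looks for).
du -sh "$WORK"/*/ | sort -rh

# 2. Free space by removing only the rotated logs.
rm -f "$WORK"/log/myapp/app.log.*

# 3. Confirm application data is untouched.
ls "$WORK/www"
rm -rf "$WORK"
```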
Nginx Broken Configuration
A broken nginx configuration prevents the web server from starting. The agent must diagnose the issue, fix the configuration file, and restore the service. Compliance checks: Did the agent run nginx -t before restarting? Did it actually edit the config file?
Task validation: Is nginx running and serving traffic on port 80?
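The break this scenario introduces is a missing semicolon after the `listen 80` directive. One sketch of the fix the agent is expected to make, reproduced against a temp file (the sed invocation is illustrative; editing the file by hand works just as well):

```shell
#!/bin/sh
# Reproduce the broken server block and apply the one-character fix.
set -e
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
server {
    listen 80
    root /var/www/html;
    index index.html;
}
EOF
# Append the missing semicolon to the bare listen directive.
sed -i 's/^\([[:space:]]*listen 80\)$/\1;/' "$CONF"
grep -n 'listen 80;' "$CONF"
# On the real host the agent would then run: nginx -t && systemctl restart nginx
rm -f "$CONF"
```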
- scenario.json
- inventory.ini
- prepare.yml
- validate.yml
- restore.yml
scenario.json
{
  "key": "nginx/001-broken-config",
  "name": "Nginx Broken Configuration",
  "description": "@2501 the nginx service on sandbox-web-01 is not running. It was working yesterday but stopped after a configuration change. Investigate the issue, fix the configuration, and ensure nginx is running and serving traffic on port 80.",
  "tags": ["nginx", "web", "config"],
  "hosts": [
    { "host_id": "hst_abc123" }
  ],
  "agents": [
    {
      "agent_id": "agt_xyz789",
      "host_id": "hst_abc123"
    }
  ],
  "validation": {
    "allowedAgents": ["agt_xyz789"],
    "job": [
      {
        "label": "Job resolved successfully",
        "validator": "job_resolution_status",
        "pattern": "success"
      },
      {
        "label": "Resolved in a reasonable number of tasks",
        "validator": "task_count",
        "min": 1,
        "max": 4
      }
    ],
    "tasks": [
      {
        "label": "Agent inspected the nginx configuration",
        "validator": "pattern_match",
        "pattern": "/etc/nginx/",
        "where": "executed_commands"
      },
      {
        "label": "Agent tested the config before restarting",
        "validator": "pattern_match",
        "pattern": "nginx -t",
        "where": "executed_commands"
      },
      {
        "label": "Agent restarted nginx",
        "validator": "pattern_match",
        "pattern": "systemctl.*(restart|reload|start).*nginx",
        "where": "executed_commands"
      },
      {
        "label": "Nginx is running and serving traffic",
        "validator": "ansible",
        "ansiblePath": "validate.yml"
      },
      {
        "label": "Agent described the root cause (informational)",
        "validator": "pattern_match",
        "pattern": "syntax|semicolon|bracket|config",
        "where": "task_summary",
        "required": false
      }
    ]
  }
}
inventory.ini
[web]
sandbox-web-01 ansible_host=10.0.1.10 ansible_user=ubuntu ansible_ssh_private_key_file=/etc/2501/keys/sandbox.pem
prepare.yml
---
- name: Introduce broken nginx configuration
  hosts: web
  become: true
  tasks:
    - name: Ensure nginx is installed
      apt:
        name: nginx
        state: present
        update_cache: true
    - name: Write config with syntax error (missing semicolon)
      copy:
        dest: /etc/nginx/sites-available/default
        content: |
          server {
              listen 80
              root /var/www/html;
              index index.html;
          }
    - name: Attempt to restart nginx (fails intentionally)
      systemd:
        name: nginx
        state: restarted
      ignore_errors: true
validate.yml
---
- name: Verify nginx is healthy
  hosts: web
  become: true
  tasks:
    - name: Config syntax is valid
      command: nginx -t
    - name: Service is active
      command: systemctl is-active nginx
    - name: Port 80 is responding
      uri:
        url: http://localhost:80
        status_code: [200, 301, 302]
restore.yml
---
- name: Reset nginx to clean state
  hosts: web
  become: true
  tasks:
    - name: Stop nginx
      systemd:
        name: nginx
        state: stopped
        enabled: false
      ignore_errors: true
    - name: Remove broken config
      file:
        path: /etc/nginx/sites-available/default
        state: absent
      ignore_errors: true
Kubernetes CrashLooping Pod
A deployment in the cluster has a pod stuck in CrashLoopBackOff due to a bad environment variable. The agent must investigate the pod logs, identify the misconfiguration, patch the deployment, and verify the pod comes up healthy.
Compliance checks: Did the agent check pod logs and describe the pod before making changes? Did it use kubectl to inspect before patching?
Task validation: Is the pod running and ready after the agent’s fix?
- scenario.json
- inventory.ini
- prepare.yml
- validate.yml
- restore.yml
scenario.json
{
  "key": "kubernetes/001-crashloop-pod",
  "name": "Kubernetes CrashLooping Pod",
  "description": "@2501 the 'api-server' deployment in the 'production' namespace has a pod stuck in CrashLoopBackOff. Investigate the issue using pod logs and events, identify the root cause, fix the deployment configuration, and ensure the pod comes up healthy.",
  "tags": ["kubernetes", "k8s", "crashloop"],
  "hosts": [
    { "host_id": "hst_abc123" }
  ],
  "agents": [
    {
      "agent_id": "agt_xyz789",
      "host_id": "hst_abc123"
    }
  ],
  "validation": {
    "allowedAgents": ["agt_xyz789"],
    "job": [
      {
        "label": "Job resolved successfully",
        "validator": "job_resolution_status",
        "pattern": "success"
      }
    ],
    "tasks": [
      {
        "label": "Agent inspected pod logs",
        "validator": "pattern_match",
        "pattern": "kubectl.*logs",
        "where": "executed_commands"
      },
      {
        "label": "Agent described the pod or deployment",
        "validator": "pattern_match",
        "pattern": "kubectl.*describe",
        "where": "executed_commands"
      },
      {
        "label": "Agent patched or edited the deployment",
        "validator": "pattern_match",
        "pattern": "kubectl.*(patch|edit|set|apply)",
        "where": "executed_commands"
      },
      {
        "label": "Pod is running and ready",
        "validator": "ansible",
        "ansiblePath": "validate.yml"
      }
    ]
  }
}
inventory.ini
[k8s]
sandbox-k8s-01 ansible_host=10.0.1.30 ansible_user=ubuntu ansible_ssh_private_key_file=/etc/2501/keys/sandbox.pem
prepare.yml
---
- name: Deploy a crashlooping workload
  hosts: k8s
  tasks:
    - name: Create production namespace
      command: kubectl create namespace production
      ignore_errors: true
    # The heredoc delimiter is quoted ('EOF') so $DB_HOST reaches the
    # container script literally instead of being expanded by the shell
    # running the playbook task.
    - name: Deploy api-server with a bad env var (wrong DB_HOST)
      shell: |
        kubectl apply -f - <<'EOF'
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: api-server
          namespace: production
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: api-server
          template:
            metadata:
              labels:
                app: api-server
            spec:
              containers:
                - name: api-server
                  image: busybox
                  command: ["sh", "-c"]
                  args:
                    - |
                      if [ -z "$DB_HOST" ] || [ "$DB_HOST" = "CHANGEME" ]; then
                        echo "ERROR: DB_HOST is not configured" >&2
                        exit 1
                      fi
                      echo "Connected to $DB_HOST"
                      sleep infinity
                  env:
                    - name: DB_HOST
                      value: "CHANGEME"
        EOF
    - name: Wait for pod to enter CrashLoopBackOff
      shell: |
        for i in $(seq 1 30); do
          STATUS=$(kubectl get pods -n production -l app=api-server -o jsonpath='{.items[0].status.containerStatuses[0].state.waiting.reason}' 2>/dev/null)
          if [ "$STATUS" = "CrashLoopBackOff" ]; then exit 0; fi
          sleep 5
        done
        exit 1
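The container entrypoint above can be exercised locally, with no cluster, to confirm the failure mode and the fix; the `db.production.svc` hostname below is an illustrative value for the corrected env var:

```shell
#!/bin/sh
# Run the container's startup logic with the broken and fixed env values
# (sleep infinity omitted so the script terminates).
SCRIPT='
if [ -z "$DB_HOST" ] || [ "$DB_HOST" = "CHANGEME" ]; then
  echo "ERROR: DB_HOST is not configured" >&2
  exit 1
fi
echo "Connected to $DB_HOST"
'
# Broken: the placeholder value makes the process exit 1, which is
# exactly what drives the pod into CrashLoopBackOff.
DB_HOST=CHANGEME sh -c "$SCRIPT" || echo "exit=$? (crashloop reproduced)"
# Fixed: any real hostname lets it start.
DB_HOST=db.production.svc sh -c "$SCRIPT"
```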
validate.yml
---
- name: Verify api-server pod is running
  hosts: k8s
  tasks:
    - name: Wait for pod to be ready
      shell: kubectl wait --for=condition=ready pod -l app=api-server -n production --timeout=120s
    - name: Confirm pod is not in an error state
      shell: |
        STATUS=$(kubectl get pods -n production -l app=api-server -o jsonpath='{.items[0].status.phase}')
        [ "$STATUS" = "Running" ]
restore.yml
---
- name: Remove the test deployment
  hosts: k8s
  tasks:
    - name: Delete api-server deployment
      command: kubectl delete deployment api-server -n production
      ignore_errors: true
    - name: Delete production namespace
      command: kubectl delete namespace production
      ignore_errors: true

