# Operations Runbook

## Local Bring-Up

1. Run `npm test`.
2. Run `npm run tracer`.
3. Inspect:
   - `data/tracer-enroll-response.json`
   - `data/tracer-catalog.json`
   - `data/tracer-audit.json`
   - `data/tracer-apply-results.json`
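
The inspection step can be partly automated. The following is a minimal sketch, assuming each tracer output only needs to exist and parse as JSON; the helper name `checkTracerOutputs` is hypothetical and not part of the repository.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Tracer output files named in the bring-up steps.
const TRACER_OUTPUTS = [
  'data/tracer-enroll-response.json',
  'data/tracer-catalog.json',
  'data/tracer-audit.json',
  'data/tracer-apply-results.json',
];

// Hypothetical helper: returns one result per file as { file, ok, error }.
// A file passes if it exists and parses as JSON; its contents are not checked.
function checkTracerOutputs(rootDir = '.', files = TRACER_OUTPUTS) {
  return files.map((file) => {
    try {
      JSON.parse(fs.readFileSync(path.join(rootDir, file), 'utf8'));
      return { file, ok: true, error: null };
    } catch (err) {
      return { file, ok: false, error: err.message };
    }
  });
}
```

Calling `checkTracerOutputs().filter((r) => !r.ok)` from the repository root lists any file that is missing or malformed.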

## Full Local Validation

Run:

```bash
bash scripts/validate.sh
```

This executes:

- unit tests
- the local tracer bullet
- Hetzner smoke checks, in plan mode unless `HCLOUD_TOKEN` is set
- a check against the current provider policy in `examples/provider-policy.json`

## Nix Lab Procedure

Run:

```bash
bash scripts/run-nix-lab.sh
```

Expected behavior:

- on a host with `/dev/kvm` and Nix `system-features` including `kvm`, the lab build should proceed into the NixOS VM test
- on a host without those prerequisites, the script should fail immediately with an explicit explanation
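
The two prerequisites can be checked ahead of time. This is a rough sketch, not the logic in `scripts/run-nix-lab.sh`: it assumes the Nix configuration text comes from `nix config show` (`nix show-config` on older Nix releases), and the function names are hypothetical.

```javascript
const fs = require('node:fs');

// Parses `key = value` lines and returns the space-separated values of
// `system-features` as an array (empty if the key is absent).
function parseSystemFeatures(nixConfigText) {
  for (const line of nixConfigText.split('\n')) {
    const match = line.match(/^\s*system-features\s*=\s*(.*)$/);
    if (match) return match[1].trim().split(/\s+/).filter(Boolean);
  }
  return [];
}

// Hypothetical helper: returns a list of human-readable problems.
// An empty list means both prerequisites from the runbook are met.
function kvmLabProblems(nixConfigText, kvmPath = '/dev/kvm') {
  const problems = [];
  if (!fs.existsSync(kvmPath)) problems.push(`${kvmPath} is missing`);
  if (!parseSystemFeatures(nixConfigText).includes('kvm')) {
    problems.push('Nix system-features does not include kvm');
  }
  return problems;
}
```

Passing the captured configuration text to `kvmLabProblems` gives an early indication of whether the lab script is likely to reach the VM test on the current host.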

## Quality Suite

Run:

```bash
bash scripts/quality.sh
```

This executes:

- the full test suite, including integration tests
- the validation harness
- an enrollment throughput benchmark

## Admin Login Procedure

1. Start the control plane.
2. Exchange the bootstrap token for a short-lived bearer token:

```bash
curl -sS -X POST \
  -H 'content-type: application/json' \
  -d '{"bootstrapToken":"dev-admin-bootstrap-token","actor":"operator"}' \
  http://127.0.0.1:8080/v1/admin/login
```

3. Store the returned bearer token in the `TOKEN` shell variable, then use it to inspect audit events:

```bash
curl -sS -H "authorization: Bearer $TOKEN" \
  http://127.0.0.1:8080/v1/admin/audit
```

## Catalog Validation Procedure

1. Fetch the catalog envelope:

```bash
# Quote the URL so the shell does not treat `?` as a glob character
curl -sS 'http://127.0.0.1:8080/v1/catalog?region=eu-central' > data/catalog.json
```

2. Check that it contains:
   - a `catalog` object
   - a `signature.algorithm` of `ed25519`
   - a non-empty signature value
   - a public key for offline verification

3. Cross-check the catalog against the active gateway and egress state.
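
The structural checks in step 2 can be sketched as a small function. Only `catalog` and `signature.algorithm` are named in this runbook; the field names `signature.value` and `signature.publicKey` are assumptions and should be adjusted to the real envelope. The sketch does not verify the signature itself.

```javascript
// Hypothetical helper: returns a list of problems found in a parsed catalog
// envelope. An empty list means all four structural checks passed.
// ASSUMPTION: the signature value and public key live at `signature.value`
// and `signature.publicKey`; the runbook does not name these fields.
function catalogEnvelopeProblems(envelope) {
  const problems = [];
  const isObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);
  const nonEmpty = (v) => typeof v === 'string' && v.length > 0;

  if (!isObject(envelope)) return ['envelope is not a JSON object'];
  if (!isObject(envelope.catalog)) problems.push('missing catalog object');

  const signature = isObject(envelope.signature) ? envelope.signature : {};
  if (signature.algorithm !== 'ed25519') problems.push('signature.algorithm is not ed25519');
  if (!nonEmpty(signature.value)) problems.push('signature value is empty or missing');
  if (!nonEmpty(signature.publicKey)) problems.push('public key is empty or missing');
  return problems;
}
```

A typical use is to parse `data/catalog.json` with `JSON.parse` and pass the result to `catalogEnvelopeProblems` before moving on to the cross-check in step 3.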

## Hetzner Smoke Procedure

Plan mode (no `--mode` flag):

```bash
node orchestrator/hetzner-smoke.js locations --out data/hetzner-locations.json
node orchestrator/hetzner-smoke.js ingress --region eu-central --out data/hetzner-ingress-smoke.json
node orchestrator/hetzner-smoke.js egress --region eu-central --out data/hetzner-egress-smoke.json
```

Live smoke mode (`--mode smoke`):

```bash
export HCLOUD_TOKEN=...
node orchestrator/hetzner-smoke.js ingress --mode smoke --region eu-central --out data/hetzner-ingress-live.json
node orchestrator/hetzner-smoke.js egress --mode smoke --region eu-central --out data/hetzner-egress-live.json
```

Protected live mode (`--mode live` with `--approve-live true`):

```bash
export HCLOUD_TOKEN=...
node orchestrator/hetzner-smoke.js ingress --mode live --approve-live true --region eu-central --out data/hetzner-ingress-prod-like.json
```

## Live Test Preconditions

Before running live cloud smoke tests:

- confirm provider token scope
- confirm naming and labels for cleanup
- confirm allowed spend
- confirm target region mapping
- confirm cleanup procedure exists before provisioning
- confirm `examples/provider-policy.json` permits the intended provider, region, and size

## Payment Credential Procedure

Follow the model in `docs/payment-credentials.md`:

1. keep the billing account human- or org-owned at the provider
2. create a scoped provider API token for DAMM
3. inject that token only at runtime
4. enforce policy and approval checks before live provisioning

## Cleanup Standard

Every resource created by a live smoke run should have:

- a deterministic name prefix such as `smoke-`
- labels for `managed-by`, `lifecycle`, `role`, and `region`
- a recorded output file in `data/`

Enumerate or execute Hetzner cleanup with:

```bash
# Enumerate cleanup candidates (no --execute flag)
node orchestrator/hetzner-cleanup.js --region eu-central --out data/hetzner-cleanup.json
# Execute the cleanup
export HCLOUD_TOKEN=...
node orchestrator/hetzner-cleanup.js --region eu-central --execute true --out data/hetzner-cleanup-executed.json
```
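
The naming and label requirements above can be sketched as a check on a single resource. This is an illustration only: it assumes a resource object shaped like `{ name, labels }`, the function name is hypothetical, and it does not cover the recorded output file in `data/`.

```javascript
// Label keys required by the cleanup standard.
const REQUIRED_LABELS = ['managed-by', 'lifecycle', 'role', 'region'];

// Hypothetical helper: returns a list of ways a resource falls short of the
// cleanup standard. An empty list means the name prefix and labels are in place.
// ASSUMPTION: `resource` looks like { name: string, labels: { key: value } }.
function cleanupStandardProblems(resource, prefix = 'smoke-') {
  const problems = [];
  const name = typeof resource.name === 'string' ? resource.name : '';
  const labels = resource.labels || {};
  if (!name.startsWith(prefix)) problems.push(`name does not start with ${prefix}`);
  for (const key of REQUIRED_LABELS) {
    if (!labels[key]) problems.push(`missing label ${key}`);
  }
  return problems;
}
```

Running this over the resources recorded in a smoke output file, before provisioning anything further, helps confirm that a later cleanup pass can find everything it needs to remove.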

## Benchmark Procedure

Run:

```bash
node scripts/benchmark-enroll.js --count 25 --concurrency 5 --out data/benchmark-enroll.json
```

Expected output:

- total duration
- average milliseconds per enrollment
- enrollments per second

Interpretation:

- this is an application-level tracer benchmark, not a transport benchmark
- it measures control-plane enrollment behavior only
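
The relationship between the three reported figures can be sketched as follows. This assumes the average is the total duration divided by the enrollment count; `scripts/benchmark-enroll.js` may instead average per-request latencies, which gives a different number when `--concurrency` is greater than 1. The function name is hypothetical.

```javascript
// Hypothetical helper: derives the two rate figures from a run's enrollment
// count and its total wall-clock duration in milliseconds.
// ASSUMPTION: average = total duration / count (not mean per-request latency).
function summarizeBenchmark(count, totalMs) {
  if (count <= 0 || totalMs <= 0) {
    throw new RangeError('count and totalMs must be positive');
  }
  return {
    totalMs,
    avgMsPerEnrollment: totalMs / count,
    enrollmentsPerSecond: count / (totalMs / 1000),
  };
}
```

For example, 25 enrollments completing in 500 ms works out to 20 ms per enrollment and 50 enrollments per second under this definition.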