# Roadmap

This roadmap is derived from the current ADRs, implementation state, and validation ledger. It is ordered by architectural dependency rather than by surface-level feature appeal.

## Planning Principles

- Prove one thin vertical slice end to end before widening provider coverage.
- Keep identity, policy, and catalogs trustworthy before optimizing scale.
- Prefer validated operational loops over speculative platform breadth.
- Add live cloud automation only when cleanup and auditability are equally strong.

## Current Baseline

Already in place:

- control plane with enrollment, rotation, revocation, admin login, audit, and signed catalogs
- ingress and egress policy model
- provider-aware plan and reconcile path
- real Hetzner adapter slice
- provider policy gating for plan, smoke, and live modes
- Hetzner cleanup scaffolding
- local tracer bullet
- plan-mode Hetzner smoke tests

Current bottlenecks:

- no live provider provisioning has been validated
- no durable database backing
- no gateway bootstrap/registration handshake with the control plane
- no real client failover implementation consuming signed catalogs

## Next Execution Window

These are the next several milestones in execution order:

1. Run live Hetzner smoke creation and cleanup with captured evidence.
2. Add gateway bootstrap and registration.
3. Add client catalog verification and failover selection.
4. Start the durable-state migration path.

## Phase 1: Live Hetzner Validation

Goal:

- prove one provider end to end with real resources and reversible operations

Deliverables:

- live Hetzner ingress smoke creation
- live Hetzner egress smoke creation plus primary IP allocation
- deterministic resource labels and naming
- cleanup procedure and cleanup script
- provider policy and payment-credential handling documentation
- live validation record written to `docs/validation-ledger.md`

Exit criteria:

- an ingress node is provisioned successfully in live mode
- an egress node and its primary IP are provisioned successfully in live mode
- all smoke resources created in live mode can be enumerated and removed safely
- artifacts are captured in `data/` and summarized in docs
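
A minimal sketch of deterministic smoke naming and a label-scoped cleanup selector. The label keys (`managed-by`, `purpose`, `role`, `run-id`), their values, and the `smoke-<role>-<region>-<hash>` naming scheme are all hypothetical; the real Hetzner adapter may use different ones. The point it illustrates is that cleanup selects only resources carrying the smoke ownership labels, never anything else in the project.

```javascript
const crypto = require("node:crypto");

// Hypothetical label values; the real adapter may use others.
const MANAGED_BY = "smoke-orchestrator";
const PURPOSE = "smoke";

// Derive a short, stable run id from operator-controlled inputs, so the
// same approved run always produces the same names and labels.
function smokeRunId(role, region, seed) {
  return crypto
    .createHash("sha256")
    .update(`${role}|${region}|${seed}`)
    .digest("hex")
    .slice(0, 10);
}

// Build the name and labels for one smoke resource. Assumes role and
// region are already lowercase alphanumerics, so the name stays hostname-safe.
function smokeResource(role, region, seed) {
  const runId = smokeRunId(role, region, seed);
  return {
    name: `smoke-${role}-${region}-${runId}`,
    labels: { "managed-by": MANAGED_BY, purpose: PURPOSE, role, "run-id": runId },
  };
}

// Cleanup only ever selects resources that carry both ownership labels;
// an optional runId narrows the selection to a single run.
function selectForCleanup(resources, runId) {
  return resources.filter((r) => {
    const labels = r.labels || {};
    return (
      labels["managed-by"] === MANAGED_BY &&
      labels.purpose === PURPOSE &&
      (runId === undefined || labels["run-id"] === runId)
    );
  });
}
```

After cleanup, re-listing the project's resources and checking that `selectForCleanup(listed).length === 0` is one way to produce the "zero leftover smoke resources" evidence.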

Validation:

- `node orchestrator/hetzner-smoke.js ingress ...`
- `node orchestrator/hetzner-smoke.js egress ...`
- the cleanup command confirms that zero smoke resources remain

Status:

- policy gating is implemented
- cleanup scaffolding is implemented
- live provider execution still awaits credentials and an explicitly approved run

## Phase 2: Gateway Bootstrap And Registration

Goal:

- turn cloud-created nodes into service participants rather than unowned servers

Deliverables:

- gateway bootstrap identity
- gateway registration endpoint
- gateway heartbeat and status model
- signed gateway bootstrap config
- gateway config generation tied to registered gateway identity

Exit criteria:

- newly created ingress nodes can register with the control plane
- control plane can distinguish active, draining, and unhealthy gateways
- catalog publication only includes healthy registered gateways
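
A minimal sketch of how the three gateway states could be derived from registration and heartbeat data, assuming a hypothetical gateway record with `registered`, `draining`, and `lastHeartbeatMs` fields and a hypothetical 30-second heartbeat timeout. Draining gateways are assumed to be left out of new catalogs so clients stop selecting them.

```javascript
// Hypothetical heartbeat timeout; the real threshold is a policy decision.
const HEARTBEAT_TIMEOUT_MS = 30_000;

// Status is derived from stored facts rather than stored directly:
// a missed heartbeat wins over draining, and draining wins over active.
function gatewayStatus(gw, nowMs) {
  if (nowMs - gw.lastHeartbeatMs > HEARTBEAT_TIMEOUT_MS) return "unhealthy";
  if (gw.draining) return "draining";
  return "active";
}

// Only registered gateways that are currently active are published,
// sorted by id so the same inputs always produce the same catalog.
function publishableGateways(gateways, nowMs) {
  return gateways
    .filter((gw) => gw.registered && gatewayStatus(gw, nowMs) === "active")
    .map((gw) => gw.id)
    .sort();
}
```

Deriving status on read keeps it from drifting out of sync with heartbeats; an explicit stored state machine would also work if each transition needs its own audit event.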

Validation:

- tracer bullet extended to include bootstrap and registration
- unit tests for gateway registration and state transitions

## Phase 3: Real Client Failover Logic

Goal:

- move from catalog publication to actual client-side recovery behavior

Deliverables:

- client catalog fetch and signature verification
- endpoint ranking and retry logic
- failover state persisted in the client device file
- reissue logic for updated configs after key rotation
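
A minimal sketch of offline catalog verification using Node's built-in `crypto.verify`. The signature scheme (Ed25519) and the envelope shape `{ payload, signature }` are assumptions, not the control plane's confirmed format; the public key is assumed to reach the client out of band, for example in the recovery kit.

```javascript
const crypto = require("node:crypto");

// Verify a signed catalog envelope entirely offline.
// Assumptions (hypothetical): `payload` is the exact JSON text that was
// signed, and `signature` is a base64 Ed25519 signature over its UTF-8
// bytes. `publicKey` may be a KeyObject or a PEM string.
function verifyCatalog(envelope, publicKey) {
  const payload = Buffer.from(envelope.payload, "utf8");
  const signature = Buffer.from(envelope.signature, "base64");
  // Ed25519 takes no separate digest, so the algorithm argument is null.
  if (!crypto.verify(null, payload, publicKey, signature)) {
    throw new Error("catalog signature verification failed");
  }
  // Parse only after verification, and only the bytes that were verified.
  return JSON.parse(envelope.payload);
}
```

Verifying the received bytes, rather than re-serializing a parsed object, sidesteps JSON canonicalization differences between signer and client.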

Exit criteria:

- client can verify catalog signatures offline
- client can select alternate front doors from a valid recovery kit
- failover decisions are deterministic and logged
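
A minimal sketch of deterministic ranking and logged selection, assuming hypothetical front-door records of the form `{ id, priority }` and a per-endpoint consecutive-failure map persisted in the device file. Fewer recent failures wins, then lower priority, then the lexically smaller id, so the order is total and can be replayed from the log.

```javascript
// Hypothetical shapes: each catalog front door is { id, priority } (lower
// priority is preferred); `failures` maps id -> consecutive failure count,
// as it might be persisted in the client device file.
function failureCount(failures, id) {
  return Object.prototype.hasOwnProperty.call(failures, id) ? failures[id] : 0;
}

// Total order: fewest failures, then priority, then id as a final
// tiebreaker, so identical inputs always yield the identical ranking.
function rankFrontDoors(frontDoors, failures) {
  return [...frontDoors].sort((a, b) => {
    const byFailures = failureCount(failures, a.id) - failureCount(failures, b.id);
    if (byFailures !== 0) return byFailures;
    if (a.priority !== b.priority) return a.priority - b.priority;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
}

// Pick the top-ranked front door and log the full decision as one JSON line.
function selectFrontDoor(frontDoors, failures, log = console.log) {
  const ranked = rankFrontDoors(frontDoors, failures);
  const chosen = ranked.length > 0 ? ranked[0] : null;
  log(JSON.stringify({
    event: "failover.select",
    chosen: chosen ? chosen.id : null,
    order: ranked.map((fd) => fd.id),
  }));
  return chosen;
}

// Failure bookkeeping returns new objects so the caller can persist them.
function recordFailure(failures, id) {
  return { ...failures, [id]: failureCount(failures, id) + 1 };
}

function recordSuccess(failures, id) {
  const next = { ...failures };
  delete next[id];
  return next;
}
```

Simulating a failed primary is then a matter of calling `recordFailure` for the top-ranked id and checking that the next `selectFrontDoor` call moves to the alternate.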

Validation:

- client-side integration test with multiple front doors
- local simulation of a failed primary endpoint

## Phase 4: Durable State And Concurrency Safety

Goal:

- move the control plane from single-node reference persistence to an operationally credible backing store

Deliverables:

- Postgres-backed state
- migration path from JSON state
- transactional IP allocation
- transactional admin audit events
- reconciliation-safe writes

Exit criteria:

- no state mutation path depends on JSON file rewrites
- allocation and revocation are transactionally safe
- startup migrations are documented and tested

Validation:

- migration test from JSON fixture to Postgres fixture
- concurrent enrollment test
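
The concurrent enrollment test can be prototyped before Postgres exists. The sketch below is an in-process stand-in, not the Postgres design: a hypothetical `SerializedIpAllocator` serializes allocations through a promise chain, and a hypothetical `concurrentEnrollmentCheck` fires parallel enrollments and checks that every device received a distinct address. In the Postgres-backed version the same property would have to come from a transaction (for example, a unique constraint on the leased address), and the same check could run against it if it exposes the same `allocate(deviceId)` method.

```javascript
// In-memory stand-in for a transactional allocator (hypothetical interface).
// The await between choosing a free address and recording the lease mimics
// a database round trip, which is where an unserialized allocator races.
class SerializedIpAllocator {
  constructor(pool) {
    this.pool = pool; // e.g. ["10.0.0.2", "10.0.0.3", ...]
    this.leases = new Map(); // ip -> deviceId
    this.tail = Promise.resolve(); // promise chain acting as a mutex
  }

  allocate(deviceId) {
    const run = this.tail.then(async () => {
      const free = this.pool.find((ip) => !this.leases.has(ip));
      await new Promise((resolve) => setImmediate(resolve)); // simulated round trip
      if (free === undefined) throw new Error("address pool exhausted");
      this.leases.set(free, deviceId);
      return free;
    });
    // Keep the chain usable even if this allocation fails.
    this.tail = run.catch(() => {});
    return run;
  }
}

// N parallel enrollments must yield N distinct addresses.
async function concurrentEnrollmentCheck(allocator, n) {
  const ips = await Promise.all(
    Array.from({ length: n }, (_, i) => allocator.allocate(`device-${i}`)),
  );
  return new Set(ips).size === ips.length;
}
```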

## Phase 5: Multi-Provider Reality

Goal:

- add a second real provider only after the first provider is operationally proven

Deliverables:

- DigitalOcean real adapter hardening
- region/image/size validation against the live API
- provider-specific cleanup script
- provider-specific smoke evidence

Exit criteria:

- live DigitalOcean ingress smoke succeeds
- live DigitalOcean resource cleanup succeeds
- planner and reconciler can target both providers with confidence

Validation:

- live smoke commands and validation artifacts
- comparative provider notes in docs

## Phase 6: Operational Hardening

Goal:

- close the loop on security, operations, and production readiness

Deliverables:

- stronger admin bootstrap handling
- secret rotation procedures
- catalog signing key rotation procedure
- rate limiting and audit retention policy
- spend controls and provisioning guardrails
- staged rollout and drain procedures
- Observatory export and advisory quota integration

Exit criteria:

- all privileged operations have an explicit runbook
- all critical secrets have a rotation path
- rollout and rollback procedures are documented and tested

Validation:

- tabletop exercises recorded in docs
- runbook execution with captured evidence

## Sequencing Summary

Work in this order:

1. live Hetzner validation with cleanup evidence
2. gateway registration
3. client failover implementation
4. durable state migration
5. second provider live validation
6. production hardening

## Explicit Non-Goals For Now

- broad provider coverage before one provider is operationally proven
- production claims without live cleanup and validation evidence
- premature scale features before registration, failover, and durable state are solved
