# Engineering Decisions

## Goal

This file explains why the code has its current shape. It is a justification for current choices, not an abstract essay.

## 1. WireGuard Instead Of A Custom Tunnel Stack

Choice:

- use WireGuard as the packet transport

Why:

- small protocol surface
- mature Linux support
- straightforward key model
- easy to validate with smoke tests on real hosts

Tradeoff:

- less flexibility than a custom or multi-transport tunnel stack
- some platforms need userspace support or extra packaging

Resolution:

- keep transport boring
- put flexibility into catalogs, policy, provisioning, and placement

## 2. Control Plane Separate From Data Plane

Choice:

- keep enrollment, policy, metering, and reporting in the control plane
- keep packet forwarding in the gateway

Why:

- reduces blast radius of control-plane mistakes
- keeps latency-sensitive forwarding logic simple
- makes policy and audit easier to reason about

Tradeoff:

- requires a clean contract between the API and the gateway
- some operator features need extra propagation steps

Resolution:

- accept the extra coordination cost
- prefer a crisp boundary over a “smart gateway” blob

## 3. Client-Side Private Keys

Choice:

- generate and keep device private keys on the client only

Why:

- clean trust boundary
- avoids a class of server-side key escrow mistakes
- aligns with the intended ownership model of a VPN client identity

Tradeoff:

- reissue and device recovery flows are more constrained
- the client has to render its own local config

Resolution:

- take the cleaner boundary
- accept slightly more client logic
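
As a minimal sketch of what "the client has to render its own local config" can look like: the private key stays in client memory or on client disk, and only the public key is ever sent to the control plane. The `PeerInfo` shape, the `enrollment_request` payload, and all field names are hypothetical, not taken from the repo. The output follows the usual `wg-quick` layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerInfo:
    """Hypothetical shape of what the control plane returns at enrollment."""
    server_public_key: str
    endpoint: str
    allowed_ips: str
    client_address: str


def enrollment_request(device_name: str, public_key_b64: str) -> dict:
    # Only the public key leaves the client; the private key is never a field.
    return {"device_name": device_name, "public_key": public_key_b64}


def render_wg_config(private_key_b64: str, peer: PeerInfo) -> str:
    # Rendered locally, so the server never needs to see the private key.
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key_b64}",
        f"Address = {peer.client_address}",
        "",
        "[Peer]",
        f"PublicKey = {peer.server_public_key}",
        f"Endpoint = {peer.endpoint}",
        f"AllowedIPs = {peer.allowed_ips}",
        "",
    ]
    return "\n".join(lines)
```

Key generation itself is left out here; it would come from `wg genkey` or an X25519 library on the client.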

## 4. Signed Catalogs For Failover

Choice:

- publish a signed catalog and make the client verify it before switching endpoints

Why:

- lets reachability change while identity remains stable
- gives the client an offline-verifiable trust anchor
- keeps failover logic deterministic and inspectable

Tradeoff:

- requires key distribution and freshness checks
- catalog design becomes load-bearing

Resolution:

- use Ed25519 signing
- reject stale catalogs
- keep the catalog structure simple
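
The verify-then-switch rule above can be sketched as follows. This is an illustration under stated assumptions, not the repo's implementation: the `issued_at` field, the one-hour freshness window, and the clock-skew allowance are all hypothetical, and the Ed25519 check is passed in as a `verify` callable because the Python standard library has no Ed25519 primitive.

```python
import json
import time
from typing import Callable, Optional

MAX_CATALOG_AGE_SECONDS = 3600   # assumed freshness window
MAX_CLOCK_SKEW_SECONDS = 60      # assumed tolerance for catalogs dated in the future


class CatalogRejected(Exception):
    """Raised when a catalog must not be used for endpoint switching."""


def accept_catalog(
    payload: bytes,
    signature: bytes,
    verify: Callable[[bytes, bytes], bool],
    now: Optional[float] = None,
) -> dict:
    # Check the signature over the raw bytes before parsing anything.
    if not verify(payload, signature):
        raise CatalogRejected("signature check failed")

    catalog = json.loads(payload)
    current = time.time() if now is None else now
    age = current - catalog["issued_at"]

    # Reject stale catalogs, and ones issued implausibly far in the future.
    if age > MAX_CATALOG_AGE_SECONDS:
        raise CatalogRejected("catalog is stale")
    if age < -MAX_CLOCK_SKEW_SECONDS:
        raise CatalogRejected("catalog is dated in the future")
    return catalog
```

In practice `verify` would wrap an Ed25519 public-key check against the trust anchor shipped with the client.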

## 5. Serialized JSON State Before Full Relational Migration

Choice:

- keep the JSON backend, but route mutations through a store abstraction

Why:

- fastest path to improved correctness without stalling work in the repo
- good enough for single-process tracer and smoke-test work
- provides a migration seam for Postgres

Tradeoff:

- still not sufficient for real distributed concurrency
- still a coarse-grained snapshot model

Resolution:

- treat JSON as an MVP backend only
- keep Postgres integration on a dedicated hardening path
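
A minimal sketch of the "route mutations through a store abstraction" idea, assuming a single process. The `JsonStore` class and its method names are hypothetical. Every write goes through `mutate`, which holds a lock, loads the snapshot, applies the change, and replaces the file atomically; a Postgres-backed store could later offer the same two methods.

```python
import json
import os
import tempfile
import threading
from typing import Callable


class JsonStore:
    """Serialized read-modify-write over one JSON snapshot file."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._lock = threading.Lock()  # serializes threads in one process only

    def read(self) -> dict:
        with self._lock:
            return self._load()

    def mutate(self, fn: Callable[[dict], None]) -> dict:
        # If fn raises, nothing is written and the old snapshot stays intact.
        with self._lock:
            state = self._load()
            fn(state)
            self._save(state)
            return state

    def _load(self) -> dict:
        try:
            with open(self._path, "r", encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def _save(self, state: dict) -> None:
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(state, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self._path)  # atomic swap of the whole snapshot
        except BaseException:
            os.unlink(tmp)
            raise
```

The lock does nothing across processes or hosts, which is the same limitation the tradeoff list names.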

## 6. Tier Policy And Metering In The Control Plane

Choice:

- represent access tiers and device usage in control-plane state

Why:

- gives the product a concrete policy model
- allows quotas and suspensions to be visible in report output
- creates a straightforward path to billing or operator enforcement later

Tradeoff:

- current metering is only as good as the gateway-reported numbers
- usage enforcement is not yet tied to live packet-plane disconnect logic

Resolution:

- accept reported metering as an MVP
- surface the limitation plainly in docs
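
To make the policy model concrete, here is a small sketch of evaluating a device against its tier using gateway-reported bytes. The `Tier` and `DeviceUsage` shapes, the field names, and the status strings are hypothetical. The function only computes a status for reports; consistent with the limitation above, it does not disconnect anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_quota_bytes: int


@dataclass(frozen=True)
class DeviceUsage:
    device_id: str
    reported_bytes: int      # as reported by the gateway, not independently measured
    suspended: bool = False


def usage_status(tier: Tier, usage: DeviceUsage) -> str:
    """Return the status that a report would show for this device."""
    if usage.suspended:
        return "suspended"
    if usage.reported_bytes >= tier.monthly_quota_bytes:
        return "over_quota"
    return "ok"
```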

## 7. Static Generated Operator Pages Instead Of A Heavy SPA

Choice:

- generate site pages from repo data and artifacts

Why:

- low dependency surface
- easy to publish and verify
- keeps the site honest because it is derived from repo artifacts

Tradeoff:

- less interactive than a full app
- no live polling without additional client code

Resolution:

- use generated pages for now
- only add dynamic behavior when there is a concrete operational need
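
A generated page can be as small as the sketch below, which renders a flat JSON artifact into static HTML using only the standard library. The template, the `render_page` name, and the artifact shape (a flat JSON object) are assumptions for illustration.

```python
import html
import json
from string import Template

PAGE = Template(
    "<!doctype html>\n"
    "<title>$title</title>\n"
    "<h1>$title</h1>\n"
    "<ul>\n$rows\n</ul>\n"
)


def render_page(title: str, artifact_json: str) -> str:
    # The page content comes only from the artifact, so it cannot drift from it.
    items = json.loads(artifact_json)
    rows = "\n".join(
        f"<li>{html.escape(str(key))}: {html.escape(str(value))}</li>"
        for key, value in sorted(items.items())
    )
    return PAGE.substitute(title=html.escape(title), rows=rows)
```

Sorting the keys keeps the output stable between runs, which makes a published page easy to diff against the previous one.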

## 8. Explicit Failure Responses Instead Of Generic 500s

Choice:

- return deliberate `503`, `409`, `403`, and `413` responses for capacity, lifecycle, policy, and request-shape failures, respectively

Why:

- operators need actionable failure modes
- tests should prove service semantics, not just “something failed”

Tradeoff:

- more error-mapping code
- more integration tests to maintain

Resolution:

- keep the mapping small and explicit
- prefer a little repetition over hidden exception magic
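
A small, explicit mapping might look like the sketch below. The exception class names and the `status_for` helper are hypothetical, and the pairing of codes to failure kinds assumes the list in the choice above is in matching order. Anything unmapped still falls through to `500`, so an unexpected failure stays visibly unexpected.

```python
class CapacityError(Exception):
    """No gateway or slot is available right now."""


class LifecycleError(Exception):
    """The resource is in a state that does not allow this operation."""


class PolicyError(Exception):
    """The tier or policy forbids this operation."""


class RequestTooLargeError(Exception):
    """The request body exceeds the accepted size."""


# Ordered, explicit table: one line per failure kind, no inheritance tricks.
ERROR_STATUS = (
    (CapacityError, 503),
    (LifecycleError, 409),
    (PolicyError, 403),
    (RequestTooLargeError, 413),
)


def status_for(exc: Exception) -> int:
    for exc_type, status in ERROR_STATUS:
        if isinstance(exc, exc_type):
            return status
    return 500  # unmapped failures are genuine server errors
```

An integration test can then assert on the exact status for each failure kind instead of checking that some error occurred.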
