# DAMM Issue Triage

Compacted register of every open concern across the doc set, categorized by urgency. Sourced from `architecture-premortem.md`, `sanity-check-2026-04-28.md`, `field-manual.md` brainstorm, and live verify-suite findings.

Each issue has: a tag, a one-sentence what, the impact level, the doc/section that owns it, and where it sits in the roadmap.

Refreshed: 2026-04-29.

## Categories

- **🔥 BLOCKING** — actively breaks user-visible functionality, or risks the bedrock fact. Drop everything.
- **❗ IMPORTANT** — known degradation, will bite if left. Schedule into the next release.
- **○ NICE-TO-HAVE** — improvement, not a fix. Pull when convenient.
- **✓ RESOLVED** — closed in a recent release; kept here for traceability for one cycle, then moved to `field-manual.md` postmortem.
- **✗ ABANDONED** — explicitly not pursuing; rationale recorded.

## 🔥 Blocking — none currently

The bedrock fact (332MB rx / 193MB tx through one peer) is intact through v0.3.2. The verify suite is green. No blocking issues.

## ❗ Important — schedule into v0.3.x or v0.4.0

| Tag | What | Owner doc | Roadmap |
|---|---|---|---|
| `catalog-no-filtering` | Catalog publishes any registered gateway regardless of score; resilience claim hollow | sanity-check §3.10, field-manual §3.10 | **v0.3.3** |
| `wizard-no-qr` | Wizard renders config text + Download but no QR; iOS friends struggle to import | field-manual story F | **v0.3.4** |
| `t0-only` | Only bare WireGuard/UDP/51820 is offered; many networks block it; no T1+ deployed | field-manual §3.7, story A | **v0.3.5** |
| `captive-portal-opaque` | Wizard's first network call fails opaquely on coffee-shop networks; user has no useful guidance | field-manual story F | **v0.3.6** |
| `state-monolithic` | state.json mixes durable + hot data; every heartbeat rewrites the whole file; will limit scale | premortem §1, sanity-check §4 | **v0.4.0** |
| `single-provider` | Only Hetzner adapter has a live API integration; multi-provider claim hollow until v0.4 | field-manual §3.9, coordination-layer §3.2 | **v0.4.0** |
| `signing-key-spof` | Catalog signing key in state.json with no rotation protocol; loss = re-enrollment for everyone | premortem §4 | post-v0.4 |
| `no-knock-port` | UDP/51820 always answers from any source; ISPs scanning find us trivially | conversation, field-manual §3.6 | post-v0.4 |
| `egress-mono` | One egress pool, no quarantine; "sequester" semantics promised but unbuilt | field-manual §3.8 | post-v0.4 |

## ○ Nice-to-have — pull when convenient

| Tag | What | Owner doc |
|---|---|---|
| `hsts-explicit` | HSTS header not explicitly set on cp.damm; relies on Caddy default | sanity-check §C.4 |
| `caddy-upstream-health` | Caddy proxies to CP without health-check; a hung CP returns 504 not 502 | sanity-check §C.5 |
| `rate-limit-multi-process` | Rate limiter is in-process memory; would need shared store for multi-CP | field-manual §3.14 |
| `wizard-deep-link-ios` | iOS WireGuard accepts `wireguard://` URL scheme; we don't generate one | field-manual story F |
| `request-log-rotation` | We logrotate via copytruncate; a finer cut would split by hour | sanity-check §C.10 |
| `orchestrator-on-cp-host` | Orchestrator runs on hub2 alongside CP; later it should be its own host | premortem §2 |
| `companion-overlap` | `/docs/damm-client.html` and `/get/` overlap; one should be retired or rescoped | sanity-check §10 |

## ✗ Abandoned — explicit non-decisions

| Tag | What | Why we're not doing it |
|---|---|---|
| `super-light-crypto` | Replacing ChaCha20-Poly1305 with cheaper AEAD | False economy. Already cheap; weakening the floor for invisible savings. |
| `trustless-catalog` | Publishing catalogs through public read-only storage instead of HTTPS | Latency to fetch is too high; revocation gets hard. |
| `stateless-tickets` | Replacing peer state with admission tickets verified at handshake | Genuinely competitive but a different product. Off-the-shelf WG client is the commitment. |
| `mdns-mesh-stock-ios` | Phone-to-phone mesh via mDNS through stock iOS WireGuard | iOS sandboxes mDNS in ways that block this. Briar/Bridgefy do it differently because they have to. |

## ✓ Resolved (kept for one cycle, then move to postmortem)

| Tag | What | Resolved in |
|---|---|---|
| `state-grows-unbounded` | Audit log + enrollment passes accumulated forever | v0.2.2 (zombie sweeper trim) |
| `cp-no-request-log` | Control plane log only emitted "listening on..." per restart | v0.2.2 (per-request JSON Lines) |
| `wizard-many-step` | First wizard had a 4-step flow with too many user choices | v0.2.4+ (collapsed to single click after platform-detect) |
| `coordination-layer-narrow` | Provider interface was compute-only | v0.2.4 (capabilities widened to overlay/cdn_front/tunnel_term) |
| `orchestrator-cli-only` | Orchestrator was scripts; not running anywhere | v0.3.0 (`damm-orchestrator.timer`, scores.json, public health endpoint) |
| `admission-too-tight` | Per-IP rate limit at 6/h broke verify-suite + would break real users running through the wizard a few times | v0.3.2 (bumped to 60/h per IP, 240/h per /24, 2400/h global) |
| `verify-state-jq-bug` | state-integrity.sh failed because of jq operator-precedence bug | v0.3.2 (parens around tuple elements) |
| `verify-ssh-self` | Verify scripts SSH'd from hub2 to hub2; produced noise + identity errors | v0.3.2 (`DAMM_LOCAL=1` flag) |
| `wizard-flow-race` | Verify wizard-flow.sh raced reconciler with 3s sleep; flaky | v0.3.2 (poll up to 8s) |
| `docs-fall-through` | All `.md` docs returned the homepage payload because not registered in build-site.js | v0.3.1 (registered the eight docs) |

These move to `field-manual.md` §2 (postmortem) on the next docs-hygiene pass.

## How to use this triage

When picking a piece of work:
1. Look at **Important** first. Pick by roadmap position.
2. Don't pick from **Nice-to-have** while **Important** items remain unstarted in the current release.
3. **Abandoned** is for stating "we considered this and chose not to" — use to push back on requests to revisit.
4. **Resolved** is the audit trail. Keep it for one release cycle so the change is traceable.

When closing an issue:
1. Move the row to **Resolved** with the resolving release.
2. Update the relevant doc (per `INDEX.md` rule 1, including its "Last verified" date).
3. Add a verify-suite check if the issue could regress.

When discovering a new issue:
1. Tag it (lowercase, hyphenated, one-token-ish).
2. Place in the right category. **🔥 Blocking** is reserved for real production-breaking issues.
3. Cross-reference the doc that owns the concern.
4. Update `INDEX.md` if a new canonical doc was created to address it.

## The grounding rule

This file does not duplicate the field manual or sanity check. It **routes** to them. If you find yourself describing an issue at length here, that description belongs in the canonical doc; this file should be a one-line pointer.

When this triage doc grows past ~200 rows, that's a signal we're doing too much in flight; ship and close before adding more.
