Spec 008 — Observability

Status: Draft
Related: PRD FR-22..FR-25, Jacob's global dev-patterns

Goal

Everything is observable without standing up a paid SaaS. Logs are useful both for the user ("what happened in chat X last Tuesday?") and for the developer ("why did the discriminator skip this message?").

Structured logs

Logger: pino with pino-pretty in dev, JSON in prod.
Level: default info; debug via LOG_LEVEL=debug.
Location: stdout, captured by journald under systemd (no Docker); no separate log files.
Fields every log carries:
- time (ISO)
- level (debug/info/warn/error)
- request_id (see below)
- chat_id (if in chat context)
- event_type (e.g. linq.inbound, tmux.turn.start, discriminator.decision)
- msg (free text)
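
As a rough illustration of the field list above, the sketch below builds one log record and writes it as a single JSON line. It does not use pino; buildLogRecord and emit are hypothetical helper names, and the camelCase argument names are assumptions made for the example.

```javascript
// Sketch only: in the real service pino would produce these lines.
// buildLogRecord and emit are hypothetical names, not part of the spec.
const ALLOWED_LEVELS = ['debug', 'info', 'warn', 'error'];

function buildLogRecord({ level, eventType, msg, requestId, chatId, now = new Date() }) {
  if (!ALLOWED_LEVELS.includes(level)) {
    throw new Error(`unknown level: ${level}`);
  }
  const record = {
    time: now.toISOString(), // ISO timestamp, as the field list requires
    level,
    request_id: requestId,
    event_type: eventType,   // e.g. linq.inbound, discriminator.decision
    msg,
  };
  // chat_id is only present when the log is emitted in a chat context.
  if (chatId !== undefined) record.chat_id = chatId;
  return record;
}

// Writes one JSON object per line; the writer is injectable so tests can capture it.
function emit(record, write = (line) => process.stdout.write(line)) {
  write(JSON.stringify(record) + '\n');
}
```

If pino is used as planned, note that its defaults emit a numeric level and an epoch-millisecond time, so the ISO time and textual level above would likely need pino's timestamp and formatters.level options.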

Request IDs

Fastify middleware generates X-Request-ID (uuid v7) for every inbound HTTP request.
Response headers echo it.
Logs in that request's async context include it.
Child-process spawns inherit it via env (PICORTEX_REQUEST_ID).
Linq inbound events store the request ID in their row of the events SQLite table.
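
The propagation rules above can be sketched with Node's built-in AsyncLocalStorage. This is a minimal sketch, not the planned Fastify wiring: uuidv7, runWithRequestId, currentRequestId, and childEnv are hypothetical helpers, and the UUID v7 generator is hand-rolled here only so the example has no dependencies.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomBytes } = require('node:crypto');

// UUID v7 layout (RFC 9562): 48-bit Unix ms timestamp, version and variant bits, rest random.
function uuidv7(nowMs = Date.now()) {
  const bytes = randomBytes(16);
  bytes.writeUIntBE(nowMs, 0, 6);      // bytes 0..5: timestamp, big-endian
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // high nibble = version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // top two bits = variant 10
  const hex = bytes.toString('hex');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join('-');
}

const requestContext = new AsyncLocalStorage();

// Runs fn with a fresh request ID visible to everything in its async context.
function runWithRequestId(fn) {
  const requestId = uuidv7();
  return requestContext.run({ requestId }, () => fn(requestId));
}

// Returns the current request ID, or undefined outside a request.
function currentRequestId() {
  return requestContext.getStore()?.requestId;
}

// Environment for child-process spawns, carrying the ID as PICORTEX_REQUEST_ID.
function childEnv(baseEnv = process.env) {
  const requestId = currentRequestId();
  return requestId ? { ...baseEnv, PICORTEX_REQUEST_ID: requestId } : { ...baseEnv };
}
```

A logger would call currentRequestId() when building each record, and a spawn call would pass childEnv() as its env option. In Fastify, the generator could plausibly plug into the genReqId server option, with a hook setting the X-Request-ID response header; that wiring is an assumption and is not shown here.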

`/api/frontend-log`

Per Jacob's global rules. Client-side:

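window.addEventListener('error', ev => {
  // Report uncaught errors to the server; the reporter itself must never throw.
  fetch('/api/frontend-log', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    keepalive: true, // let the request finish even if the page is unloading
    body: JSON.stringify({
      level: 'error',
      message: ev.message,
      error: ev.error?.toString(),
      stack: ev.error?.stack,
      // __VERSION__ is expected to be injected at build time
      context: { url: location.href, ua: navigator.userAgent, build: __VERSION__ }
    })
  }).catch(() => {})
})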

Server-side endpoint:

Accepts up to FRONTEND_LOG_MAX_BYTES (default 64 KB)
Rate-limited to 30/min per IP
Logs under event_type: "frontend" with the browser-supplied fields plus the request ID tying it to the current user session
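
The size cap and rate limit above might look roughly like the following. This is a sketch under assumptions: the spec does not name a rate-limiting algorithm, so a fixed one-minute window per IP is used here, and the HTTP status codes (413, 429, 400, 204) as well as the names createRateLimiter and checkFrontendLog are illustrative choices.

```javascript
const DEFAULT_FRONTEND_LOG_MAX_BYTES = 64 * 1024; // spec default: 64 KB
const RATE_LIMIT_PER_MINUTE = 30;                 // spec: 30/min per IP
const WINDOW_MS = 60_000;

// Fixed-window counter per IP (assumed algorithm). Stale entries are never
// evicted in this sketch; a real implementation would prune the map.
function createRateLimiter(limit = RATE_LIMIT_PER_MINUTE, windowMs = WINDOW_MS) {
  const windows = new Map(); // ip -> { start, count }
  return function allow(ip, nowMs = Date.now()) {
    const w = windows.get(ip);
    if (!w || nowMs - w.start >= windowMs) {
      windows.set(ip, { start: nowMs, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= limit;
  };
}

// Returns the HTTP status the endpoint would answer with (status codes are assumptions).
function checkFrontendLog(rawBody, ip, allow, maxBytes = DEFAULT_FRONTEND_LOG_MAX_BYTES) {
  if (Buffer.byteLength(rawBody, 'utf8') > maxBytes) return 413; // body too large
  if (!allow(ip)) return 429;                                     // rate limit hit
  try {
    JSON.parse(rawBody);
  } catch {
    return 400;                                                   // not valid JSON
  }
  return 204;                                                     // accepted, nothing to return
}
```

The byte check runs first so that oversized bodies are rejected without counting against the sender's rate window; whether that ordering is desirable is a design choice the spec leaves open.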

Metrics

No Prometheus in v1. Instead, lightweight counters live in a SQLite metrics table and are exposed by /health:

chats_total
chats_active_7d
turns_total
turns_last_24h
discriminator_skipped_24h
errors_last_24h

/health returns:

{
  "status": "ok",
  "version": "0.0.1",
  "commit": "abcd123",
  "uptime_seconds": 3412,
  "db_ok": true,
  "tmux_ok": true,
  "metrics": { ... }
}
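
One way the /health payload above could be assembled is sketched below. The row shape ({ name, value }), the helper names metricsFromRows and buildHealth, and the "degraded" status value are assumptions; the spec only shows the "ok" case.

```javascript
// Counter names taken from the Metrics list in this spec.
const METRIC_NAMES = [
  'chats_total',
  'chats_active_7d',
  'turns_total',
  'turns_last_24h',
  'discriminator_skipped_24h',
  'errors_last_24h',
];

// rows: [{ name, value }] as they might come back from the metrics table
// (column names are assumed). Missing counters default to 0; unknown names are ignored.
function metricsFromRows(rows) {
  const metrics = Object.fromEntries(METRIC_NAMES.map((n) => [n, 0]));
  for (const { name, value } of rows) {
    if (METRIC_NAMES.includes(name)) metrics[name] = value;
  }
  return metrics;
}

function buildHealth({ version, commit, startedAtMs, dbOk, tmuxOk, metrics, nowMs = Date.now() }) {
  return {
    status: dbOk && tmuxOk ? 'ok' : 'degraded', // 'degraded' is an assumed value
    version,
    commit,
    uptime_seconds: Math.floor((nowMs - startedAtMs) / 1000),
    db_ok: dbOk,
    tmux_ok: tmuxOk,
    metrics,
  };
}
```

Defaulting absent counters to 0 keeps the response shape stable for anything that polls /health, at the cost of hiding a counter that was never written.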

Network egress allowlist

Claude Code chat users should only reach:

api.anthropic.com
registry.npmjs.org (for tooling, if used by Claude)
pypi.org (if Python is used)
github.com, raw.githubusercontent.com
Anything the user explicitly allowlists in /etc/picortex/egress-allowlist.txt

Enforced via an iptables owner match on the chat user's UID. Rejected connections log an event, and Jacob gets an alert when a connection to a new host is attempted (learning mode).
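
To make the owner-match idea concrete, the sketch below turns the allowlist into iptables command strings without executing them. The file format (one host per line, "#" comments), the log prefix, and the helper names parseAllowlist and buildRules are assumptions. Note also that iptables resolves a hostname once, when the rule is inserted, and that DNS and loopback traffic for the chat user would need their own rules, which this sketch omits.

```javascript
// Hosts named in this spec; extra hosts come from /etc/picortex/egress-allowlist.txt.
const BASE_HOSTS = [
  'api.anthropic.com',
  'registry.npmjs.org',
  'pypi.org',
  'github.com',
  'raw.githubusercontent.com',
];

// Assumed file format: one host per line, blank lines and "#" comments ignored.
function parseAllowlist(text) {
  return text
    .split('\n')
    .map((line) => line.replace(/#.*$/, '').trim())
    .filter(Boolean);
}

// Builds command strings only; running them (as root) is left to the caller.
function buildRules(uid, extraHosts = []) {
  if (!Number.isInteger(uid) || uid < 0) throw new Error(`bad uid: ${uid}`);
  const hosts = [...new Set([...BASE_HOSTS, ...extraHosts])];
  for (const h of hosts) {
    // Refuse anything that is not a plain hostname, to keep shell metacharacters out.
    if (!/^[A-Za-z0-9.-]+$/.test(h)) throw new Error(`bad host: ${h}`);
  }
  const owner = `-m owner --uid-owner ${uid}`;
  return [
    ...hosts.map((h) => `iptables -A OUTPUT ${owner} -d ${h} -j ACCEPT`),
    // Log before rejecting so an alerting job can pick up new hosts (prefix is assumed).
    `iptables -A OUTPUT ${owner} -j LOG --log-prefix "picortex-egress-reject: "`,
    `iptables -A OUTPUT ${owner} -j REJECT`,
  ];
}
```

For example, buildRules(1001, parseAllowlist(fileText)) yields one ACCEPT rule per host followed by the LOG and REJECT rules; rule order matters because iptables evaluates the chain top to bottom.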

Sentry (optional, post-v0.1)

If Jacob wants error aggregation: @sentry/node + @sentry/browser. Keep it off by default.

Testing

Unit: request-ID middleware; log shape sanity.
Integration: frontend-log roundtrip.
Manual: tail logs during E2E; verify every turn has a request ID.

Open questions

OQ1: Where are logs archived long-term? (Not in v1 — stdout + journald is fine.)
OQ2: Do we want Axiom or Loki integration? (Not for v1. Cortex uses Axiom.)