Spec 008 — Observability

Status: Draft
Related: PRD FR-22..FR-25, Jacob's global dev-patterns

Goal

Everything should be observable without standing up a paid SaaS. Logs must be useful both to the user ("what happened in chat X last Tuesday?") and to the developer ("why did the discriminator skip this message?").

Structured logs

  • Logger: pino with pino-pretty in dev, JSON in prod.
  • Level: default info; debug via LOG_LEVEL=debug.
  • Location: stdout, captured by journald under plain (Docker-less) systemd; no separate log files.
  • Fields every log carries:
    • time (ISO)
    • level (debug/info/warn/error)
    • request_id (see below)
    • chat_id (if in chat context)
    • event_type (e.g. linq.inbound, tmux.turn.start, discriminator.decision)
    • msg (free text)
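
As a rough illustration of the field list above, one log line could be modelled as follows. The LogRecord type and the buildLogRecord helper are hypothetical names, not part of this spec, and the sketch does not use pino itself. By default pino writes a numeric level and an epoch-millisecond time, so matching this shape would presumably need its level formatter and an ISO timestamp function configured.

```typescript
// Sketch only: field names follow the list above; everything else is assumed.
type Level = "debug" | "info" | "warn" | "error";

interface LogRecord {
  time: string;       // ISO 8601 timestamp
  level: Level;
  request_id: string;
  chat_id?: string;   // present only in chat context
  event_type: string; // e.g. "discriminator.decision"
  msg: string;        // free text
}

function buildLogRecord(
  level: Level,
  eventType: string,
  msg: string,
  ctx: { requestId: string; chatId?: string },
  now: Date = new Date(),
): LogRecord {
  const record: LogRecord = {
    time: now.toISOString(),
    level,
    request_id: ctx.requestId,
    event_type: eventType,
    msg,
  };
  // Only attach chat_id when the log is emitted inside a chat context.
  if (ctx.chatId !== undefined) {
    record.chat_id = ctx.chatId;
  }
  return record;
}

// One JSON object per line, as the prod logger would emit to stdout.
console.log(
  JSON.stringify(
    buildLogRecord("info", "discriminator.decision", "message skipped", {
      requestId: "example-request-id",
      chatId: "example-chat-id",
    }),
  ),
);
```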

Request IDs

  • Fastify middleware generates X-Request-ID (uuid v7) for every inbound HTTP request.
  • Response headers echo it.
  • Logs in that request's async context include it.
  • Child-process spawns inherit it via env (PICORTEX_REQUEST_ID).
  • Linq inbound events tag the request ID into the events SQLite row.
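
The propagation described above could be sketched with Node's AsyncLocalStorage. This is a minimal sketch under assumptions: uuidv7, withRequestId, currentRequestId and childEnv are hypothetical helpers, the Fastify wiring is omitted, and the UUID layout follows RFC 9562 (48-bit millisecond timestamp, version 7, variant bits, remaining bits random).

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomBytes } from "node:crypto";

// Hand-rolled UUID v7: timestamp in bytes 0..5, version in byte 6, variant in byte 8.
function uuidv7(nowMs: number = Date.now()): string {
  const bytes = randomBytes(16);
  bytes.writeUIntBE(nowMs, 0, 6); // 48-bit big-endian Unix time in ms
  bytes.writeUInt8((bytes.readUInt8(6) & 0x0f) | 0x70, 6); // version 7
  bytes.writeUInt8((bytes.readUInt8(8) & 0x3f) | 0x80, 8); // variant 10xx
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Holds the request ID for everything running in one request's async context.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

function withRequestId<T>(requestId: string, fn: () => T): T {
  return requestContext.run({ requestId }, fn);
}

function currentRequestId(): string | undefined {
  return requestContext.getStore()?.requestId;
}

// Environment for a child process: the parent env plus PICORTEX_REQUEST_ID when set.
function childEnv(
  base: Record<string, string | undefined> = process.env,
): Record<string, string | undefined> {
  const id = currentRequestId();
  return id === undefined ? { ...base } : { ...base, PICORTEX_REQUEST_ID: id };
}

withRequestId(uuidv7(), () => {
  // Both lines print the same ID: one from the context, one as a child would see it.
  console.log(currentRequestId());
  console.log(childEnv({})["PICORTEX_REQUEST_ID"]);
});
```

In a real server the withRequestId call would wrap request handling, and the logger would read currentRequestId() when building each record.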

/api/frontend-log

Per Jacob's global rules. Client-side:

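// Report uncaught browser errors to the server log endpoint.
window.addEventListener('error', ev => fetch('/api/frontend-log', {
  method: 'POST',
  // Declare the body as JSON; a bare string body is sent as text/plain.
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    level: 'error',
    message: ev.message,
    error: ev.error?.toString(),
    stack: ev.error?.stack,
    // __VERSION__ is assumed to be a constant injected at build time.
    context: { url: location.href, ua: navigator.userAgent, build: __VERSION__ }
  })
}).catch(() => {})) // a failed report must not surface as another error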

Server-side endpoint:

  • Accepts up to FRONTEND_LOG_MAX_BYTES (default 64 KB)
  • Rate-limited to 30/min per IP
  • Logs under event_type: "frontend" with the browser-supplied fields plus the request ID tying it to the current user session
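
The size cap and rate limit above might be enforced along these lines. This is a sketch, not the endpoint itself: SlidingWindowLimiter and checkFrontendLog are hypothetical names, "64 KB" is assumed to mean 64 * 1024 bytes, and the 204/413/429 status codes are an assumption since the spec does not name them.

```typescript
// Assumption: "64 KB" means 64 * 1024 bytes.
const FRONTEND_LOG_MAX_BYTES = 64 * 1024;

// Keeps the timestamps of recent hits per key and allows at most `limit`
// hits inside any rolling window of `windowMs` milliseconds.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number = Date.now()): boolean {
    const recent = (this.hits.get(key) ?? []).filter(
      (t) => nowMs - t < this.windowMs,
    );
    const ok = recent.length < this.limit;
    if (ok) {
      recent.push(nowMs);
    }
    this.hits.set(key, recent);
    return ok;
  }
}

// Returns the HTTP status the endpoint would answer with.
function checkFrontendLog(
  limiter: SlidingWindowLimiter,
  ip: string,
  rawBody: string,
  nowMs: number = Date.now(),
): 204 | 413 | 429 {
  if (Buffer.byteLength(rawBody, "utf8") > FRONTEND_LOG_MAX_BYTES) {
    return 413; // payload too large; does not count against the rate limit
  }
  if (!limiter.allow(ip, nowMs)) {
    return 429; // more than 30 requests from this IP in the last minute
  }
  return 204;
}

const limiter = new SlidingWindowLimiter(30, 60_000);
console.log(checkFrontendLog(limiter, "203.0.113.7", '{"level":"error"}'));
```

The in-memory map never evicts idle IPs, so a long-running process would want periodic cleanup or an existing rate-limit plugin instead.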

Metrics

No Prometheus in v1. Instead, lightweight counters live in a SQLite metrics table and are exposed through /health:

chats_total
chats_active_7d
turns_total
turns_last_24h
discriminator_skipped_24h
errors_last_24h

/health returns:

{
  "status": "ok",
  "version": "0.0.1",
  "commit": "abcd123",
  "uptime_seconds": 3412,
  "db_ok": true,
  "tmux_ok": true,
  "metrics": { ... }
}
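
To make the payload above concrete, the response could be typed and assembled roughly as follows. The field names come from the example and the counter list; buildHealth, HealthInput and the "degraded" status value are assumptions, since the spec only shows "ok" and does not say what is returned when a check fails.

```typescript
interface Metrics {
  chats_total: number;
  chats_active_7d: number;
  turns_total: number;
  turns_last_24h: number;
  discriminator_skipped_24h: number;
  errors_last_24h: number;
}

interface HealthResponse {
  status: "ok" | "degraded"; // "degraded" is a hypothetical value
  version: string;
  commit: string;
  uptime_seconds: number;
  db_ok: boolean;
  tmux_ok: boolean;
  metrics: Metrics;
}

interface HealthInput {
  version: string;
  commit: string;
  startedAtMs: number;
  nowMs: number;
  dbOk: boolean;
  tmuxOk: boolean;
  metrics: Metrics;
}

function buildHealth(input: HealthInput): HealthResponse {
  return {
    // Assumed rule: report "ok" only when every dependency check passes.
    status: input.dbOk && input.tmuxOk ? "ok" : "degraded",
    version: input.version,
    commit: input.commit,
    uptime_seconds: Math.floor((input.nowMs - input.startedAtMs) / 1000),
    db_ok: input.dbOk,
    tmux_ok: input.tmuxOk,
    metrics: input.metrics,
  };
}

const zeroMetrics: Metrics = {
  chats_total: 0,
  chats_active_7d: 0,
  turns_total: 0,
  turns_last_24h: 0,
  discriminator_skipped_24h: 0,
  errors_last_24h: 0,
};

console.log(
  JSON.stringify(
    buildHealth({
      version: "0.0.1",
      commit: "abcd123",
      startedAtMs: Date.now() - 5_000,
      nowMs: Date.now(),
      dbOk: true,
      tmuxOk: true,
      metrics: zeroMetrics,
    }),
    null,
    2,
  ),
);
```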

Network egress allowlist

Claude Code chat users should only reach:

  • api.anthropic.com
  • registry.npmjs.org (for tooling, if used by Claude)
  • pypi.org (if Python is used)
  • github.com, raw.githubusercontent.com
  • Anything the user explicitly allowlists in /etc/picortex/egress-allowlist.txt

Enforced via an iptables owner match on the chat user's UID. Rejected connections log an event, and Jacob gets an alert when a connection to a new host is attempted (learning mode).
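
One way to picture the rule set is a small generator that turns the allowlist into iptables arguments. This is a sketch under stated assumptions: the allowlist file is assumed to hold one host per line with "#" comments, parseAllowlist and egressRules are hypothetical helpers, and the log prefix is made up. The flags used (-m owner --uid-owner, -d, -j ACCEPT/LOG/REJECT) are standard iptables options.

```typescript
// Hosts named in the list above; user additions come from the allowlist file.
const BASE_HOSTS: string[] = [
  "api.anthropic.com",
  "registry.npmjs.org",
  "pypi.org",
  "github.com",
  "raw.githubusercontent.com",
];

// Assumed file format: one host per line, "#" starts a comment, blanks ignored.
function parseAllowlist(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim())
    .filter((line) => line.length > 0);
}

// Builds one argument vector per iptables invocation for a given chat-user UID:
// ACCEPT for each allowed host, then LOG and REJECT for everything else.
function egressRules(uid: number, hosts: string[]): string[][] {
  const owner = ["-A", "OUTPUT", "-m", "owner", "--uid-owner", String(uid)];
  const accepts = hosts.map((host) => [...owner, "-d", host, "-j", "ACCEPT"]);
  const log = [...owner, "-j", "LOG", "--log-prefix", "picortex-egress: "];
  const reject = [...owner, "-j", "REJECT"];
  return [...accepts, log, reject];
}

const extra = parseAllowlist("# user additions\nexample.org\n");
for (const args of egressRules(1001, [...BASE_HOSTS, ...extra])) {
  console.log(["iptables", ...args].join(" "));
}
```

Two caveats apply to this approach: iptables resolves a hostname passed to -d once, when the rule is added, so rules for hosts with changing IPs would need periodic refresh; and the chat user still needs DNS access, which the sketch does not cover.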

Sentry (optional, post-v0.1)

If Jacob wants error aggregation: @sentry/node + @sentry/browser. Keep it off by default.

Testing

  • Unit: request-ID middleware; log shape sanity.
  • Integration: frontend-log roundtrip.
  • Manual: tail logs during E2E; verify every turn has a request ID.

Open questions

  • OQ1: Where are logs archived long-term? (Not in v1 — stdout + journald is fine.)
  • OQ2: Do we want Axiom or Loki integration? (Not for v1. Cortex uses Axiom.)