Spec 008 — Observability
Status: Draft
Related: PRD FR-22..FR-25, Jacob's global dev-patterns
Goal
Everything observable without standing up a paid SaaS. Logs useful for the user ("what happened in chat X last Tuesday?") and for the developer ("why did the discriminator skip this message?").
Structured logs
- Logger: pino with pino-pretty in dev, JSON in prod.
- Level: default info; debug via LOG_LEVEL=debug.
- Location: stdout (journald / docker-less systemd captures); no separate log files.
- Fields every log carries:
  - time (ISO)
  - level (info/warn/error)
  - request_id (see below)
  - chat_id (if in chat context)
  - event_type (e.g. linq.inbound, tmux.turn.start, discriminator.decision)
  - msg (free text)
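To make the field list concrete, here is a minimal sketch of the record shape and of one serialized line. It deliberately does not use pino's API; the helper name formatLogLine and the sample values are hypothetical, and the "debug" level is included only because LOG_LEVEL=debug is mentioned above.

```typescript
// Shape of one structured log line, following the field list in this spec.
type Level = "debug" | "info" | "warn" | "error";

interface LogRecord {
  time: string;       // ISO 8601 timestamp
  level: Level;
  request_id: string; // see "Request IDs"
  chat_id?: string;   // only present in chat context
  event_type: string; // e.g. "linq.inbound", "tmux.turn.start"
  msg: string;        // free text
}

// Hypothetical helper: stamps the time and serializes the record as one JSON line.
function formatLogLine(
  fields: Omit<LogRecord, "time">,
  now: Date = new Date(),
): string {
  const record: LogRecord = { time: now.toISOString(), ...fields };
  return JSON.stringify(record);
}

// Sample values are made up for illustration.
const line = formatLogLine(
  {
    level: "info",
    request_id: "req-example",
    chat_id: "chat-42",
    event_type: "discriminator.decision",
    msg: "skipped: message not addressed to the bot",
  },
  new Date(0),
);
console.log(line);
```

Typing event_type as a plain string keeps the sketch open-ended; a union of known event names would catch typos at compile time if the set is stable.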
Request IDs
- Fastify middleware generates X-Request-ID (uuid v7) for every inbound HTTP request.
- Response headers echo it.
- Logs in that request's async context include it.
- Child-process spawns inherit it via env (PICORTEX_REQUEST_ID).
- Linq inbound events tag the request ID into the events SQLite row.
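The propagation rules above could be sketched roughly as follows, using only Node's standard library. This is an illustration under assumptions, not the actual middleware: the Fastify wiring is omitted, the function names (uuidv7, runWithRequestId, currentRequestId, childEnv) are hypothetical, and the UUID v7 generator is a hand-rolled version of the RFC 9562 layout rather than a vetted library.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomBytes } from "node:crypto";

// Minimal UUID v7 (RFC 9562 layout): 48-bit Unix ms timestamp, version
// nibble 7, variant bits 10, remaining bits random.
function uuidv7(nowMs: number = Date.now()): string {
  const bytes = randomBytes(16);
  bytes.writeUIntBE(nowMs, 0, 6);      // big-endian timestamp in bytes 0..5
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // set version to 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set variant to 10xx
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Async context that carries the request ID across awaits and callbacks.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

// Runs fn with requestId visible to everything it calls, sync or async.
function runWithRequestId<T>(requestId: string, fn: () => T): T {
  return requestContext.run({ requestId }, fn);
}

// What a logger would call to stamp request_id on each line.
function currentRequestId(): string | undefined {
  return requestContext.getStore()?.requestId;
}

// Environment for a child process: the base env plus PICORTEX_REQUEST_ID
// when called inside a request context.
function childEnv(
  base: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const requestId = currentRequestId();
  return requestId ? { ...base, PICORTEX_REQUEST_ID: requestId } : { ...base };
}

const exampleId = uuidv7();
const observed = runWithRequestId(exampleId, () => ({
  logged: currentRequestId(),
  env: childEnv({ PATH: "/usr/bin" }),
}));
console.log(observed);
```

In a real handler, base would typically be process.env, and the same ID would be written to the response header and to the events row.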
/api/frontend-log
Per Jacob's global rules. Client-side:
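// Report uncaught browser errors to the server-side log endpoint.
// __VERSION__ is assumed to be injected at build time.
window.addEventListener('error', ev => {
  fetch('/api/frontend-log', {
    method: 'POST',
    // Declare JSON so the server parses the body instead of treating it as text.
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      level: 'error',
      message: ev.message,
      error: ev.error?.toString(),
      stack: ev.error?.stack,
      context: { url: location.href, ua: navigator.userAgent, build: __VERSION__ }
    })
  }).catch(() => {}) // reporting is best-effort; ignore network failures
})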
Server-side endpoint:
- Accepts up to FRONTEND_LOG_MAX_BYTES (default 64 KB)
- Rate-limited to 30/min per IP
- Logs under event_type: "frontend" with the browser-supplied fields plus the request ID tying it to the current user session
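The two guards on the endpoint (body size and per-IP rate) might look like the sketch below. Several details are assumptions not stated in this spec: 64 KB is read as 65536 bytes, the limiter uses a fixed one-minute window, oversized bodies are rejected before they count against the rate budget, and the names FixedWindowLimiter and checkFrontendLog are hypothetical.

```typescript
const FRONTEND_LOG_MAX_BYTES = 64 * 1024; // spec default: 64 KB (assumed 65536 bytes)
const RATE_LIMIT = 30;                    // spec: 30 requests per minute per IP
const WINDOW_MS = 60_000;

// Fixed-window counter per IP. The windowing strategy is an assumption;
// the spec only states the 30/min budget.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if this request fits in the caller's current window.
  allow(ip: string, now: number): boolean {
    const w = this.windows.get(ip);
    if (!w || now - w.start >= this.windowMs) {
      // No window yet, or the old one expired: start a fresh one.
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

type Verdict = "accept" | "too_large" | "rate_limited";

// Size is measured in UTF-8 bytes, not string length.
function checkFrontendLog(
  limiter: FixedWindowLimiter,
  ip: string,
  body: string,
  now: number,
): Verdict {
  if (new TextEncoder().encode(body).length > FRONTEND_LOG_MAX_BYTES) {
    return "too_large";
  }
  return limiter.allow(ip, now) ? "accept" : "rate_limited";
}

const limiter = new FixedWindowLimiter(RATE_LIMIT, WINDOW_MS);
console.log(checkFrontendLog(limiter, "203.0.113.7", '{"level":"error"}', Date.now()));
```

A fixed window allows short bursts around window boundaries; a sliding window or token bucket would be stricter if that matters here.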
Metrics
No Prometheus in v1. Instead, lightweight counters live in a SQLite metrics table and are exposed by /health:
chats_total
chats_active_7d
turns_total
turns_last_24h
discriminator_skipped_24h
errors_last_24h
/health returns:
{
"status": "ok",
"version": "0.0.1",
"commit": "abcd123",
"uptime_seconds": 3412,
"db_ok": true,
"tmux_ok": true,
"metrics": { ... }
}
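As a rough illustration of how the response could be assembled, the sketch below types the payload and derives the computed fields. It rests on assumptions: the spec only shows status "ok", so the "degraded" value for a failed probe is hypothetical, as are the function name buildHealth and all sample numbers. Reading the counters from SQLite is left out.

```typescript
// Counters listed in the Metrics section.
interface Metrics {
  chats_total: number;
  chats_active_7d: number;
  turns_total: number;
  turns_last_24h: number;
  discriminator_skipped_24h: number;
  errors_last_24h: number;
}

interface HealthResponse {
  status: string;
  version: string;
  commit: string;
  uptime_seconds: number;
  db_ok: boolean;
  tmux_ok: boolean;
  metrics: Metrics;
}

// Hypothetical builder: probe results and counters are passed in by the caller.
function buildHealth(input: {
  version: string;
  commit: string;
  startedAtMs: number;
  nowMs: number;
  dbOk: boolean;
  tmuxOk: boolean;
  metrics: Metrics;
}): HealthResponse {
  return {
    // Assumption: anything other than all probes passing is "degraded".
    status: input.dbOk && input.tmuxOk ? "ok" : "degraded",
    version: input.version,
    commit: input.commit,
    uptime_seconds: Math.floor((input.nowMs - input.startedAtMs) / 1000),
    db_ok: input.dbOk,
    tmux_ok: input.tmuxOk,
    metrics: input.metrics,
  };
}

// Sample counter values are made up for illustration.
const sampleMetrics: Metrics = {
  chats_total: 12,
  chats_active_7d: 4,
  turns_total: 310,
  turns_last_24h: 27,
  discriminator_skipped_24h: 5,
  errors_last_24h: 0,
};

const health = buildHealth({
  version: "0.0.1",
  commit: "abcd123",
  startedAtMs: 0,
  nowMs: 3_412_900,
  dbOk: true,
  tmuxOk: true,
  metrics: sampleMetrics,
});
console.log(JSON.stringify(health, null, 2));
```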
Network egress allowlist
Claude Code chat users should only reach:
- api.anthropic.com
- registry.npmjs.org (for tooling, if used by Claude)
- pypi.org (if Python is used)
- github.com, raw.githubusercontent.com
- Anything the user explicitly allowlists in /etc/picortex/egress-allowlist.txt
Enforced via an iptables owner match on the chat user's UID. Rejected connections log an event, and Jacob gets an alert when a new host is attempted (learning mode).
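The alerting side of learning mode could be sketched as below. This covers only the userspace decision of whether a rejected host is new; the iptables enforcement itself is not shown. It assumes a file format of one hostname per line with "#" comments, exact hostname matching with no wildcards, and one alert per previously unseen host. The names parseAllowlist and EgressPolicy are hypothetical.

```typescript
// Hosts named in this spec as always reachable.
const BASE_ALLOWLIST = [
  "api.anthropic.com",
  "registry.npmjs.org",
  "pypi.org",
  "github.com",
  "raw.githubusercontent.com",
];

// Assumed file format: one hostname per line; blank lines and "#" comments ignored.
function parseAllowlist(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((l) => l.replace(/#.*$/, "").trim().toLowerCase())
    .filter((l) => l.length > 0);
}

class EgressPolicy {
  private readonly allowed: Set<string>;
  private readonly alerted = new Set<string>();

  constructor(userEntries: string[]) {
    this.allowed = new Set([...BASE_ALLOWLIST, ...userEntries]);
  }

  // allowed: whether the host is on the list (exact match, an assumption).
  // alert: true only the first time a given rejected host is seen.
  check(host: string): { allowed: boolean; alert: boolean } {
    const h = host.toLowerCase();
    if (this.allowed.has(h)) return { allowed: true, alert: false };
    const firstTime = !this.alerted.has(h);
    this.alerted.add(h);
    return { allowed: false, alert: firstTime };
  }
}

// In practice the text would come from /etc/picortex/egress-allowlist.txt.
const userHosts = parseAllowlist("# extra hosts\nexample.org\n\n  Internal.Example.com  # note\n");
const policy = new EgressPolicy(userHosts);
console.log(policy.check("api.anthropic.com"));
console.log(policy.check("unknown.example.net"));
```

Alerting once per host keeps a retry loop against a blocked host from flooding the alert channel, while every rejection can still be logged as an event.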
Sentry (optional, post-v0.1)
If Jacob wants error aggregation: @sentry/node + @sentry/browser. Keep it off by default.
Testing
- Unit: request-ID middleware; log shape sanity.
- Integration: frontend-log roundtrip.
- Manual: tail logs during E2E; verify every turn has a request ID.
Open questions
- OQ1: Where are logs archived long-term? (Not in v1 — stdout + journald is fine.)
- OQ2: Do we want Axiom or Loki integration? (Not for v1. Cortex uses Axiom.)