Spec 003 — Web terminal (xterm.js)

Status: Draft
Related: PRD FR-12, Spec 002

Goal

Any authenticated user can open a web terminal in a chat and see/control its live tmux session (attached to Claude Code). Works on mobile Safari.

Architecture

[Browser]              [Backend]              [Chat user]
xterm.js  <-- WS -->   /ws/terminal/:chat_id  --> runuser + tmux attach
                                                  |
                                                  +--> node-pty

Endpoint

GET /ws/terminal/:chat_id — upgrades to WebSocket. Authorization checked before upgrade:

  • Cookie-based Noos OAuth session
  • User must own the chat (v1: always Jacob)
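
The two checks above can be sketched as a pure decision function that runs before the handshake completes. This is a minimal sketch, not the actual backend code: the `Session` and `Chat` shapes, the `authorizeUpgrade` name, and the choice of status codes are assumptions for illustration, since the spec does not define them.

```typescript
// Hypothetical shapes: the spec names neither the session nor the chat record.
interface Session { userId: string }
interface Chat { id: string; ownerId: string }

type UpgradeDecision =
  | { allow: true }
  | { allow: false; status: 401 | 403 | 404 };

// Decide whether the HTTP request may be upgraded to a WebSocket.
// Intended to run before the upgrade, so a rejected client never gets a socket.
function authorizeUpgrade(session: Session | null, chat: Chat | null): UpgradeDecision {
  if (session === null) return { allow: false, status: 401 }; // no valid session cookie
  if (chat === null) return { allow: false, status: 404 };    // unknown chat_id
  if (chat.ownerId !== session.userId) {
    return { allow: false, status: 403 };                     // authenticated, but not the owner
  }
  return { allow: true };
}
```

Keeping the decision separate from the socket code makes it straightforward to unit-test without opening a connection.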

On connection:

  1. Spawn sudo -u chat-$HEX -H tmux attach -t picortex:$CHAT_ID under node-pty
  2. Wire: stdin from WS text frames → pty; pty stdout → WS binary frames (base64 optional)
  3. Handle resize message type from client: {cols, rows} → pty.resize + tmux refresh-client -S
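
Step 1 could be sketched as a small argv builder. The validation patterns and the `buildAttachCommand` name are assumptions: the spec does not define the format of $HEX or $CHAT_ID, so the regular expressions below are placeholders to be tightened once those formats are fixed.

```typescript
// Hypothetical validation patterns; the real formats of $HEX and $CHAT_ID are not specified.
const HEX_RE = /^[0-9a-f]+$/;
const CHAT_ID_RE = /^[A-Za-z0-9_-]+$/;

interface AttachCommand {
  file: string;
  args: string[];
}

// Build the command from step 1 as an argv array. No shell string is
// assembled, so the identifiers are never interpreted by a shell.
function buildAttachCommand(hex: string, chatId: string): AttachCommand {
  if (!HEX_RE.test(hex)) throw new Error("invalid hex user suffix");
  if (!CHAT_ID_RE.test(chatId)) throw new Error("invalid chat id");
  return {
    file: "sudo",
    args: ["-u", `chat-${hex}`, "-H", "tmux", "attach", "-t", `picortex:${chatId}`],
  };
}
```

The result is meant to be handed to node-pty's `spawn(file, args, options)`; the returned pty object exposes `write`, `onData`, `resize(cols, rows)` and `kill(signal)`, which correspond to steps 2 and 3 and to the close path.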

Close path: client disconnect → pty.kill('SIGHUP') (this detaches from tmux without killing the session).

Client

  • @xterm/xterm v5+
  • Addons: @xterm/addon-fit, @xterm/addon-web-links
  • Touch keyboard support: a small toolbar with ⌃ / Tab / Esc / ↑ / ↓ buttons for mobile
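
The toolbar buttons ultimately have to produce the same bytes a hardware keyboard would. A minimal sketch of that mapping follows; the `ToolbarKey` names and the `applyCtrl` helper are hypothetical, and the arrow sequences assume normal cursor-key mode (applications that switch to application cursor mode expect ESC O A / ESC O B instead).

```typescript
// Hypothetical key names for the toolbar buttons.
type ToolbarKey = "tab" | "esc" | "up" | "down";

// Byte sequences sent for these keys in normal cursor-key mode.
const KEY_SEQUENCES: Record<ToolbarKey, string> = {
  tab: "\t",
  esc: "\x1b",
  up: "\x1b[A",
  down: "\x1b[B",
};

// Apply a sticky Ctrl modifier to the next typed character.
// Characters '@' (0x40) through '_' (0x5f) map onto control codes 0x00..0x1f,
// so Ctrl+C becomes 0x03 and Ctrl+[ becomes ESC.
function applyCtrl(ch: string): string {
  const upper = ch.toUpperCase();
  if (upper.length !== 1) return ch; // not a single character: leave unchanged
  const code = upper.charCodeAt(0);
  if (code >= 0x40 && code <= 0x5f) {
    return String.fromCharCode(code & 0x1f);
  }
  return ch; // no control equivalent; send unchanged
}
```

In this sketch a button tap sends `KEY_SEQUENCES[key]` as an ordinary data frame, and the ⌃ button sets a one-shot flag so that the next character typed goes through `applyCtrl` first.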

Security

  • Read-write for v1 (Jacob is the only user).
  • Post-v1: add a mode=readonly query param that starts tmux attach -r.
  • Do not accept arbitrary commands from WS: data frames are pure PTY bytes. No JSON/RPC layer beyond the resize control frame.
  • Rate-limit connections to 5/sec per user to prevent tmux-spam DoS.
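  • WebSocket path is under the same origin + cookie as the main UI. Browsers do not apply the same-origin policy to WebSocket handshakes, so this is CSRF safe only if the server validates the Origin header (or the session cookie is SameSite) before the upgrade.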
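
The 5/sec connection limit could be enforced with a per-user sliding window. This is one possible sketch, not a prescribed design; the `ConnectionRateLimiter` class and its method names are hypothetical, and the clock is passed in explicitly to keep the logic testable.

```typescript
// Sliding-window limiter: at most `limit` connection attempts per `windowMs` per user.
class ConnectionRateLimiter {
  private readonly attempts = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 5, windowMs: number = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the attempt if it is allowed; returns false otherwise.
  tryAcquire(userId: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only the attempts that are still inside the window.
    const recent = (this.attempts.get(userId) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.attempts.set(userId, recent);
      return false;
    }
    recent.push(nowMs);
    this.attempts.set(userId, recent);
    return true;
  }
}
```

The check would run after authorization and before the pty is spawned, so a rejected attempt never reaches tmux. This sketch never evicts idle users from the map; a long-running server would need periodic cleanup.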

Resize protocol

Client sends a JSON control frame (distinct from PTY data frames):

{"type": "resize", "cols": 120, "rows": 40}

Data frames are plain text (UTF-8) for client→server, binary (Uint8Array) for server→client.
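
Both resize messages and keystrokes arrive as text frames, and the spec does not say how the server tells them apart. The sketch below assumes one possible rule: a frame that parses as a well-formed resize object is a control frame, and everything else is PTY data. The `parseClientFrame` name and the size bounds are hypothetical.

```typescript
type ClientFrame =
  | { kind: "resize"; cols: number; rows: number }
  | { kind: "data"; data: string };

// Hypothetical bounds; the spec sets no limits on terminal size.
const MAX_COLS = 500;
const MAX_ROWS = 200;

function isValidDimension(value: unknown, max: number): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= 1 && value <= max;
}

// Classify one client→server text frame.
function parseClientFrame(text: string): ClientFrame {
  if (text.startsWith("{")) {
    try {
      const msg: unknown = JSON.parse(text);
      if (typeof msg === "object" && msg !== null) {
        const fields = msg as Record<string, unknown>;
        const cols = fields["cols"];
        const rows = fields["rows"];
        if (
          fields["type"] === "resize" &&
          isValidDimension(cols, MAX_COLS) &&
          isValidDimension(rows, MAX_ROWS)
        ) {
          return { kind: "resize", cols, rows };
        }
      }
    } catch {
      // Not valid JSON: fall through and treat the frame as keystrokes.
    }
  }
  return { kind: "data", data: text };
}
```

One caveat with this rule: a user who pastes text that is exactly a valid resize message would have it consumed as a control frame. If that matters, a stricter framing (for example a one-character type prefix on every frame) would remove the ambiguity.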

Mobile considerations

  • Font: SF Mono, 13px on mobile
  • Lock horizontal scroll; let tmux handle horizontal overflow
  • Swipe-left from terminal snaps back to file browser pane
  • Copy-on-select with a long-press menu

Testing

  • Unit: protocol framing (resize, data, ping).
  • Integration: real tmux behind real WS; type echo hi and assert output.
  • E2E: Playwright on mobile Safari viewport; attach, type, detach, re-attach (see same scrollback).

Open questions

  • OQ1: How to detect "user typed a meaningful command" vs "user is just looking"? (For activity tracking / lifecycle.) Answer: any stdin byte counts.
  • OQ2: Do we want to log every keystroke for audit? Probably no in v1 (privacy). Log session open/close only.
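
If the OQ1 answer stands, activity tracking reduces to remembering when the last stdin byte arrived. A minimal sketch under that assumption; the `ActivityTracker` class and the `idleAfterMs` threshold are hypothetical, and how the lifecycle code consumes the result is left open.

```typescript
// Tracks the last stdin activity for one terminal session.
// Per OQ1, any stdin byte counts; resize control frames are not recorded here.
class ActivityTracker {
  private lastInputMs: number | null = null;

  // Call for every client→server data frame. Empty frames carry no bytes and are ignored.
  recordInput(data: string, nowMs: number): void {
    if (data.length > 0) this.lastInputMs = nowMs;
  }

  // True if at least one stdin byte arrived less than `idleAfterMs` ago.
  isActive(nowMs: number, idleAfterMs: number): boolean {
    return this.lastInputMs !== null && nowMs - this.lastInputMs < idleAfterMs;
  }
}
```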