Spec 003 — Web terminal (xterm.js)
Status: Draft
Related: PRD FR-12, Spec 002
Goal
Any authenticated user can open a web terminal in a chat and see/control its live tmux session (attached to Claude Code). Works on mobile Safari.
Architecture
[Browser]              [Backend]                    [Chat user]
xterm.js  <-- WS -->   /ws/terminal/:chat_id  -->   runuser + tmux attach
                              |
                              +--> node-pty
Endpoint
GET /ws/terminal/:chat_id — upgrades to WebSocket. Authorization checked before upgrade:
- Cookie-based Noos OAuth session
- User must own the chat (v1: always Jacob)
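The pre-upgrade check can be sketched as a pure decision function. This is a minimal sketch, not the actual implementation: the `Session` and `Chat` shapes, the `authorizeUpgrade` name, and the choice of 401/403/404 status codes are all assumptions, since the spec only says that the session and chat ownership are checked before the upgrade.

```typescript
// Hypothetical shapes: the spec does not define session or chat records.
interface Session { userId: string }
interface Chat { id: string; ownerId: string }

type UpgradeDecision =
  | { allow: true }
  | { allow: false; status: 401 | 403 | 404 };

// Decide whether a WebSocket upgrade may proceed. Because this runs before
// the upgrade, a rejection can still be answered with a plain HTTP status.
function authorizeUpgrade(
  session: Session | undefined,
  chat: Chat | undefined,
): UpgradeDecision {
  if (!session) return { allow: false, status: 401 }; // no valid session cookie
  if (!chat) return { allow: false, status: 404 };    // unknown chat_id
  if (chat.ownerId !== session.userId) {
    return { allow: false, status: 403 };             // not the chat owner
  }
  return { allow: true };
}
```

Keeping the decision separate from the HTTP server makes it easy to unit test without opening a socket.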
On connection:
- Spawn sudo -u chat-$HEX -H tmux attach -t picortex:$CHAT_ID under node-pty
- Wire: stdin from WS text frames → pty; pty stdout → WS binary frames (base64 optional)
- Handle resize message type from client: {cols, rows} → pty.resize + tmux refresh-client -S
Close path: client disconnect → pty.kill('SIGHUP') (this detaches from tmux without killing the session).
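One way to build the spawn command is as an argv array rather than a shell string. This is a sketch under stated assumptions: the `buildAttachCommand` name and the two validation patterns are hypothetical, because the spec does not say what `$HEX` or `$CHAT_ID` look like. The architecture diagram mentions runuser while the list above uses sudo; the sketch follows the sudo form.

```typescript
// Assumed formats: the spec does not state what $HEX or $CHAT_ID look like.
const HEX_RE = /^[0-9a-f]+$/;
const CHAT_ID_RE = /^[A-Za-z0-9_-]+$/;

interface AttachCommand { file: string; args: string[] }

// Build the argv for attaching to a chat's tmux session as its Unix user.
// An argv array (no shell string) keeps the ids from being interpreted by a
// shell; the pattern checks reject anything unexpected outright.
function buildAttachCommand(hex: string, chatId: string): AttachCommand {
  if (!HEX_RE.test(hex)) throw new Error("invalid chat user suffix");
  if (!CHAT_ID_RE.test(chatId)) throw new Error("invalid chat id");
  return {
    file: "sudo",
    args: ["-u", `chat-${hex}`, "-H", "tmux", "attach", "-t", `picortex:${chatId}`],
  };
}
```

The result is meant to be handed to node-pty's `spawn(file, args, options)`. On client disconnect the same pty handle would receive `kill('SIGHUP')`, as described in the close path.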
Client
- @xterm/xterm v5+
- Addons: @xterm/addon-fit, @xterm/addon-web-links
- Touch keyboard support: a small toolbar with ⌃ / Tab / Esc / ↑ / ↓ buttons for mobile
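The toolbar buttons ultimately have to emit the byte sequences a hardware keyboard would produce. The sketch below shows one possible mapping. The names `TOOLBAR_KEYS`, `toolbarSequence`, and `applyCtrl` are hypothetical, and treating ⌃ as a sticky modifier applied to the next typed character is an assumption; the spec only lists the buttons.

```typescript
// Byte sequences a terminal conventionally sends for each toolbar key.
// Arrow keys use the normal-mode CSI form; a program that enables
// application cursor mode expects ESC O A / ESC O B instead.
const TOOLBAR_KEYS = {
  Tab: "\t",
  Esc: "\x1b",
  Up: "\x1b[A",
  Down: "\x1b[B",
} as const;

type ToolbarKey = keyof typeof TOOLBAR_KEYS;

function toolbarSequence(key: ToolbarKey): string {
  return TOOLBAR_KEYS[key];
}

// Apply Ctrl (the ⌃ button, assumed sticky) to the next typed character.
// ASCII '@'..'_' and 'a'..'z' map to control codes by masking with 0x1f,
// e.g. Ctrl+C -> 0x03. Anything else passes through unchanged.
function applyCtrl(ch: string): string {
  if (ch.length !== 1) return ch;
  const code = ch.charCodeAt(0);
  const upper = code >= 0x61 && code <= 0x7a ? code - 0x20 : code; // fold a-z to A-Z
  if (upper >= 0x40 && upper <= 0x5f) {
    return String.fromCharCode(upper & 0x1f);
  }
  return ch;
}
```

The resulting strings would travel to the server as ordinary client→server text frames, the same path as typed input.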
Security
- Read-write for v1 (Jacob is the only user).
- Post-v1: add a mode=readonly query param that starts tmux attach -r.
- Do not accept arbitrary commands from WS: it carries pure PTY bytes. No JSON/RPC layer beyond the resize control frame.
- Rate-limit connections to 5/sec per user to prevent tmux-spam DoS.
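- WebSocket path is under the same origin + cookie as the main UI. Browsers do not enforce the same-origin policy on WebSocket handshakes, so this is CSRF safe only if the server also validates the Origin header before upgrade (or the session cookie is SameSite).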
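The 5/sec per-user connection limit could be enforced with a small sliding-window counter. This is a sketch only: the `ConnectionRateLimiter` class and its `tryAcquire` method are hypothetical names, and the spec does not say whether a sliding window, fixed window, or token bucket is intended.

```typescript
// Sliding-window limiter: at most `limit` connection attempts per `windowMs`
// for each user. The clock is passed in so the logic is easy to test.
class ConnectionRateLimiter {
  private readonly attempts = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  // Defaults follow the spec: 5 connections per second.
  constructor(limit = 5, windowMs = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the attempt if it is allowed; returns false if
  // the user has already used up the current window.
  tryAcquire(userId: string, nowMs: number): boolean {
    const recent = (this.attempts.get(userId) ?? []).filter(
      (t) => nowMs - t < this.windowMs,
    );
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.attempts.set(userId, recent);
    return allowed;
  }
}
```

A caller would check `limiter.tryAcquire(userId, Date.now())` before spawning tmux and reject the upgrade when it returns false. A long-running server would also want to evict idle users from the map, which this sketch omits.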
Resize protocol
Client sends a JSON control frame (distinct from PTY data frames):
{"type": "resize", "cols": 120, "rows": 40}
Data frames are plain text (UTF-8) for client→server, binary (Uint8Array) for server→client.
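On the server, each incoming text frame has to be classified as either a resize control frame or PTY input. The spec does not state how the two are told apart, so the sketch below assumes that a frame counts as control only if it parses as a JSON object with type "resize" and valid integer dimensions. The names `parseClientFrame`, `MAX_COLS`, and `MAX_ROWS` and the size bounds are hypothetical.

```typescript
type ClientFrame =
  | { kind: "resize"; cols: number; rows: number }
  | { kind: "data"; text: string };

// Assumed bounds; the spec does not give limits for terminal size.
const MAX_COLS = 500;
const MAX_ROWS = 200;

function isDimension(value: unknown, max: number): value is number {
  return (
    typeof value === "number" &&
    Number.isInteger(value) &&
    value >= 1 &&
    value <= max
  );
}

// Classify one client→server text frame. Only a JSON object with
// type "resize" and valid cols/rows is treated as control; everything
// else, including malformed JSON, is passed through as PTY input.
function parseClientFrame(text: string): ClientFrame {
  if (text.startsWith("{")) {
    try {
      const msg: unknown = JSON.parse(text);
      if (typeof msg === "object" && msg !== null) {
        const { type, cols, rows } = msg as Record<string, unknown>;
        if (type === "resize" && isDimension(cols, MAX_COLS) && isDimension(rows, MAX_ROWS)) {
          return { kind: "resize", cols, rows };
        }
      }
    } catch {
      // Not JSON: fall through and treat the frame as terminal input.
    }
  }
  return { kind: "data", text };
}
```

One consequence of this assumption is that pasted text which happens to be a valid resize object would be consumed as a control frame rather than typed into the terminal. If that matters, control messages would need a channel that data frames cannot occupy.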
Mobile considerations
- Font: SF Mono, 13px on mobile
- Lock horizontal scroll; let tmux handle horizontal overflow
- Swipe-left from terminal snaps back to file browser pane
- Copy-on-select with a long-press menu
Testing
- Unit: protocol framing (resize, data, ping).
- Integration: real tmux behind real WS; type echo hi and assert output.
- E2E: Playwright on mobile Safari viewport; attach, type, detach, re-attach (see same scrollback).
Open questions
- OQ1: How to detect "user typed a meaningful command" vs "user is just looking"? (For activity tracking / lifecycle.) Answer: any stdin byte counts.
- OQ2: Do we want to log every keystroke for audit? Probably not in v1 (privacy). Log session open/close only.
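If the tentative answers above hold, activity tracking and audit logging could share one small per-connection object. This is a sketch with hypothetical names (`SessionTracker`, `AuditEvent`), and the event fields are assumptions: it records open and close events only, and treats any non-empty stdin frame as activity without storing the bytes.

```typescript
// Hypothetical audit event: only session boundaries are recorded.
interface AuditEvent {
  event: "open" | "close";
  chatId: string;
  userId: string;
  atMs: number;
}

class SessionTracker {
  readonly audit: AuditEvent[] = [];
  lastActivityMs: number | undefined;
  private readonly chatId: string;
  private readonly userId: string;

  constructor(chatId: string, userId: string) {
    this.chatId = chatId;
    this.userId = userId;
  }

  open(nowMs: number): void {
    this.audit.push({ event: "open", chatId: this.chatId, userId: this.userId, atMs: nowMs });
  }

  // OQ1: any stdin byte counts as activity. Only the timestamp is kept;
  // the bytes themselves are never stored or logged (OQ2).
  onStdin(byteLength: number, nowMs: number): void {
    if (byteLength > 0) this.lastActivityMs = nowMs;
  }

  close(nowMs: number): void {
    this.audit.push({ event: "close", chatId: this.chatId, userId: this.userId, atMs: nowMs });
  }
}
```

Passing only the byte length into `onStdin` makes it structurally impossible for keystroke content to leak into the audit trail.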