Codex Review — 2026-04-23
1. Strengths
- The docs are unusually coherent for a zero-code project. README.md, AGENTS.md, the PRD, and the roadmap all describe the same product and the same constraints without obvious drift.
- The project has a sharp thesis: inherit from Cortex where possible, cut Docker, keep the surface small, and optimize for one user. That discipline shows up clearly in docs/prd/001-picortex-v1.md and docs/plans/2026-04-23-initial-roadmap.md.
- Security invariants are stated early and bluntly: backend SQLite is canonical, cross-chat actions need explicit approval, and per-chat filesystem isolation is non-negotiable (AGENTS.md, docs/specs/001-workspace-isolation-linux-users.md).
- The tmux choice is pragmatic and aligned with the actual user workflow. docs/adrs/0003-tmux-for-session-persistence.md is a good decision memo: clear tradeoffs, clear fallback.
- The roadmap is demo-driven instead of feature-list-driven. That is the right shape for a risky systems product (docs/plans/2026-04-23-initial-roadmap.md).
2. Risks and blind spots
- S3 is the biggest planning mistake. A single global tmux session before per-chat isolation (docs/plans/2026-04-23-initial-roadmap.md) bakes the wrong architecture into the first real Claude integration, then forces a migration one stage later. That is avoidable churn.
- The plan delays the canonical backend model too long. The PRD says SQLite is authoritative (docs/prd/001-picortex-v1.md), but S1 only mentions echoing webhooks and does not explicitly require minimal persistent schema, idempotency keys, or event normalization.
- The platform story is unresolved but the isolation spec assumes deep Linux features now: useradd, cgroups v2, pam_namespace, iptables owner-match (docs/specs/001-workspace-isolation-linux-users.md, docs/adrs/0002-linux-users-over-docker.md). That conflicts with “Mac Mini” still being a deployment candidate in the PRD and roadmap.
- The sudo model is too loose on paper. Allowing tmux, runuser, and bash through sudoers (docs/specs/001-workspace-isolation-linux-users.md) is broader than an audited entrypoint wrapper. That will be painful to harden later.
- Reply capture is still speculative. The sentinel protocol in docs/specs/002-tmux-session-spawning.md might work, but it is not a small assumption; it is the core turn-routing mechanism. The roadmap treats it as implementation detail, not as a risk needing proof.
- Webhook correctness is under-specified. I do not see a concrete plan for duplicate delivery, replay storage, outbound retry semantics, message edits, or ordering when two inbound messages race (docs/prd/001-picortex-v1.md, docs/plans/2026-04-23-initial-roadmap.md).
- Attention gating depends on semantics not yet proven in the simulator. mentions-only and reply-based triggering lean on thread/reply support, but that is deferred to S2 and lives in another repo (docs/specs/005-attention-gating.md, docs/plans/2026-04-23-initial-roadmap.md).
- The isolation story still has hand-wavy gaps around egress and shell capability. “No secrets in workspace FS” is good, but claude --dangerously-skip-permissions plus a real shell is still a lot of power unless the wrapper and network policy are nailed down (docs/prd/001-picortex-v1.md, docs/specs/001-workspace-isolation-linux-users.md).
- Warm pool in S6 looks like premature optimization. It adds lifecycle complexity before you have any latency measurements proving provisioning is the bottleneck (docs/plans/2026-04-23-initial-roadmap.md).
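
To make the reply-capture risk concrete, here is a minimal sketch of what sentinel-based extraction from a captured tmux pane log could look like. The marker format, the per-turn ID, and the function names are all assumptions for illustration; the actual protocol in docs/specs/002-tmux-session-spawning.md may differ. The point is that even the simplest version has to handle terminal escape codes and incomplete turns.

```python
import re
from typing import Optional

# Matches common ANSI CSI sequences (colors, cursor moves) found in terminal logs.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def make_sentinels(turn_id: str) -> tuple:
    """Return hypothetical begin/end markers that are unique per turn."""
    return f"<<PICORTEX-BEGIN {turn_id}>>", f"<<PICORTEX-END {turn_id}>>"


def extract_reply(pane_log: str, turn_id: str) -> Optional[str]:
    """Return the text between this turn's sentinels, or None if the turn
    is incomplete or the markers are missing."""
    clean = ANSI_CSI.sub("", pane_log)
    begin, end = make_sentinels(turn_id)
    pattern = re.escape(begin) + r"\n?(.*?)\n?" + re.escape(end)
    match = re.search(pattern, clean, flags=re.DOTALL)
    return match.group(1) if match else None
```

This sketch does not handle the case where the prompt that asks for the markers is itself echoed into the pane, line wrapping that splits a marker, or output truncated by scrollback limits. Those are the kinds of failure modes a proof harness would need to measure.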
3. Concrete plan changes recommended
- Delete S3 as currently written, or merge it into S4. The first time Claude Code is integrated, it should already run in the per-chat model. If that is too much for one stage, do a non-persistent claude --print spike first, not a shared tmux architecture you already know you do not want.
- Add an S1.5 focused on backend authority: define the minimal SQLite schema, inbound event normalization, replay/idempotency storage, and outbound delivery records before any Claude/session work.
- Make the host requirement explicit now. If v0.1 depends on Linux-only controls, say “Linux only” and remove Mac Mini as an equal candidate until a reduced feature set is defined.
- Add a proof stage for tmux reply capture. Build a tiny harness that runs the real Claude CLI in tmux, injects sentinels, captures logs, and measures failure modes. Do not let the main roadmap depend on an unproven parser.
- Replace the broad sudoers plan with one narrow wrapper binary or script owned by root. The backend should invoke audited subcommands, not raw bash/tmux capability.
- Move queueing and concurrency rules into the core plan now: one active turn per chat, explicit duplicate-event handling, and outbound retry behavior. This is message infrastructure, not cleanup work.
- Push a minimal hardening gate earlier. If bubblewrap/Landlock is truly post-v0.1, then at least require a concrete egress-deny story before group-chat execution lands.
- Drop warm pool from the baseline roadmap. Reintroduce it only if measured cold-start data says S4/S6 misses the PRD latency target.
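
As an illustration of the narrow-wrapper recommendation, the following sketch shows how a root-owned entrypoint might map a small set of audited subcommands to fixed argument vectors instead of exposing raw bash or tmux. The subcommand names, the chat ID rule, and the pc-/chat- naming prefixes are hypothetical and not taken from the specs. The function only builds the argv; it does not execute anything.

```python
import re

# Hypothetical rule: chat IDs are short lowercase alphanumerics, so they are
# safe to embed in a Linux username and a tmux session name.
CHAT_ID_RE = re.compile(r"[a-z0-9]{1,24}")

ALLOWED_SUBCOMMANDS = ("start", "send", "stop")


def build_argv(subcommand: str, chat_id: str, text: str = "") -> list:
    """Build the exact command line for one audited subcommand.

    Raises ValueError for anything outside the allow-list, so the caller
    never gets a chance to pass an arbitrary command through.
    """
    if subcommand not in ALLOWED_SUBCOMMANDS:
        raise ValueError(f"subcommand not allowed: {subcommand!r}")
    if not CHAT_ID_RE.fullmatch(chat_id):
        raise ValueError(f"invalid chat id: {chat_id!r}")

    user = f"pc-{chat_id}"        # assumed per-chat Linux user naming
    session = f"chat-{chat_id}"   # assumed per-chat tmux session naming
    as_user = ["runuser", "-u", user, "--"]

    if subcommand == "start":
        return as_user + ["tmux", "new-session", "-d", "-s", session]
    if subcommand == "send":
        # -l sends the text literally instead of interpreting key names.
        return as_user + ["tmux", "send-keys", "-t", session, "-l", "--", text]
    return as_user + ["tmux", "kill-session", "-t", session]
```

With this shape, the sudoers entry would only need to permit the wrapper itself, and every privileged action the backend can take is enumerable in one file.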
4. Open questions before S1 starts
- Is picortex v0.1 officially Linux-only, yes or no? If yes, update the docs to stop implying parity across Hetzner and HMA.
- What exact records must exist in SQLite after S1: raw webhook payloads, normalized messages, replay cache, outbound attempts?
- What is the retry/idempotency contract with Linq for inbound and outbound traffic?
- Has anyone proven that Claude Code in tmux can be parsed reliably enough for automated reply extraction, or is that still a hypothesis?
- Is the project willing to forbid the shared-session S3 path now, before implementation momentum makes it sticky?
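
As a starting point for the SQLite question above, here is one possible minimal schema with duplicate-delivery handling, sketched with Python's standard sqlite3 module. The table names, column names, and the idea that the webhook provider supplies a stable event ID are assumptions; the real Linq contract is exactly what the open question asks about.

```python
import sqlite3

# Hypothetical minimal schema: raw inbound events keyed by a provider event ID,
# plus a record of each outbound delivery attempt.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inbound_events (
    id           INTEGER PRIMARY KEY,
    provider_id  TEXT NOT NULL UNIQUE,  -- assumed idempotency key from the webhook
    chat_id      TEXT NOT NULL,
    raw_payload  TEXT NOT NULL,
    received_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS outbound_attempts (
    id           INTEGER PRIMARY KEY,
    inbound_id   INTEGER NOT NULL REFERENCES inbound_events(id),
    attempt      INTEGER NOT NULL,
    status       TEXT NOT NULL,
    attempted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (inbound_id, attempt)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_inbound(conn: sqlite3.Connection, provider_id: str,
                   chat_id: str, raw_payload: str) -> bool:
    """Store an inbound event once. Returns True if it was new, False if a
    row with the same provider_id already existed (duplicate delivery)."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO inbound_events (provider_id, chat_id, raw_payload) "
            "VALUES (?, ?, ?)",
            (provider_id, chat_id, raw_payload),
        )
    return cur.rowcount == 1
```

The UNIQUE constraint makes the database, rather than application logic, the arbiter of whether an event has been seen, which fits the PRD's position that SQLite is authoritative.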
5. Verdict
This is a strong planning package with a real product thesis, good inheritance discipline, and better-than-average security thinking for an early personal tool. The weak point is sequencing, not intent. The current roadmap introduces the wrong session architecture too early, postpones backend-authority details that should exist from day one, and assumes Linux controls that are still not reconciled with deployment ambiguity. Fix those three things before S1 starts and the plan looks solid. Ignore them and you are likely to spend the first implementation week building scaffolding you already know you will rip out.