AINA · Factory Ops Review Checkpoint · Frozen 2026-07-01

The Factory Came Alive — Then We Stopped to Get the Mechanics Right

A phased closeout of a long session: what was proven, where it drifted, what's preserved, and the clean prompts to resume each lane in its own chat.

Ali Mehdi Mukadam · co-authored with Claude · 2026-07-01 · ~7 min read · source: EXECUTION-STATE-2026-06-30.md

The Single Idea

Paperclip can run the factory by itself — assignment natively wakes agents, heads route work to their teams, verifiers auto-wake, and a native interval-heartbeat replaces every cron. It's proven. We paused before wiring the last piece to answer one design question: whose job is it to keep the lanes fed — the CEO, or a dedicated keeper?

Nothing is being changed while you review this. The one open decision you raised — "is it the CEO's job?" — is captured, not answered. Everything discussed, built, or shared is preserved (§6). This document exists so no thread has to live in one giant chat: each unfinished lane gets a self-contained prompt (§7).

The meta-lesson of this session

Twice I deepened a wrong model (first the Mergify machinery, then poking seven heads via cron) instead of questioning the premise. Both times you caught it, and both times the fix was simpler and more native, not more machinery. That is the pattern not to repeat.

01 Frozen state: live / paused / pending
02 Phases & where it drifted
03 Major topics & learnings
04 Gotchas: don't repeat these
05 Incomplete lanes
06 Preservation index
07 Prompts for separate chats
08 Recommended sequence

SECTION 01

Frozen state — what's live, paused, pending

Thing	State
Native self-drive (assign→wake→build→handoff→verify)	PROVEN · gpt-5.5/codex, zero Claude tokens
Internal heartbeat scheduler	WORKING · Jessica enabled=true, intervalSec=1800, no cron
External cron doorbell	REMOVED · you rejected external tooling
Jessica coordinator routine	PROPOSED — NOT applied (you stopped me)
Jessica's repo/release/fusion skills	Trimmed off (7) — coordination-only for now; rollback saved
Content lane	Ran real work, now quiescent
Preservation / your design system	COMPLETE · aina-design repo created
Hermes bridge / Docker / old drift machinery	Running / parked / paused (not yet retired)

SECTION 02

The phases, and where it drifted

Eleven phases across the session. The drift was concentrated in one place — the per-task GitHub apparatus — and the correction came from you, twice.

The drift

Commit → push → PR → CI → bot-review, on every task. Bot-fix churn climbed from 3 to 61 in a single day. I spent hours deepening it with Mergify, merge-train, and release-marshal.

The native model

Dev work stays internal; git happens only at milestones, via Frodo; main = deploy source. Agents coordinate through Paperclip's own assign/wake/handoff. No bots, no per-task roundtrip.

Phase	What happened	Outcome
1 · Docker + PKM groundwork	Images built, guardrails validated, parked	clean
2 · Mergify saga	Adopted → inert → blocked the merger → report-only	drift
3 · The reframe	"GitHub-per-task is the drift" — root cause verified	corrected
4 · Teardown + design	Paused machinery; wrote locked design + plan	clean
5 · U2 self-drive proof	AIN-95 built + verified autonomously, producer≠verifier	the crux
6 · Isolation dead-end	git_worktree policy didn't isolate (agent cwd)	dead-end
7 · Per-team workspaces	Your call: teams already have own git folders	simplified
8 · Preservation sweep	549 lines rescued, all branches mirrored	big save
9 · Turn-on	Head-routing proven; heartbeat scheduler found	proven
10 · Doorbell → internal	Cron rejected → native intervalSec on Jessica	corrected
11 · Scope + checkpoint	"Not in repos yet"; "is it the CEO's job?" → freeze	here

SECTION 03

Major topics & durable learnings

The native flow, drawn as it actually runs — one scheduled coordinator ticking the org, everything below it woken by assignment, not by a timer:

[Diagram: native loop] Only the CEO is on a clock; everything below wakes on assignment. Frodo's milestone box (U6) is the one piece not yet wired.

T1 · Paperclip has an interval-gated heartbeat scheduler — not cron

Assignment auto-wakes the assignee natively. Periodic self-firing needs heartbeat.enabled=true and intervalSec>0 — then schedulerActive flips true. Enabling alone does nothing. There is no routine/trigger/cron CLI primitive — so don't build one.
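The two-flag rule is easy to misread, so here it is as a predicate. This is a minimal sketch, assuming only the setting names given above (enabled, intervalSec); the Heartbeat class and scheduler_active function are hypothetical illustrations, not Paperclip's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical stand-in for one agent's heartbeat settings."""
    enabled: bool = False
    interval_sec: int = 0  # mirrors intervalSec


def scheduler_active(hb: Heartbeat) -> bool:
    # Both conditions must hold. enabled=true with intervalSec=0 is a silent no-op.
    return hb.enabled and hb.interval_sec > 0


if __name__ == "__main__":
    print(scheduler_active(Heartbeat(enabled=True)))
    print(scheduler_active(Heartbeat(enabled=True, interval_sec=1800)))
```

Checking this predicate before declaring an agent "scheduled" is cheaper than waiting half an hour for a tick that never comes.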

T2 · Head-driven routing works, via assignment not @mention

A head, when woken, surveys its team-goal and assigns real backlog to members (Monica routed four issues, correctly skipping the founder-decision one). Agents engage each other by assigning issues — direct peer wake/comment is auth-restricted, and there is no free-text mention-wake.

T3 · Per-team git workspaces (your locked call)

Each lane already owns its own git folder. Per-issue worktree isolation was over-engineering; instance-level isolated-workspaces is off anyway. Product-repo work is deferred — this phase is prep, planning, specs, content.

"Poke a head → the head surveys its lane → it assigns to its team → members wake and build. That's the whole factory, on native mechanics."

SECTION 04

Gotchas — carry these into every new chat

Heartbeat needs two flags
enabled=true AND intervalSec>0. Enabled-only is a silent no-op.

No external cron
Scheduling is native (intervalSec). Building a cron doorbell was a wasted cycle.

-C flag is inconsistent
Needed for agent list, issue list, project get, and project update. NOT needed for agent get/update or issue get/update (agent and issue IDs are global).
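One way to stop re-learning this rule is to keep it as a lookup table. The sketch below transcribes only the combinations recorded in this gotcha; the needs_c_flag helper is hypothetical, and any resource/verb pair not listed here is treated as unknown rather than guessed.

```python
# Rule table transcribed from the gotcha; unlisted combinations are unknown.
NEEDS_C = {
    ("agent", "list"): True,
    ("issue", "list"): True,
    ("project", "get"): True,
    ("project", "update"): True,
    ("agent", "get"): False,
    ("agent", "update"): False,
    ("issue", "get"): False,
    ("issue", "update"): False,
}


def needs_c_flag(resource: str, verb: str) -> bool:
    """Return True if the recorded rule says -C is needed for this call."""
    try:
        return NEEDS_C[(resource, verb)]
    except KeyError:
        # Refuse to guess: this session never recorded a rule for the pair.
        raise ValueError(f"no recorded rule for '{resource} {verb}'") from None


if __name__ == "__main__":
    print(needs_c_flag("issue", "list"))
    print(needs_c_flag("issue", "get"))
```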

Nested heredoc mangles quotes
Write scripts via base64-decode or use jq — inline Python quotes get stripped.
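The base64 route works because the encoded payload contains no quote characters for any shell layer to strip. A minimal sketch, assuming the remote side has a base64 utility that accepts -d (true of GNU coreutils); remote_write_command and decode_payload are hypothetical helper names.

```python
import base64
import shlex


def remote_write_command(script: str, dest: str) -> str:
    """Return a shell command that recreates `script` at `dest`."""
    payload = base64.b64encode(script.encode("utf-8")).decode("ascii")
    # The payload uses only A-Z, a-z, 0-9, '+', '/' and '=',
    # so nested quoting cannot mangle it.
    return f"echo {payload} | base64 -d > {shlex.quote(dest)}"


def decode_payload(command: str) -> str:
    """Local round-trip check: recover the script from the generated command."""
    payload = command.split()[1]
    return base64.b64decode(payload).decode("utf-8")


if __name__ == "__main__":
    script = 'import json\nprint(json.dumps({"who": "it\'s me"}))\n'
    cmd = remote_write_command(script, "/tmp/probe.py")
    assert decode_payload(cmd) == script
    print(cmd)
```

The generated string can then be passed as the single quoted argument to whatever wrapper runs commands on the remote host.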

Verify the real entry point
Config "persisted" ≠ behavior changed. The marker test caught the isolation no-op.

echo pollutes JSON pipes
Capture --json to a file, parse separately. Don't echo into a parser.
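The capture-then-parse habit can be sketched as follows. The command run here is a stand-in Python one-liner that prints JSON, not a real paperclipai invocation; capture_json is a hypothetical helper name.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def capture_json(cmd: list[str], out_path: Path):
    """Run cmd, write its stdout to out_path untouched, then parse the file."""
    with out_path.open("wb") as fh:
        subprocess.run(cmd, stdout=fh, check=True)
    # Parsing is a separate step, so nothing else can write into the stream.
    return json.loads(out_path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    stand_in = [sys.executable, "-c",
                "import json; print(json.dumps({'status': 'ok'}))"]
    with tempfile.TemporaryDirectory() as tmp:
        print(capture_json(stand_in, Path(tmp) / "out.json"))

    # What the gotcha looks like: one stray banner line breaks the parse.
    try:
        json.loads('listing agents\n{"status": "ok"}')
    except json.JSONDecodeError as exc:
        print("polluted stream:", exc.msg)
```

Keeping the raw file also leaves evidence to inspect when a parse fails, which a pipe does not.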

Don't trust status:done
Read the actual work. A stale prior-day comment looked like this run's verdict.

DER can't use GitHub
100MB+ data files. Preserved via git bundle + R2 — don't "fix" with force-push.

SECTION 05

Incomplete lanes we opened

Lane	State	Blocking question
A · CEO-vs-keeper	Jessica scheduled; routine not applied	CEO's job, or a dedicated Atlas keeper?
B · Remaining dev lanes	Only content proven	Depends on A
C · Frodo release runbook (U6)	Not built	The durable-git path; gates retirement
D · Retire drift scaffolding (U4)	Paused, Hermes still up	Only after C proven
E · Repo-implementation phase	Deferred by you	When do agents build in product repos?
F · Rejected + never-built backlog	~20 + ~48 waiting	Build metered under native model
H · PKM nightly → VDS	Groundwork only	Separate track; laptop still producer

SECTION 06

Preservation index — nothing lost

Decisions, design and plan are on GitHub (oscalar/pkm-monorepo). All VDS branches mirrored to vds-preserve-20260630/*; stranded code committed; stashes tagged. Data-engine-room preserved via full bundle + R2 restic (GitHub can't hold its 100MB+ files). The design system you shared is now the private repo ainative-academy/aina-design. The full chronology lives in EXECUTION-STATE-2026-06-30.md. Because everything is snapshotted, the eventual cleanup of the 347 stale branches is finally safe.

SECTION 07

Prompts for separate chats

Each prompt is self-contained — prepend the standing-context block, then paste. Keeping one lane per chat avoids the failure mode this very session hit: one giant thread that accumulates drift.

Standing context · prepend to every prompt below

AINA agent factory runs on Paperclip (company 7d58fc13-7c9a-4c87-b9f8-a7cfae8564a9) on VDS aina-vds-tf.
Access: ~/PKM/scratch/pkm-deepdive/vds-exec.sh run '<cmd>' (reads) / runl (writes). CLI = paperclipai.
Full record: ~/PKM/scratch/pkm-deepdive/SESSION-CLOSEOUT-2026-07-01.md — read the learnings + gotchas FIRST.
Gotchas: (1) native heartbeat needs enabled=true AND intervalSec>0 — no external cron.
(2) -C flag: needed for agent/issue LIST + project get/update; NOT for agent/issue GET/UPDATE.
(3) write scripts via base64, not nested heredoc. (4) verify the real entry point, don't trust status.
(5) NOT doing product-repo implementation yet — prep/planning/specs/content only.
Everything is preserved (vds-preserve-20260630/* + R2). Agents = gpt-5.5/codex, zero Claude tokens.

watch — a chat that skips the gotchas will re-make the exact mistakes this session made.

Claude Code · Prompt A · Decide the coordinator — do NOT edit other agents

[STANDING CONTEXT]
Decide whether the periodic "survey ready work → wake the right heads" duty belongs to Jessica
(CEO, 6454b8e0, intervalSec=1800) or a dedicated keeper/Atlas (does not exist yet). Jessica already
knows HOW to assign+wake heads; she just isn't instructed to do it every heartbeat.
Deliver: (1) recommend CEO-coordinator vs dedicated-keeper with tradeoffs; (2) if keeper, design the
minimal Atlas agent; (3) draft the exact heartbeat-routine instruction (route PREP work only — no repo
impl); (4) apply to ONE agent, fire it, and VERIFY it actually wakes a head with real work.

watch — don't quietly make it the CEO's job; that's the open decision. Verify the cascade, don't assume it.

Claude Code · Prompt B · Turn on remaining dev lanes — after A

[STANDING CONTEXT]
Coordination model is decided (Prompt A). Bring platform (Richard be6cc169), data (Laurie af273e31),
and agentops (Jared a873590c) lanes online the same way content was proven. Fire once, verify each head
surveys its team-goal and routes real prep work, members auto-wake. Heads have maxConcurrentRuns=1 —
watch for cost/pile-up. Report which lanes self-drove.

watch — don't flip all lanes blind; prove one head cascades before enabling the rest.

Claude Code · Prompt C · Build Frodo's milestone release runbook (U6)

[STANDING CONTEXT]
Build the release/GitOps path owned by Frodo (88b49386, devops, idle). Model: dev work stays internal
(no per-task git); at a MILESTONE Frodo integrates the lane's verified work, runs one CI/preview pass,
promotes dev→main (deploy source), and posts a Cloudflare wrangler PREVIEW URL to Ali. This is the
durable-git path — prerequisite for retiring the safety nets. Design + wire Frodo's runbook; run one
simulated milestone. Mergify (if wanted) lives HERE, milestone-scoped, never per-task.

watch — do NOT reintroduce per-task PRs/CI. Git happens at milestones only.

Claude Code · Prompt D · Retire drift scaffolding (U4) — ONLY after C

[STANDING CONTEXT]
Native flow + Frodo release are proven. Permanently retire (archive, not just pause): COO loop, watchdog,
release-marshal, merge-train, Mergify-queue, pr-bot-review-watcher, rollout-health-monitor, and the
Hermes↔Paperclip bridge (still running). Verify a full lane cycle completes AFTER removal. Everything is
preserved to vds-preserve-20260630/* so GC is safe. Clean up 347 stale branches + coo-* worktrees LAST,
with a manifest.

watch — do not GC anything until you've confirmed it exists in vds-preserve-20260630/* or R2.

Claude Code · Prompt E · Open the repo-implementation gate — when Ali says go

[STANDING CONTEXT]
Ali has decided agents can now work in the product repos (was deferred). Re-add repo/release skills to
the relevant agents (rollback list: ROLLBACK-jessica-skills.json). Product-repo writes go through Frodo's
milestone flow, not per-task. Prove one lane builds real product code → Frodo milestone → preview URL,
with no branch cross-contamination (per-team folders confirmed).

watch — this gate is Ali's to open. Don't re-enable repo work before he says so.

Claude Code · Prompt F · Metered build of the backlog

[STANDING CONTEXT]
~20 Gimli-rejected todos (with gap notes) + ~48 never-built todos need building under the restored native,
metered, Gimli-gated model — NOT the old GitHub-per-task flow. Meter intake so we don't re-flood. Route
through lane heads: build → head-verify → QA (Gimli/Éowyn/Calibrator) → done. Prep-scope until Prompt E lands.

watch — meter the intake. An unmetered flood is what caused the 3→61 churn.

SECTION 08

Recommended sequence

A → B → C → D, then E → F once you open the repo-implementation gate. C (Frodo) gates D (retire); E gates F's product work. PKM-to-VDS and Docker are independent side-tracks. One lane, one chat.
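The gates behind that ordering can be checked mechanically. This is a small sketch that encodes only the dependencies stated in this document (B depends on A, C gates D, E gates F's product work); the function names are hypothetical, and Ali's own go/no-go on E is a human decision that is not modelled.

```python
# Hard gates as stated in the text. Lanes with an empty set have no recorded prerequisite.
GATES = {
    "A": set(),
    "B": {"A"},
    "C": set(),
    "D": {"C"},
    "E": set(),   # also waits on Ali's decision, not modelled here
    "F": {"E"},   # applies to F's product work; prep-scope work is not gated
}

RECOMMENDED = ["A", "B", "C", "D", "E", "F"]


def unmet_gates(lane: str, done: list[str]) -> set[str]:
    """Prerequisite lanes that are not finished yet."""
    return GATES[lane] - set(done)


def respects_gates(order: list[str]) -> bool:
    """True if every lane in `order` starts only after its prerequisites."""
    done: list[str] = []
    for lane in order:
        if unmet_gates(lane, done):
            return False
        done.append(lane)
    return True


if __name__ == "__main__":
    print(respects_gates(RECOMMENDED))
    print(unmet_gates("D", ["A", "B"]))
```

A new chat can run the same check against whatever order it proposes before touching any lane.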

Where to start

The factory already self-drives. The only real decision left before it runs continuously is whose hands hold the clock — the CEO's, or a keeper's. Answer that in one focused chat, and the rest is wiring.