AINA · Factory Forensics · Paperclip task history · 263 tasks · 2026-06-30

The Factory Worked — Until It Was Overrun

What the Paperclip record actually shows, from the first task on: a role-based factory that ran clean for ten days, a single-day flood of 181 tasks that broke the verification gate, and the GitHub-PR machinery that drifted in to cope — replacing the design instead of serving it.

Ali Mehdi Mukadam · co-authored with Claude · ~6 min read · grounded in all 263 Paperclip tasks

The Single Idea

The factory you designed worked exactly as intended for ten days: every task ran plan → build → Gimli verify → done, internally, and 65 of 67 tasks completed with zero stuck in review. Then on 06-29 a single batch of 181 tasks landed — more than the prior ten days combined — and per-task verification couldn't scale. To cope, work was rerouted through GitHub PRs + CI + a review bot, which quietly replaced Gimli's verify and Frodo's milestone-release. Hence today: 70 tasks orphaned in in_review, a per-task GitHub roundtrip you never wanted, and zero of M0–M8 complete — so Frodo has never once released. The GitHub churn isn't the design; it's scar tissue from the flood.

01 — The design

Proven on day one (06-19)

The very first tasks in the record are throughput smokes — your test runs — and they name the pipeline outright. On 06-19, two repos each ran the full chain:

foreman (plan) → builder (build) → reviewer · Gimli (adversarial verify) → release · Frodo (at milestone)
The 06-19 smokes ran foreman plan → builder PR → reviewer → release captain end-to-end. That IS the design.

By 06-25 the same shape recurs with named reviewers — "Verify: Argus review of Jared's packet", "Review: architecture & repo-safety (Richard/CTO)". The intent was always build → independent adversarial review → gated release, kept small and internal.

02 — The clean era

It worked — cleanly — for ten days

The numbers for 06-19 through 06-28 are the healthiest thing in the whole record:

Period | Tasks | Done | In review (stuck) | To-do
06-19 → 06-28 (clean era) | 67 | 65 | 0 | 0
06-29 → 06-30 (post-flood) | 196 | 72 | 70 | 48

Tasks were created and completed the same day at essentially 1:1: 13/12, 26/26, 17/17. Nothing piled up; nothing was "done-but-unverified." This is the factory doing what you built it to do, and it's the baseline every later decision should have protected.
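
The contrast in the table can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the counts from the table above; the dictionary layout and the `summarize` helper are illustrative names, not part of Paperclip.

```python
# Counts copied from the period table above.
eras = {
    "clean era (06-19 to 06-28)": {"tasks": 67, "done": 65, "in_review": 0, "todo": 0},
    "post-flood (06-29 to 06-30)": {"tasks": 196, "done": 72, "in_review": 70, "todo": 48},
}


def summarize(row):
    """Return the done rate, the stuck-in-review rate, and the remainder."""
    # Remainder = tasks that are not done, not in review, and not to-do.
    other = row["tasks"] - row["done"] - row["in_review"] - row["todo"]
    return {
        "done_rate": row["done"] / row["tasks"],
        "stuck_rate": row["in_review"] / row["tasks"],
        "other": other,
    }


summary = {name: summarize(row) for name, row in eras.items()}
for name, s in summary.items():
    print(f"{name}: done {s['done_rate']:.1%}, stuck {s['stuck_rate']:.1%}, other {s['other']}")
```

The done rate falls from roughly 97% in the clean era to roughly 37% after the flood, while the share stuck in review rises from zero to about 36%.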

03 — Gimli

Gimli was the verifier — confirmed

All nine of Gimli's tasks are verifications, not builds: "Factory builder verifier: frontend slice receipt check", "Adversarial review: Finch PKM mapping", "Adversarial loop/idempotency review", "Gimli adversarial review of department outputs."

The mechanism you remember

The board's in_review status is the "awaiting Gimli" gate, the step between built and done. In the clean era it cleared same-day (0 stuck). Gimli adversarially verified every task before it was allowed to be done. It was internal, fast, and had nothing to do with GitHub.
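
The gate can be pictured as a small state machine. The following is a minimal sketch, not the Paperclip implementation: the status names todo, in_review, and done come from the board, while `Task`, `submit_build`, and `verify` are hypothetical names introduced for illustration, and sending a failed review back to todo is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    title: str
    status: str = "todo"              # board statuses used here: todo, in_review, done
    verified_by: Optional[str] = None


def submit_build(task: Task) -> None:
    """Builder finishes: the task enters the gate, it is not done yet."""
    if task.status != "todo":
        raise ValueError(f"cannot build from status {task.status!r}")
    task.status = "in_review"


def verify(task: Task, verifier: str, passed: bool) -> None:
    """Only a verifier's pass moves a task from in_review to done."""
    if task.status != "in_review":
        raise ValueError(f"cannot verify from status {task.status!r}")
    if passed:
        task.status = "done"
        task.verified_by = verifier
    else:
        # Assumption: a failed review hands the task back to the builder.
        task.status = "todo"


task = Task("frontend slice receipt check")
submit_build(task)
verify(task, verifier="Gimli", passed=True)
print(task.status, task.verified_by)
```

The point of the shape is that there is no path from todo to done that skips the verifier, and nothing in it touches a PR or CI.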

04 — The flood

06-29: the day it broke

On 2026-06-29, 181 tasks were created in one day: the launch-readiness/council decomposition carrying the M0–M8 milestone labels. For scale: the entire prior ten days produced 67 tasks. The flood was about 2.7× everything before it, in 24 hours.

06-19: 13 · 06-25: 26 · 06-26: 8 · 06-27: 3 · 06-28: 17 · 06-29: 181 · 06-30: 15
Tasks created per day. The clean era hovered at 3–26/day; 06-29 dropped 181 at once — the verification gate couldn't absorb it.

Of those 181, only 56 completed that day; 65 landed in in_review and never left — and 64 of the 70 orphans have no assignee at all. Built, then stranded: a per-task adversarial reviewer cannot hand-verify 181 things at once. The gate that cleared same-day for ten days jammed.
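
Why a same-day gate jams under a burst can be shown with a toy backlog model. This is a sketch under stated assumptions: the per-day created counts are the ones in the record, but the fixed `daily_capacity` of 26 (the busiest clean-era day) is a hypothetical parameter, and the model treats each recorded day as one step. The record shows 56 tasks completed on 06-29, so the real ceiling was evidently higher; the model only illustrates the shape of the pile-up.

```python
# Tasks created per recorded day, from the per-day counts in the record.
created_per_day = {
    "06-19": 13, "06-25": 26, "06-26": 8, "06-27": 3,
    "06-28": 17, "06-29": 181, "06-30": 15,
}


def backlog_by_day(created, daily_capacity):
    """Carry unverified work forward whenever intake exceeds capacity."""
    backlog = 0
    result = {}
    for day, count in created.items():
        pending = backlog + count
        verified = min(pending, daily_capacity)
        backlog = pending - verified
        result[day] = backlog
    return result


# Assumption: the verifier can clear at most 26 tasks per day.
backlog = backlog_by_day(created_per_day, daily_capacity=26)
for day, left in backlog.items():
    print(day, left)
```

Under that assumption the backlog stays at zero through 06-28, jumps to 155 on 06-29, and is still 144 after 06-30: one oversized day is enough to leave a queue that ordinary days cannot drain.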

05 — The drift

GitHub PRs replaced the design

Faced with the flood, the system didn't scale Gimli — it rerouted around him. The COO dispatch loop began telling every lane to "commit + push + PR"; a PR bot-review watcher was added (06-30 02:01); then today I piled on Mergify, a merge-train, and release-marshal to keep that PR pipeline from jamming. All of it is machinery to run a per-task GitHub roundtrip.

That roundtrip silently replaced two designed roles:

Designed role | What it drifted into
Gimli: adversarial verify (internal, per task) | CI + bot review on a GitHub PR
Frodo: milestone release (batch, at epic close) | merge every task to main

So main went from "updated when a milestone is proven" to "updated per task", and the internal, fast, contained model became a slow external one.

I spent today making that drift faster and self-healing instead of noticing it was the wrong model. That's the miss — and it's why you were right to stop and ask why the roundtrip exists at all.

06 — Now

Where it stands right now

Signal | Count | Meaning
done | 137 | landed work
in_review | 70 | built but never verified: flood orphans (65 from 06-29)
todo | 48 | flood work never started
cancelled / blocked | 6 / 2 |
milestones M0–M8 complete | 0 | not one epic finished → Frodo has never released

M0: 1/4 · M1: 0/4 · M2: 1/5 · M4: 1/7 · M6: 0/3 · M8: 0/5. The milestone gate — the thing Frodo exists to act on — has never once triggered, because no milestone has closed. Frodo has been idle not by neglect but because the work never reached the state that wakes him.
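
The milestone gate reduces to a single check: a milestone is releasable only when every one of its tasks is done. A minimal sketch follows, using only the six milestone tallies listed above; the `releasable` function is a hypothetical name, not Frodo's actual trigger.

```python
# (done, total) per milestone, as listed in the text above.
progress = {
    "M0": (1, 4), "M1": (0, 4), "M2": (1, 5),
    "M4": (1, 7), "M6": (0, 3), "M8": (0, 5),
}


def releasable(progress):
    """Return milestones whose tasks are all done (the release trigger)."""
    return [
        name
        for name, (done, total) in progress.items()
        if total > 0 and done == total
    ]


ready = releasable(progress)
print(ready)
```

With the current tallies the list is empty, which is the idle state described above: there is nothing for a release captain to act on.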

07 — The fix

What this means for the fix

The forensics point at one conclusion: restore the design, don't keep patching the drift.

  1. Re-instate Gimli as the per-task verifier, internally. in_review → (Gimli adversarial verify) → done. No GitHub CI/bot in the inner loop. This also drains the 70 orphans the designed way — verify them, don't merge them.
  2. Re-instate Frodo as the milestone release-captain. main moves only when an M-milestone closes and Frodo promotes it (with a Cloudflare preview for you along the way). Per-task PRs disappear.
  3. Retire the drift machinery. The per-task PR flow, the bot watcher, Mergify / merge-train / release-marshal exist only because the flood outran Gimli. Fix the verification model and they're unnecessary.
  4. Meter the intake. The root trigger was 181 tasks in a day against a same-day-verify model. Whatever the batch size, verification and release have to keep pace — or the queue orphans again.
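
One way to picture step 4 is an admission check in front of the gate. This is a minimal sketch, assuming a cap on how many tasks may sit in in_review at once; the `admit` function, the `max_in_review` parameter, and the value 26 are hypothetical and would need to be set from what the verifier can actually clear.

```python
def admit(batch, in_review_count, max_in_review):
    """Split a batch into tasks admitted now and tasks deferred for later."""
    room = max(0, max_in_review - in_review_count)
    return batch[:room], batch[room:]


# Illustration: a 181-task batch arriving at an empty gate capped at 26.
batch = [f"task-{i}" for i in range(181)]
admitted, deferred = admit(batch, in_review_count=0, max_in_review=26)
print(len(admitted), len(deferred))
```

Deferred tasks stay visible as deferred instead of entering in_review with no one to verify them, so a large batch slows intake without orphaning work.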

Where to start

The factory isn't broken — it was overrun. Ten days of clean build → Gimli-verify → done proves the design; one 181-task day broke the gate; the GitHub-PR churn was scar tissue, not architecture. Put Gimli back on per-task verification and Frodo back on milestone release, drain the 70 orphans the designed way, and retire everything that grew up to manage a roundtrip you never wanted.