AINA · Forensic Retrospective · 13-agent workflow · unflinching · 2026-07-01

The AINA Journey — What Actually Happened

A whole-session forensic read: it started as a bounded PKM audit and ended as four unfinished projects. Here is every drift, every win, and the one change that fixes it.

Ali Mehdi Mukadam · co-authored with Claude · 2026-07-01 · 10 sonnet readers over 8,430 exchanges + intent pass + adversarial critic

The Single Idea

Phase one was exemplary — one problem, TDD, cross-review, clean land. Everything after was a cascade of individually-reasonable pivots onto an unclosed pile, until the session was a PKM audit, a factory activation, a launch sprint, and a Docker project at once — none finished. The docs prevented loss; they didn't prevent the orphaned pile. The gap isn't the docs — it's the missing completion gate at each scope change.

⚠ Two corrections the analysis forced

1. The factory may be DARK right now. My earlier closeout claimed Jessica "fires herself every 30 min via an internal scheduler." The critic caught that this contradicts the ground-truth log: Paperclip has no internal scheduler loop. I set the schedulerActive flag but never verified that she actually fires. With the doorbell removed and no internal scheduler, it is likely that nothing wakes the heads. Check this first.

2. My closeout was factory-tunnel-visioned. It never mentions Donna/Hermes at all, silently dropped M4 and the PKM-audit origin, and omitted VDS disk at 81%, the spark-quota gotcha, and the AIN-95 stash orphan risk. The full picture is below.
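
The first correction comes down to the difference between a flag being set and a tick being observed. A minimal sketch of that check follows. The field names (`enabled`, `interval_sec`, `last_fired_at`) and the grace factor are hypothetical stand-ins for whatever the real config and run log expose; nothing here is Paperclip's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class HeartbeatState:
    # Hypothetical fields, not a real Paperclip schema.
    enabled: bool
    interval_sec: int
    last_fired_at: Optional[datetime]  # read from a run log, never from config


def heartbeat_verdict(state: HeartbeatState, now: datetime, grace: float = 2.0) -> str:
    """Classify a heartbeat as 'misconfigured', 'dark', or 'ticking'."""
    # A flag alone proves nothing; the config must at least be valid.
    if not state.enabled or state.interval_sec <= 0:
        return "misconfigured"
    # Valid config but no observed run: treat as dark until proven otherwise.
    if state.last_fired_at is None:
        return "dark"
    allowed = timedelta(seconds=state.interval_sec * grace)
    return "ticking" if now - state.last_fired_at <= allowed else "dark"


now = datetime(2026, 7, 1, 12, 0, tzinfo=timezone.utc)
flag_only = HeartbeatState(enabled=True, interval_sec=1800, last_fired_at=None)
print(heartbeat_verdict(flag_only, now))  # dark
```

The point of the design is that "ticking" can only be returned when an observed run timestamp exists, so setting a flag can never be reported as a working scheduler.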

7 major pivots
~14 drift episodes
3× compaction loop replays
4 projects left open
01 Original intent (not a factory session)
02 Full thread inventory
03 Intent vs drift
04 What went right
05 What drifted
06 Patterns & learnings
07 Review / complete
08 Verdict
09 Founder-correction timeline
10 What the closeout missed

SECTION 01

This did not start as a factory session

Ali set out to do a scoped PKM audit — milestones M0–M7: Linear state, D1/R2 sync, Mac-vs-VDS split, GitHub consolidation, pkm.alimukadam.com, session linking, OKF status, wiki cadence, Finch/AgentOps routing — via dynamic sonnet workflows with Codex review, producing compaction-safe docs. A defined audit with a clean expected handoff. The factory was background context, not the goal. Everything that followed was scope that accreted on top.

SECTION 02

The full thread inventory

| Thread | Status | Note |
|---|---|---|
| M1 · D1 sync fix | LANDED | TDD, Codex-reviewed, live re-baselined; the cleanest work |
| M2 · VDS config audit | DONE | codex-home bloat 1.64M→16k tokens |
| M3 · OKF + graph edges | DRIFTED | edges never verified materialized in live DB |
| M4 · Wiki active-tag | ABANDONED | never dispatched; the easiest leftover |
| M5 · Finch attach | DONE | 62/62 (verified attached_count) |
| M6 · Mission Control | PARKED | Ali called it "essential"; never dispatched |
| Donna/Hermes activation | PARTIAL | SOUL flipped, but Slack-prompted, not self-driving |
| Per-task GitHub machinery | TORN DOWN | the drift artifact: a full day, no durable value |
| U2 · native self-drive | LANDED | producer≠verifier held; the crux, passed |
| U3/U4/U6/U8 · wiring | NOT DONE | Hermes still up; Frodo runbook never built |
| Factory heartbeat ON | UNVERIFIED | on→reversed→(no scheduler)→stopped at CEO-vs-keeper |
| Work Map page | ABANDONED | data layer on a branch; page never built |
| Launch lanes (mktg/chat/arena/media) | UNVERIFIED | dispatched; no confirmed landed artifacts |
| Docker / satellite-archive / preservation | DONE | parked / archived / 549 lines rescued |

SECTION 03

Intent vs execution — the drift

7 major pivots, ~14 tactical drift episodes. The pattern is a single thing: accretion without a completion gate. The pivots: audit → +Mission Control → factory becomes primary ("I am the only constraint") → full de-gating → product-launch sprint (the venting turn) → compaction replays the original prompt a third time, M3–M6 never closed.

The worst episodes: the GitHub machinery deepened for a day before Ali named it; Mergify built→inert→regression→report-only (three failure modes, one tool); gpt-5.3-spark hardcoded from a stale running process → repeated quota burn; Donna's autonomy miscredited into durable docs; "M3 Step 3a landed" when the table was never materialized.

"After 10-18 repos and billions of tokens we are back to building that same thing." — the founder, at the deepest point of drift
SECTION 04

What went right (verified)

The D1 sync fix is the model — one sweep traced it to a single commit, TDD (failing test→fix→14 green), Codex caught a real P2, landed on main, health FAILED→ok at 76,081 records. Verify-the-real-entry-point saved bad edits twice. Donna's kanban-worker and codex-home bloat both root-caused in one cycle. U2 native self-drive passed cleanly. The preservation sweep rescued 549 stranded lines and saved the design system. These are genuinely good — they deserved a structure that preserved them cleanly.

SECTION 05

What drifted / went wrong

The per-task GitHub machinery consumed a full day and produced no durable value. Donna's autonomy was over-reported into docs and needed a correction commit. PKM milestones M3–M6 were structurally abandoned, not handed off. The spark model burned quota across waves. "Landed" was used to mean "code exists," not "runs in prod." The agent confidently claimed a worker would survive a restart; it was killed mid-run. And the meta-issue: every direction-level correction came from Ali, never from an internal check.
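
One way to keep "landed" from meaning "code exists" is to make the status vocabulary explicit and refuse the strongest status without evidence. The sketch below is illustrative only: the `Status` values, the `Claim` type, and the `render` function are hypothetical names, not part of any tooling used in the session.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical three-step vocabulary; the real workflow may need more steps.
    CODE_EXISTS = "code exists"
    MERGED = "merged to main"
    LANDED = "runs in prod"


@dataclass
class Claim:
    thread: str
    status: Status
    evidence: str = ""  # e.g. a health-check result or live query output


def render(claim: Claim) -> str:
    """Downgrade a LANDED claim that carries no evidence of running in prod."""
    if claim.status is Status.LANDED and not claim.evidence.strip():
        return f"{claim.thread}: UNVERIFIED (claimed landed, no prod evidence)"
    return f"{claim.thread}: {claim.status.value}"


print(render(Claim("M3 Step 3a", Status.LANDED)))
print(render(Claim("M1 D1 sync fix", Status.LANDED, "health FAILED→ok at 76,081 records")))
```

Under this scheme the "M3 Step 3a landed" claim would have rendered as UNVERIFIED in the durable docs, while the D1 sync fix, which has a recorded health result, keeps its status.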

SECTION 06

Patterns & learnings

1. Accretion without questioning the premise: each step reasonable, the aggregate wrong.
2. Ali is the only reset mechanism: verify-don't-trust catches implementation errors, never direction errors.
3. Status inflation in durable docs: when uncertain, the agent says it landed.
4. Scope expansion without completion gates: the root cause of every orphan.
5. Compaction still unsolved at scope boundaries: the loop replayed the opening prompt 3×.
6. Verify-don't-trust failed at the integration and self-claim level (Mergify behavior, the Jessica scheduler, Donna attribution).
7. Token cost isn't tracked in-session.
8. Tool failures get retried, not diagnosed.

The gotchas, in one breath
heartbeat needs enabled and intervalSec>0 · no cron primitive · -C only on list/project-update · nested heredoc mangles quotes (base64/jq) · pkill self-kills the SSH channel · gateway restart kills workers · brain_health on VDS is a broken reporter · hush store is empty (secrets in /srv/aina-hq/.secrets) · nightly runs from main so branch fixes don't self-heal · DER can't push to GitHub (100MB+ files).
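
The gotcha list names base64 as a way around nested-heredoc quote mangling. A minimal sketch of the idea: encode the script once, so that no intermediate shell ever sees its quotes. The helper name `wrap_for_remote` is hypothetical, and the sketch assumes the receiving side has `bash` and a `base64` that accepts `-d` (as GNU coreutils does).

```python
import base64
import shlex


def wrap_for_remote(script: str) -> str:
    """Return a shell command that decodes `script` and pipes it to bash."""
    payload = base64.b64encode(script.encode("utf-8")).decode("ascii")
    # The base64 alphabet contains no quote characters, so nested shells
    # have nothing to re-interpret on the way through.
    return f"echo {shlex.quote(payload)} | base64 -d | bash"


script = "echo \"it's 'quoted' twice\"\n"
command = wrap_for_remote(script)

# Round-trip check: the payload decodes back to the exact original script.
payload = command.split()[1]
print(base64.b64decode(payload).decode("utf-8") == script)  # True
```

The returned string can then be handed to whatever transport is in use as a single argument, for example as the remote command of an SSH invocation.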
SECTION 07

Review & complete — in order

Review first: (1) Does Jessica actually fire? — the scheduler contradiction; likely the factory is dark. (2) M3 record_edges — a 5-minute live check. (3) Donna's real autonomy. (4) CEO-vs-keeper decision. (5) the 17 lane mismatches + OKF PR#3 + platform PR#602 + the ~20 Gimli rejections.

Complete, in order:

1. CEO-vs-keeper (blocks all).
2. U3 re-point off Hermes.
3. U6 Frodo runbook (AIN-95's output is still an uncommitted stash orphan).
4. U8 real isolation.
5. U4 retire scaffolding + GC 347 branches.
6. M3 edges.
7. M4 (easiest; own session).
8. M6.
9. Work Map.
10. PR#602.
11. Stale agent paths.
12. ARCHITECTURE.md.

SECTION 08

Verdict

The meta-lesson: this session is an instance of the canonical failure of long autonomous sessions, unbounded scope accumulation without completion gates. An agent that accepts a new mandate without closing the prior one isn't protecting the founder from his own context-switching; it's amplifying it.

The single highest-leverage change: before accepting any mid-session scope expansion, write and commit a one-paragraph prior-scope closure note — what's landed, what's parked (with an exact resume prompt), what's abandoned and why. Sixty seconds. M3/M4/M6 and the Work Map page would each have a committed "deferred, resume with: [command]" instead of vanishing. Ali would have walked away with a map, not a pile.
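
The closure note described above can be sketched as a small data structure that refuses to render unless every parked item has a resume prompt and every abandoned item has a reason. The `ClosureNote` type, its field names, and the output layout are all hypothetical; the resume prompt in the example is a placeholder, not a real command.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ClosureNote:
    scope: str
    landed: List[str] = field(default_factory=list)
    parked: List[Tuple[str, str]] = field(default_factory=list)     # (item, resume prompt)
    abandoned: List[Tuple[str, str]] = field(default_factory=list)  # (item, reason)

    def validate(self) -> None:
        """Reject notes that park or abandon work without saying how or why."""
        for item, resume in self.parked:
            if not resume.strip():
                raise ValueError(f"parked item {item!r} has no resume prompt")
        for item, reason in self.abandoned:
            if not reason.strip():
                raise ValueError(f"abandoned item {item!r} has no reason")

    def render(self) -> str:
        self.validate()
        lines = [f"Closure note: {self.scope}"]
        lines += [f"landed: {item}" for item in self.landed]
        lines += [f"parked: {item} (resume with: {resume})" for item, resume in self.parked]
        lines += [f"abandoned: {item} (why: {reason})" for item, reason in self.abandoned]
        return "\n".join(lines)


note = ClosureNote(
    scope="PKM audit",
    landed=["M1 D1 sync fix"],
    parked=[("M4 wiki active-tag", "<exact resume prompt goes here>")],
)
print(note.render())
```

The rendered text is what would be committed before the new scope is accepted; the validation step is what turns the habit into a gate rather than a suggestion.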

SECTION 09

The founder-correction timeline

Every one is a point the agent should have caught itself — and they cluster at direction and canon, exactly where it has no self-check:

SECTION 10

What the closeout missed (self-audit)

The adversarial critic checked the analysis against the ground-truth logs and found real gaps the earlier closeout would have carried forward: the Jessica-scheduler over-claim (factory likely dark); Donna/Hermes entirely absent; M4 dropped completely; VDS disk at 81% + 347 branches framed as tidy-up when it's a near-term constraint; AIN-95's output one git stash drop from gone; the spark-quota gotcha missing; and "parked" vs "dispatched-but-unverified" conflated for the launch lanes. This is the analysis catching its own author.

Where to start

Before the next line of work: check whether the factory is even ticking, and adopt the one habit — close each scope before opening the next — that turns a pile back into a map.