Yea let's do it for all ideally and not just codex but claude and whatever we need - it should be for the three active repos that we have to start with aina-platform, aina-data-engine-room and aina-academy
Got it — full adoption, both codex + claude CLIs in the image, across all three active repos (aina-platform, aina-data-engine-room, aina-academy). That's a real multi-repo build, so let me ground it: where the 3 repos live, their CI, the org, and how agent auth (subscription, no API keys) gets into containers.
cd /Users/Ali/PKM/scratch/pkm-deepdive
./vds-exec.sh run 'echo "=== the 3 repos on the VDS ==="
for r in aina-platform aina-data-engine-room aina-academy; do
for base in /srv/repos /srv/aina /home/ali; do
[ -d "$base/$r/.git" ] && { echo "$r -> $base/$r"; echo " remote: $(git -C "$base/$r" remote get-url origin 2>/dev/null)"; echo " workflows: $(ls "$base/$r"/.github/workflows/*.y*ml 2>/dev/null | xargs -n1 basename 2>/dev/null | tr "\n" " ")"; break; }
done
done
echo
echo "=== find any other clones (in case paths differ) ==="
find /srv /home/ali -maxdepth 3 -type d -name ".git" 2>/dev/null | grep -iE "aina-platform|data-engine" | head
echo
echo "=== agent auth to mount into containers (subscription, no API keys) ==="
ls -la /home/ali/.codex/auth.json /home/ali/.codex/config.toml 2>/dev/null | head
ls -la /home/ali/.claude/.credentials.json /home/ali/.claude.json 2>/dev/null | head
echo "claude CLI on VDS: $(command -v claude 2>/dev/null || echo MISSING)"
echo "codex CLI on VDS: $(command -v codex 2>/dev/null)"
echo
echo "=== existing self-hosted runners (org vs repo level) ==="
gh api repos/ainative-academy/aina-academy/actions/runners -q ".runners[]?|.name" 2>/dev/null
echo "org runners: $(gh api orgs/ainative-academy/actions/runners -q ".runners[]?|.name" 2>&1 | head -3)"' 2>&1 | tail -30
=== the 3 repos on the VDS ===
aina-platform -> /srv/repos/aina-platform
remote: https://github.com/ainative-academy/aina-platform.git
workflows: ci.yml founder-review-pack.yml instruction-drift.yml pr-linear-key.yml production-promotion-gate.yml release-agent.yml second-opinion-required.yml
aina-data-engine-room -> /srv/aina/aina-data-engine-room
remote: https://github.com/ainative-academy/aina-data-engine-room.git
workflows:
aina-academy -> /srv/repos/aina-academy
remote: https://github.com/ainative-academy/aina-academy.git
workflows: canon-guard.yml ci.yml pr-bot-review-watcher.yml
=== find any other clones (in case paths differ) ===
/srv/repos/aina-platform/.git
/srv/aina/aina-data-engine-room/.git
/home/ali/agent-workspaces/aina-platform-pr590/.git
=== agent auth to mount into containers (subscription, no API keys) ===
-rw------- 1 ali ali 4507 Jun 26 00:00 /home/ali/.codex/auth.json
-rw------- 1 ali ali 1460 Jun 29 16:15 /home/ali/.codex/config.toml
-rw------- 1 ali ali 58669 Jun 30 10:25 /home/ali/.claude.json
-rw------- 1 ali ali 11249 Jun 30 09:47 /home/ali/.claude/.credentials.json
claude CLI on VDS: /usr/local/bin/claude
codex CLI on VDS: /usr/local/bin/codex
=== existing self-hosted runners (org vs repo level) ===
vds-runner
org runners: {"message":"You must be an org admin or have the runners and runner groups fine-grained permission.","documentation_url":"https://docs.github.com/rest/actions/self-hosted-runners#list-self-hosted-runners-for-an-organization","status":"403"}
gh: You must be an org admin or have the runners and runner groups fine-grained permission. (HTTP 403)
gh: This API operation needs the "admin:org" scope. To request it, run: gh auth refresh -h github.com -s admin:org
Yea whatever those repos have and whatever they might need including dependencies and or anything else
So the current setup but now in docker so I don't know if they need MCP and PKM and Linear and or any other such connections, think it through - look at how the VDS is set up and how doing this will change the workflow and what may or may not break and how it should be done ideally
This is the right instinct — containerizing without mapping every connection (MCP, PKM, Linear, auth, the Paperclip loopback API) would quietly break the agents. Let me study the VDS setup deeply before designing anything. First, what the agents actually connect to — codex/claude config, MCP servers, secrets.
cd /Users/Ali/PKM/scratch/pkm-deepdive
./vds-exec.sh run 'echo "===== codex config.toml (MCP servers, model, connections) ====="
cat /home/ali/.codex/config.toml 2>/dev/null
echo
echo "===== claude MCP servers (~/.claude.json mcpServers keys) ====="
python3 -c "import json;d=json.load(open(\"/home/ali/.claude.json\"));ms=d.get(\"mcpServers\",{});print(\"global mcpServers:\",list(ms.keys()));pr=d.get(\"projects\",{});[print(\" proj\",k,\"->\",list((v.get(\"mcpServers\") or {}).keys())) for k,v in list(pr.items())[:6]]" 2>/dev/null | head -15
echo
echo "===== secrets store ====="
ls -la /srv/aina-hq/.secrets/ 2>/dev/null | head -20' 2>&1 | tail -55
[projects."/srv/repos/aina-paperclip-agent-context/workspaces/executive-governance"]
trust_level = "trusted"

[projects."/home/ali/Projects/pkm-agent"]
trust_level = "trusted"

[projects."/tmp"]
trust_level = "trusted"

[projects."/srv/repos/aina-academy"]
trust_level = "trusted"

[projects."/srv/aina/ops/finch-pkm-mapping"]
trust_level = "trusted"

[projects."/srv/repos/aina-platform"]
trust_level = "trusted"

[projects."/home/ali/aina-mission-control"]
trust_level = "trusted"

[projects."/srv/aina/ops"]
trust_level = "trusted"

===== claude MCP servers (~/.claude.json mcpServers keys) =====
global mcpServers: ['pkm-agent', 'clerk', 'codegraph', 'linear', 'posthog', 'sentry', 'cloudflare-api', 'cloudflare-bindings', 'cloudflare-builds', 'cloudflare-docs', 'cloudflare-observability', 'playwright', 'github']
 proj /srv/aina-hq -> []
 proj /home/ali -> []
 proj /home/ali/conductor/aina-hq -> []
 proj /home/ali/conductor/repos/aina-cc -> []
 proj /home/ali/conductor/repos/aina-evidence-atlas -> []
 proj /home/ali/conductor/repos/aina-jobs-research -> []

===== secrets store =====
total 76
drwx------  3 ali ali 4096 Jun 29 21:17 .
drwxr-xr-x 22 ali ali 4096 Jun 27 10:49 ..
-rw-------  1 ali ali 6434 Jun 19 21:43 INDEX.md
-rw-------  1 ali ali  360 Jun 16 01:55 clerk-test.env
-rw-------  1 ali ali  585 Jun  3 07:56 contabo-root-breakglass.md
-rw-------  1 ali ali  106 Jun 29 21:17 frill.env
-rw-------  1 ali ali  295 Jun 29 20:03 gemini-interactions.env
-rw-------  1 ali ali  403 Jun 29 21:17 niftypm.env
-rw-------  1 ali ali  200 Jun 16 01:53 resend.env
-rw-------  1 ali ali  288 Jun 16 01:39 sentry-api-token.env
-rw-------  1 ali ali  704 Jun 19 21:43 sentry-dsn.env
drwx------  2 ali ali 4096 Jun  3 00:17 sources
-rw-------  1 ali ali   68 Jun 21 10:44 staging-access.env
-rw-------  1 ali ali  805 Jun 16 00:00 stripe-test.env
-rw-r--r--  1 ali ali 2461 Jun 16 00:00 stripe_setup.py
-rw-------  1 ali ali  437 Jun 29 12:11 vertex-express.env
-rw-------  1 ali ali  442 Jun 21 15:44 vertex-express.env.bak-2026-06-21
-rw-------  1 ali ali  437 Jun 29 20:03 vertex-express.env.bak-20260629T200329
Rich connection map emerging. The Claude agents use 13 MCP servers (pkm-agent, linear, clerk, posthog, sentry, 5× cloudflare, playwright, github, codegraph), there's a secrets store (clerk/stripe/sentry/vertex/resend/etc.), and codex's config is mostly trust-levels. Before I design the containerization, I need to know what each MCP is (local-stdio vs network) and how the lanes reach the Paperclip loopback API — those determine what breaks in a container.
cd /Users/Ali/PKM/scratch/pkm-deepdive
./vds-exec.sh run 'echo "===== MCP server definitions (stdio-local vs network) ====="
python3 -c "
import json
d=json.load(open(\"/home/ali/.claude.json\"))
for name,cfg in (d.get(\"mcpServers\") or {}).items():
t=cfg.get(\"type\") or (\"http\" if cfg.get(\"url\") else \"stdio\")
loc=cfg.get(\"url\") or (cfg.get(\"command\",\"\")+\" \"+\" \".join(cfg.get(\"args\",[])[:3]))
print(f\" {name:14s} [{t}] {loc[:80]}\")
" 2>/dev/null
echo
echo "===== Paperclip API reachability (how lanes call it) ====="
grep -nE "api.base|apiBase|baseUrl|127.0.0.1|localhost|loopback|port" /home/ali/.paperclip/config* /home/ali/.config/paperclip* 2>/dev/null | head
paperclipai whoami 2>&1 | head -3
echo "paperclip listening:"; ss -ltnp 2>/dev/null | grep -iE "paperclip|node" | head -5
echo
echo "===== do codex BUILD lanes use MCP? (codex-home configs) ====="
find /home/ali/.paperclip -name config.toml 2>/dev/null | head -2 | while read f; do echo "--- $f ---"; grep -cE "mcp_servers|\[mcp" "$f" 2>/dev/null | xargs echo "mcp entries:"; done
echo
echo "===== aina-platform CI workflows (what they invoke) ====="
grep -rhnE "runs-on:|LINEAR|secrets\.|uses: .*@|pnpm |npm " /srv/repos/aina-platform/.github/workflows/ci.yml 2>/dev/null | head -15' 2>&1 | tail -45
===== MCP server definitions (stdio-local vs network) =====
pkm-agent [stdio] /home/ali/.local/bin/uv --directory /home/ali/Projects/pkm-agent run
clerk [http] https://mcp.clerk.com/mcp
codegraph [stdio] codegraph serve --mcp --no-watch
linear [http] https://mcp.linear.app/mcp
posthog [http] https://mcp.posthog.com/mcp
sentry [http] https://mcp.sentry.dev/mcp
cloudflare-api [http] https://mcp.cloudflare.com/mcp
cloudflare-bindings [http] https://bindings.mcp.cloudflare.com/mcp
cloudflare-builds [http] https://builds.mcp.cloudflare.com/mcp
cloudflare-docs [http] https://docs.mcp.cloudflare.com/mcp
cloudflare-observability [http] https://observability.mcp.cloudflare.com/mcp
playwright [stdio] npx @playwright/mcp@latest
github [stdio] bash -lc export GITHUB_PERSONAL_ACCESS_TOKEN="$(gh auth token)"; exec /home/ali/
===== Paperclip API reachability (how lanes call it) =====
{
"user": {
"id": "local-board",
paperclip listening:
LISTEN 0 511 127.0.0.1:3100 0.0.0.0:* users:(("node",pid=898536,fd=30))
===== do codex BUILD lanes use MCP? (codex-home configs) =====
--- /home/ali/.paperclip/instances/default/companies/7d58fc13-7c9a-4c87-b9f8-a7cfae8564a9/codex-home/config.toml ---
mcp entries: 0
===== aina-platform CI workflows (what they invoke) =====
13: runs-on: ubuntu-latest
15: - uses: actions/checkout@v6
20: - uses: pnpm/action-setup@v6
22: - uses: actions/setup-node@v6
52: run_check "pnpm install" "validation-advisory" pnpm install --frozen-lockfile
53: run_check "pnpm lint" "validation-advisory" pnpm lint
54: run_check "pnpm typecheck" "validation-advisory" pnpm typecheck
55: run_check "pnpm test" "validation-advisory" pnpm test
56: run_check "pnpm build" "validation-advisory" pnpm build
62: runs-on: ubuntu-latest
64: - uses: actions/checkout@v6
66: - uses: actions/setup-python@v6
Now I have the full connection map. The key realization is that there are three workload tiers with very different needs, which changes everything about how to do this.
Of the 13 MCP servers, 9 are http (linear, clerk, posthog, sentry, 5× cloudflare) and just need network + tokens, but 4 are stdio-local (pkm-agent reads the local PKM store, codegraph reads the repo index, playwright needs chromium, github needs gh auth), and on top of that there is the secrets store.
That distinction is the whole design. Let me write it up properly: the connection map, what breaks, and the ideal approach.
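As a quick check on the transport split, here is a minimal Python sketch that groups the 13 servers by transport. The server names and transports are copied from the MCP listing above; the dictionary and the `split_by_transport` helper are illustrative and not part of any existing tooling on the VDS.

```python
# Transport per MCP server, copied from the "MCP server definitions" listing.
MCP_SERVERS = {
    "pkm-agent": "stdio",
    "clerk": "http",
    "codegraph": "stdio",
    "linear": "http",
    "posthog": "http",
    "sentry": "http",
    "cloudflare-api": "http",
    "cloudflare-bindings": "http",
    "cloudflare-builds": "http",
    "cloudflare-docs": "http",
    "cloudflare-observability": "http",
    "playwright": "stdio",
    "github": "stdio",
}


def split_by_transport(servers):
    """Group server names by transport, preserving listing order."""
    groups = {}
    for name, transport in servers.items():
        groups.setdefault(transport, []).append(name)
    return groups


groups = split_by_transport(MCP_SERVERS)
for transport, names in groups.items():
    print(f"{transport}: {len(names)} -> {', '.join(names)}")
```

The stdio group is the one that needs binaries and data inside the container; the http group only needs egress and a valid token.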
# VDS Containerization: Design & Impact Analysis

*The current setup, in Docker, without breaking the web of connections (auth, 13 MCP servers, the PKM store, Linear, the Paperclip loopback API, secrets) that the agents actually run on.*

**Ali Mehdi Mukadam · co-authored with Claude · 2026-06-30**

---

## The single idea

Containerizing the factory is **not** "wrap the build in Docker." The agents on this VDS run on a **web of connections**: subscription auth for codex *and* claude, 13 MCP servers, the local PKM store, Linear, the Paperclip API on `127.0.0.1:3100`, and a secrets store. If you containerize naively, the build still works but the **agents go deaf**: no PKM recall, no Linear, no Paperclip status updates.

The real design question is: **which connections does each kind of work need, and how does each cross the container boundary.**

The good news: the three workloads need *very* different things, so this is done in tiers, easy/safe first and heavy/risky last.

---

## How the VDS works now: the connection map

| Connection | What it is | How it's reached today |
|---|---|---|
| **codex auth** | subscription (no API key) | `~/.codex/auth.json` |
| **claude auth** | subscription (no API key) | `~/.claude/.credentials.json` |
| **PKM recall** | `pkm-agent` MCP (stdio) | `uv --directory ~/Projects/pkm-agent` → reads local PKM SQLite |
| **codegraph** | code-intel MCP (stdio) | `codegraph serve --mcp` → reads the repo index |
| **playwright** | browser MCP (stdio) | `npx @playwright/mcp` → needs chromium |
| **github** | gh MCP (stdio) | `gh auth token` |
| **Linear, Clerk, PostHog, Sentry, Cloudflare×5** | hosted MCP (http) | hosted endpoints such as `https://mcp.linear.app/mcp`; network + token |
| **Paperclip** | the org's API | `paperclipai` → **`127.0.0.1:3100`** (loopback) |
| **secrets** | Stripe/Clerk/Vertex/Resend/Sentry/Gemini… | `/srv/aina-hq/.secrets/*.env` |
| **runner** | GitHub Actions self-hosted | one runner, **registered to aina-academy only** |

**Three repos, three CI shapes:** aina-academy (3 workflows), aina-platform (7, incl. promotion-gate, release-agent, Linear-key, second-opinion), aina-data-engine-room (**no workflows yet**).

---

## The three workload tiers (this is the whole design)

**Tier 1: CI jobs** (`pnpm install/lint/typecheck/test/build`, python). Need **only the toolchain + the repo**. No MCP, no Paperclip, no PKM. → trivial + zero connection risk.

**Tier 2: Codex build lanes.** codex-home has **0 MCP entries**, so lanes just build. Need: **codex auth + repo + gh + the Paperclip loopback** (to flip issue status). No MCP. → medium.

**Tier 3: Claude / thinking agents.** The full stack: **all 13 MCP** (4 stdio-local + 9 http) + PKM store + secrets + Paperclip loopback. → heavy; most of the breakage risk lives here.

---

## What Docker changes, and what breaks if you don't handle it

```
HOST                                    CONTAINER (needs the bridge)
~/.codex/auth.json, ~/.claude    ─────► mount RW (token refresh writes here)
/srv/aina-hq/.secrets            ─────► mount RO (NEVER bake secrets into an image)
PKM SQLite store                 ─────► mount RO (so pkm-agent MCP can read it)
~/Projects/pkm-agent, codegraph  ─────► in image / mount (stdio MCP binaries)
chromium (playwright)            ─────► install in image
127.0.0.1:3100 (Paperclip)       ─────► --network host (else lanes can't reach it)
hosted MCP (linear, cloudflare)  ─────► outbound HTTPS (bridge is fine) + pre-authed tokens
gh auth                          ─────► mount ~/.config/gh or pass GH_TOKEN
```

**The five things that will break if not designed for:**

1. **stdio MCP go silent.** `pkm-agent`, `codegraph`, `playwright`, `github` spawn *local* processes reading *local* data. In a bare container they're absent → agents lose recall, code-intel, browser. Fix: put the binaries in the image + mount the data (PKM store, repo).
2. **Paperclip loopback unreachable.** Containers don't see the host's `127.0.0.1:3100`. Fix: `--network host` for Tier-2/3 containers (CI Tier-1 doesn't need it).
3. **OAuth MCP can't re-auth headless.** The hosted MCPs (Linear, Clerk…) use interactive OAuth. A headless container **cannot** complete a login. Fix: pre-auth on the host, mount the token cache; treat re-auth as a host operation. *(This is a real, known limitation: not all MCP survive headless.)*
4. **Subscription token refresh.** The CLIs rewrite `auth.json`/credentials on refresh. Mount those **read-write** (or per-run copy), or auth silently goes stale.
5. **Secrets in images.** If secrets get baked into a layer, they leak into the image cache/registry. **Always mount, never bake.**

Plus: the **runner is repo-scoped to aina-academy**. Three repos need either **3 repo-level runners** (works with the current gh permissions) or **1 org-level runner** (needs `admin:org`, which the VDS gh currently lacks; a one-time scope grant).

---

## The ideal design

**1. Layered images (one base, thin per-repo).**

- `factory-base`: Ubuntu 24.04 + node 22 + pnpm 10.11.0 + python + git + gh + **codex + claude CLIs** + uv + codegraph + playwright/chromium + the stdio-MCP runtimes.
- `repo-<name>`: `FROM factory-base` + that repo's deps (pinned). One per active repo.

**2. Secrets & auth by mount, never baked.** A standard run-profile mounts: `~/.codex` + `~/.claude` (RW, for refresh), `~/.config/gh` (RO), `/srv/aina-hq/.secrets` (RO), the PKM store (RO). The image stays secret-free and publishable.

**3. Networking by tier.** Tier-1 CI → default bridge (isolated, just needs egress). Tier-2/3 lanes & agents → `--network host` so the Paperclip loopback + all local services "just work" (simplest correct option; revisit host-gateway later if we want stricter isolation, which would also require Paperclip to listen on an address the bridge can reach, since it currently binds only to `127.0.0.1`).

**4. Runners.** Start with **per-repo self-hosted runners** (3) running jobs in Tier-1 containers; no new permissions needed. Consolidate to an org runner later if you grant `admin:org`.

**5. Phased rollout. Risk increases per tier, so go in that order:**

- **Phase 1: CI in containers, all 3 repos.** Zero connection risk; biggest bloat + reproducibility win. (aina-data-engine-room has no workflows, so add a minimal one or skip.)
- **Phase 2: Codex lanes in containers.** Auth mount + `--network host` for Paperclip; no MCP, so contained.
- **Phase 3: Claude/MCP agents in containers.** The full mount+network+MCP profile; validate each stdio MCP and the OAuth-token mounts before cutting over.

**6. Governance.** `docker system prune` cron for image/build-cache; keep the existing janitor + 85% disk alert; per-container CPU/mem limits.

---

## Risks & mitigations (named)

- **Headless OAuth MCP** → pre-auth on host + mount token cache; document re-auth as a host step. Biggest unknown; validate in Phase 3 with one agent before fleet cutover.
- **Token refresh staleness** → mount auth RW; smoke-test a long-running container.
- **Paperclip coupling** → `--network host` (verified: it listens on loopback :3100); if we ever move Paperclip, the lanes follow via one env var.
- **3-repo runner scope** → per-repo runners now; org runner when `admin:org` is granted.
- **aina-data-engine-room has no CI** → decide: add a minimal CI or leave it lane-only.
- **Docker's own disk** → prune cron (bounded, unlike worktrees).

---

## Recommendation & sequencing

Do it **in tiers, lowest-risk first**: Phase 1 (CI, all 3 repos) is safe and high-value and can ship now. Phase 2 (codex lanes) needs only auth + the Paperclip loopback. Phase 3 (Claude/MCP agents) is where the real care goes: prove the stdio-MCP + OAuth-token mounts on **one** agent before moving the fleet. Nothing is big-bang; each phase is independently reversible (the native path still exists until you cut over).

The immediate fire is already out (free CI + janitor + disk alert). This is the **durable, multi-repo, multi-agent** architecture; build it deliberately, not tonight in one shot.

---

**Ali Mehdi Mukadam · co-authored with Claude · 2026-06-30**

```yaml
topics: [vds-infrastructure, docker, agent-factory, mcp, multi-repo]
subtopics: [connection-map, stdio-mcp, paperclip-loopback, subscription-auth, secrets-mounting, oauth-headless, self-hosted-runners, phased-rollout]
```
File created successfully at: /Users/Ali/PKM/scratch/pkm-deepdive/vds-containerization-design-2026-06-30.md (file state is current in your context — no need to Read it back)
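To make the per-tier run-profile from the write-up concrete, here is a minimal Python sketch that builds (but does not execute) a `docker run` command line for each tier. Several things are assumptions rather than facts from the VDS: the image names, the `/workspace` mount point, reusing host paths inside the container, and `PKM_STORE`, which is a placeholder because the transcript never shows where the PKM SQLite store lives.

```python
from dataclasses import dataclass

HOME = "/home/ali"
# Hypothetical placeholder: the real PKM store path is not shown in this session.
PKM_STORE = "/path/to/pkm-store"

# (host path, container path, mode). Reusing the host paths inside the
# container is an assumption so the CLIs find their files where they expect.
CODEX_AUTH = (f"{HOME}/.codex", f"{HOME}/.codex", "rw")
CLAUDE_AUTH = (f"{HOME}/.claude", f"{HOME}/.claude", "rw")
GH_CONFIG = (f"{HOME}/.config/gh", f"{HOME}/.config/gh", "ro")
SECRETS = ("/srv/aina-hq/.secrets", "/srv/aina-hq/.secrets", "ro")
PKM = (PKM_STORE, PKM_STORE, "ro")


@dataclass(frozen=True)
class RunProfile:
    network: str          # value passed to docker's --network flag
    mounts: tuple = ()    # extra bind mounts beyond the repo itself


PROFILES = {
    "ci": RunProfile("bridge"),
    "codex-lane": RunProfile("host", (CODEX_AUTH, GH_CONFIG)),
    "claude-agent": RunProfile(
        "host", (CODEX_AUTH, CLAUDE_AUTH, GH_CONFIG, SECRETS, PKM)
    ),
}


def docker_run_argv(tier, image, repo_path, command):
    """Build a `docker run` argv for one tier without running it."""
    profile = PROFILES[tier]
    argv = ["docker", "run", "--rm", "--network", profile.network,
            "-v", f"{repo_path}:/workspace:rw", "-w", "/workspace"]
    for host_path, container_path, mode in profile.mounts:
        argv += ["-v", f"{host_path}:{container_path}:{mode}"]
    return argv + [image, *command]


print(" ".join(docker_run_argv("codex-lane", "repo-aina-platform",
                               "/srv/repos/aina-platform", ["pnpm", "test"])))
```

Keeping the profiles in one table means a tier's network mode and mounts can be reviewed in one place before any container is started.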
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>VDS Containerization — Design & Impact Analysis</title>
<style>
:root{
--bg:#f6f3ec;--ink:#1a1814;--ink-soft:#4a463e;--ink-faint:#8a8478;
--rule:#d9d3c4;--rule-soft:#e8e3d4;--paper:#fbf9f2;
--accent:#1d3557;--accent-soft:#3d5a80;--accent-bg:#e4e9f0;--warn:#a8321e;
--serif:'Iowan Old Style','Palatino Linotype',Palatino,Georgia,serif;
--sans:ui-sans-serif,system-ui,-apple-system,sans-serif;
--mono:'SF Mono','JetBrains Mono',Menlo,Consolas,monospace;
}
@media (prefers-color-scheme:dark){:root{
--bg:#101218;--ink:#e7e9ef;--ink-soft:#b4b8c4;--ink-faint:#6f7585;
--rule:#262a35;--rule-soft:#1b1e27;--paper:#161922;
--accent:#7aa0c8;--accent-soft:#5d82ad;--accent-bg:#1c2533;--warn:#d4664f;}}
*{box-sizing:border-box}
body{margin:0;background:var(--bg);color:var(--ink);font-family:var(--serif);font-size:19px;line-height:1.62;padding:48px 22px 80px}
.wrap{max-width:820px;margin:0 auto}
::selection{background:var(--accent-bg)}
.meta{font-family:var(--sans);font-size:12px;letter-spacing:.13em;text-transform:uppercase;color:var(--ink-faint);display:flex;flex-wrap:wrap;gap:14px;align-items:center;border-bottom:1px solid var(--rule);padding-bottom:14px;margin-bottom:26px}
.meta .dot{width:6px;height:6px;border-radius:50%;background:var(--accent);display:inline-block}.meta .sp{flex:1}
h1{font-size:clamp(31px,5.5vw,47px);line-height:1.08;margin:.1em 0 .25em;font-weight:600;letter-spacing:-.012em}
.deck{font-style:italic;font-size:clamp(18px,3vw,22px);color:var(--ink-soft);line-height:1.42;margin:0 0 22px;max-width:48ch}
.byline{font-family:var(--sans);font-size:13px;color:var(--ink-faint);border-top:1px solid var(--rule);border-bottom:1px solid var(--rule);padding:11px 0;margin-bottom:36px;display:flex;flex-wrap:wrap;gap:8px 18px}
.byline b{color:var(--ink-soft);font-weight:600}
.thesis{position:relative;border:1.5px solid var(--accent);background:var(--paper);padding:30px 26px 24px;margin:40px 0;border-radius:3px}
.thesis .kick{position:absolute;top:-11px;left:18px;background:var(--bg);padding:0 10px;font-family:var(--sans);font-size:11px;letter-spacing:.16em;text-transform:uppercase;color:var(--accent);font-weight:700}
.thesis p{margin:0;font-size:19px;line-height:1.55}
section{margin:46px 0}
.num{font-family:var(--mono);font-size:12.5px;letter-spacing:.1em;color:var(--accent);font-weight:600;margin-bottom:6px;display:block}
h2{font-size:clamp(22px,4vw,28px);line-height:1.16;margin:.1em 0 .5em;font-weight:600;letter-spacing:-.01em}
p{margin:0 0 16px}strong{font-weight:600}
code{font-family:var(--mono);font-size:.84em;background:var(--accent-bg);padding:1px 5px;border-radius:3px}
table{width:100%;border-collapse:collapse;margin:22px 0;font-size:14.5px}
thead th{font-family:var(--sans);font-size:11px;letter-spacing:.07em;text-transform:uppercase;color:var(--ink-faint);text-align:left;padding:0 10px 9px;border-bottom:1.5px solid var(--rule);vertical-align:bottom}
tbody td{padding:10px 10px;border-bottom:1px solid var(--rule-soft);vertical-align:top;line-height:1.4}
tbody td:first-child{font-weight:600;white-space:nowrap}
.tiers{display:grid;grid-template-columns:repeat(3,1fr);gap:13px;margin:24px 0}
.tier{background:var(--paper);border:1px solid var(--rule);border-radius:5px;padding:16px 15px;border-top:3px solid var(--accent-soft)}
.tier.t1{border-top-color:#2d5a3d}.tier.t2{border-top-color:#c47a1f}.tier.t3{border-top-color:var(--warn)}
.tier h4{font-family:var(--sans);font-size:14px;margin:0 0 4px}
.tier .risk{font-family:var(--sans);font-size:11px;letter-spacing:.08em;text-transform:uppercase;color:var(--ink-faint);margin-bottom:8px}
.tier p{font-size:13px;line-height:1.42;color:var(--ink-soft);margin:0}
pre.bridge{background:var(--paper);border:1px solid var(--rule);border-left:3px solid var(--accent);border-radius:4px;padding:14px 16px;font-family:var(--mono);font-size:12px;line-height:1.6;color:var(--ink-soft);overflow-x:auto;white-space:pre}
ol.breaks{counter-reset:b;list-style:none;margin:20px 0;padding:0}
ol.breaks li{counter-increment:b;position:relative;padding:0 0 14px 40px;line-height:1.5}
ol.breaks li::before{content:counter(b);position:absolute;left:0;top:0;font-family:var(--mono);font-size:12px;color:var(--warn);font-weight:700;border:1.5px solid var(--warn);border-radius:50%;width:26px;height:26px;display:flex;align-items:center;justify-content:center}
ol.breaks b{font-weight:600}
ul.clean{margin:14px 0;padding-left:0;list-style:none}
ul.clean li{position:relative;padding:0 0 11px 22px;line-height:1.48}
ul.clean li::before{content:"";position:absolute;left:2px;top:10px;width:7px;height:7px;background:var(--accent-soft);border-radius:50%}
.flow{display:grid;grid-template-columns:repeat(3,1fr);gap:13px;margin:24px 0}
.step{background:var(--paper);border:1px solid var(--rule);border-radius:4px;padding:15px 14px}
.step .sn{font-family:var(--mono);font-size:12px;color:var(--accent);font-weight:700}
.step h4{font-family:var(--sans);font-size:13.5px;margin:7px 0 6px}.step p{font-size:13px;line-height:1.4;color:var(--ink-soft);margin:0}
.pull{border-left:3px solid var(--warn);padding:6px 0 6px 22px;margin:30px 0;font-size:21px;line-height:1.34;font-style:italic;max-width:38ch}
.closer{border-top:3px double var(--accent);border-bottom:3px double var(--accent);padding:24px 0;margin:44px 0 32px;text-align:center}
.closer .lab{font-family:var(--sans);font-size:11px;letter-spacing:.18em;text-transform:uppercase;color:var(--accent);margin-bottom:10px}
.closer p{font-style:italic;font-size:19px;line-height:1.42;margin:0;max-width:54ch;margin-inline:auto}
footer{font-family:var(--sans);font-size:13px;color:var(--ink-faint);border-top:1px solid var(--rule);padding-top:22px;margin-top:38px}
footer .bl{color:var(--ink-soft);margin-bottom:16px}
.yaml{font-family:var(--mono);font-size:12px;background:var(--paper);border:1px solid var(--rule);border-radius:3px;padding:13px 15px;color:var(--ink-soft);white-space:pre-wrap;line-height:1.6;margin-bottom:16px}
.pills{display:flex;flex-wrap:wrap;gap:8px}.pill{font-size:11.5px;background:var(--accent-bg);color:var(--accent);border-radius:20px;padding:4px 11px;font-weight:600}
@media (max-width:680px){body{font-size:18px;padding:34px 16px 60px}.tiers,.flow{grid-template-columns:1fr}}
</style>
</head>
<body>
<div class="wrap">
<div class="meta"><span>AINA Infra · Design</span><span class="dot"></span><span>VDS · Docker · MCP</span><span class="sp"></span><span>2026-06-30</span></div>
<h1>VDS Containerization — Design & Impact</h1>
<p class="deck">The current setup, in Docker — without the agents going deaf. Mapping every connection (auth, 13 MCP, PKM, Linear, the Paperclip loopback, secrets) and what crosses the container boundary.</p>
<div class="byline"><span><b>Ali Mehdi Mukadam</b> · co-authored with Claude</span><span>· 6 min read</span><span>· design, not a build order</span></div>
<div class="thesis">
<span class="kick">The Single Idea</span>
<p>This isn't "wrap the build in Docker." The agents run on a <strong>web of connections</strong> — codex <em>and</em> claude auth, 13 MCP servers, the local PKM store, Linear, the Paperclip API on <code>127.0.0.1:3100</code>, a secrets store. Containerize naively and the build still passes but the <strong>agents go deaf</strong> — no recall, no Linear, no Paperclip. The design is: <strong>which connections does each kind of work need, and how does each cross the boundary.</strong></p>
</div>
<section>
<span class="num">01</span>
<h2>How the VDS works now — the connection map</h2>
<table>
<thead><tr><th>Connection</th><th>What it is</th><th>Reached today via</th></tr></thead>
<tbody>
<tr><td>codex / claude auth</td><td>subscription (no API key)</td><td><code>~/.codex/auth.json</code>, <code>~/.claude/.credentials.json</code></td></tr>
<tr><td>pkm-agent</td><td>PKM recall MCP (stdio)</td><td><code>uv --directory ~/Projects/pkm-agent</code> → local SQLite</td></tr>
<tr><td>codegraph / playwright / github</td><td>stdio MCP (local)</td><td>code index · chromium · <code>gh auth token</code></td></tr>
<tr><td>Linear, Clerk, PostHog, Sentry, CF×5</td><td>hosted MCP (http)</td><td><code>https://mcp.*.com/mcp</code> — net + token</td></tr>
<tr><td>Paperclip</td><td>the org's API</td><td><code>127.0.0.1:3100</code> (loopback)</td></tr>
<tr><td>secrets</td><td>Stripe/Clerk/Vertex/Resend…</td><td><code>/srv/aina-hq/.secrets/*.env</code></td></tr>
<tr><td>runner</td><td>self-hosted Actions</td><td>one runner, <strong>aina-academy only</strong></td></tr>
</tbody>
</table>
</section>
<section>
<span class="num">02</span>
<h2>Three workload tiers — this is the whole design</h2>
<div class="tiers">
<div class="tier t1"><div class="risk">Tier 1 · low risk</div><h4>CI jobs</h4><p>pnpm lint/test/build, python. Need <strong>only toolchain + repo</strong>. No MCP, no Paperclip, no PKM. Trivial.</p></div>
<div class="tier t2"><div class="risk">Tier 2 · medium</div><h4>Codex lanes</h4><p>0 MCP entries — just build. Need <strong>auth + repo + gh + Paperclip loopback</strong>. No MCP.</p></div>
<div class="tier t3"><div class="risk">Tier 3 · heavy</div><h4>Claude / MCP agents</h4><p>All <strong>13 MCP</strong> (4 stdio-local + 9 http) + PKM store + secrets + loopback. Most breakage risk.</p></div>
</div>
<p><strong>Three repos, three CI shapes:</strong> aina-academy (3 workflows), aina-platform (7 — promotion-gate, release-agent, Linear-key…), aina-data-engine-room (<strong>none yet</strong>).</p>
</section>
<section>
<span class="num">03</span>
<h2>What Docker changes — the boundary bridge</h2>
<pre class="bridge">HOST                          CONTAINER (the bridge)
~/.codex, ~/.claude           ─────► mount RW (token refresh writes here)
/srv/aina-hq/.secrets         ─────► mount RO (NEVER bake secrets in an image)
PKM SQLite store              ─────► mount RO (so pkm-agent MCP can read it)
pkm-agent / codegraph         ─────► in image (stdio MCP binaries)
chromium (playwright)         ─────► in image
127.0.0.1:3100 (Paperclip)    ─────► --network host (else lanes can't reach it)
hosted MCP (linear, CF…)      ─────► egress + pre-authed tokens
gh auth                       ─────► mount ~/.config/gh or GH_TOKEN</pre>
<h2 style="margin-top:32px">What breaks if you don't design for it</h2>
<ol class="breaks">
<li><b>stdio MCP go silent</b> — pkm-agent, codegraph, playwright, github spawn <em>local</em> processes on <em>local</em> data. Bare container → agents lose recall, code-intel, browser. Fix: binaries in the image + mount the data.</li>
<li><b>Paperclip loopback unreachable</b> — containers can't see host <code>127.0.0.1:3100</code>. Fix: <code>--network host</code> for Tier 2/3.</li>
<li><b>OAuth MCP can't re-auth headless</b> — Linear/Clerk/etc. use interactive login a container can't complete. Fix: pre-auth on host, mount the token cache; re-auth is a host step.</li>
<li><b>Subscription token refresh</b> — CLIs rewrite auth on refresh. Mount RW or auth goes stale.</li>
<li><b>Secrets baked into a layer</b> leak into the image cache. Always mount, never bake.</li>
</ol>
<p>Plus: the runner is <strong>repo-scoped to aina-academy</strong>. Three repos → 3 repo-level runners (works now) or 1 org runner (needs <code>admin:org</code>, which the VDS gh currently lacks).</p>
</section>
<section>
<span class="num">04</span>
<h2>The ideal design</h2>
<ul class="clean">
<li><strong>Layered images</strong>: <code>factory-base</code> (Ubuntu 24.04 + node22 + pnpm + python + git + gh + <strong>codex &amp; claude CLIs</strong> + uv + codegraph + playwright/chromium) → thin <code>repo-&lt;name&gt;</code> per active repo.</li>
<li><strong>Auth & secrets by mount, never baked</strong> — one run-profile mounts <code>~/.codex</code>, <code>~/.claude</code> (RW), <code>~/.config/gh</code> (RO), <code>/srv/aina-hq/.secrets</code> (RO), the PKM store (RO). Image stays secret-free.</li>
<li><strong>Networking by tier</strong> — CI on the bridge (just egress); lanes/agents on <code>--network host</code> so the Paperclip loopback + locals just work.</li>
<li><strong>Runners</strong> — per-repo self-hosted runners now (no new perms); consolidate to an org runner once <code>admin:org</code> is granted.</li>
<li><strong>Governance</strong> — <code>docker system prune</code> cron + the existing janitor + 85% disk alert + per-container CPU/mem limits.</li>
</ul>
<div class="flow">
<div class="step"><div class="sn">Phase 1</div><h4>CI in containers ×3 repos</h4><p>Zero connection risk, biggest bloat/repro win. Ship now.</p></div>
<div class="step"><div class="sn">Phase 2</div><h4>Codex lanes</h4><p>Auth mount + <code>--network host</code> for Paperclip. No MCP — contained.</p></div>
<div class="step"><div class="sn">Phase 3</div><h4>Claude / MCP agents</h4><p>Full mount+net+MCP. Prove stdio-MCP + OAuth tokens on ONE agent first.</p></div>
</div>
<p class="pull">The build passing isn't the test — the agents still hearing PKM, Linear, and Paperclip is.</p>
</section>
<section>
<span class="num">05</span>
<h2>Where to start</h2>
<div class="closer">
<div class="lab">Recommendation</div>
<p>Go in tiers, lowest-risk first. Phase 1 (CI, all 3 repos) is safe and high-value — ship it. Phase 2 needs only auth + the Paperclip loopback. Phase 3 is where the care goes — validate the stdio-MCP and OAuth-token mounts on one agent before the fleet. Each phase is reversible; the native path survives until you cut over. The fire's already out — this is the deliberate, multi-repo, multi-agent build.</p>
</div>
</section>
<footer>
<div class="bl"><b>Ali Mehdi Mukadam</b> · co-authored with Claude · 2026-06-30</div>
<div class="yaml">topics: [vds-infrastructure, docker, agent-factory, mcp, multi-repo]
subtopics: [connection-map, stdio-mcp, paperclip-loopback, subscription-auth, secrets-mounting, oauth-headless, self-hosted-runners, phased-rollout]</div>
<div class="pills"><span class="pill">#docker</span><span class="pill">#mcp</span><span class="pill">#connection-map</span><span class="pill">#paperclip-loopback</span><span class="pill">#secrets-mounting</span><span class="pill">#oauth-headless</span><span class="pill">#phased-rollout</span></div>
</footer>
</div>
</body>
</html>
File created successfully at: /Users/Ali/PKM/scratch/pkm-deepdive/vds-containerization-design-2026-06-30.html (file state is current in your context — no need to Read it back)
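The "Paperclip loopback unreachable" failure in the design is easy to test from inside a container before a lane starts work. A minimal preflight sketch in Python follows; the port 3100 on `127.0.0.1` comes from the `ss` output earlier, while the function name and the idea of running it as a container entrypoint check are assumptions.

```python
import socket


def loopback_reachable(host="127.0.0.1", port=3100, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    if loopback_reachable():
        print("Paperclip loopback reachable")
    else:
        print("Paperclip loopback NOT reachable "
              "(container is probably not on --network host, or Paperclip is down)")
```

This only proves that the TCP port answers; it does not verify that the Paperclip API accepts the lane's requests.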
{
"files": [
"/Users/Ali/PKM/scratch/pkm-deepdive/vds-containerization-design-2026-06-30.html"
],
"caption": "VDS containerization design + impact analysis — the full connection map, what breaks, and the ideal phased approach. Paired .md alongside.",
"status": "normal"
}
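For the per-repo runner option in the design, each of the three repos needs its own registration token. The Python sketch below only assembles the `gh api` calls and does not run them. It assumes the VDS `gh` login has admin access to each repo, which this session did not verify. The endpoint is GitHub's REST route for creating a repo-level runner registration token.

```python
ORG = "ainative-academy"
REPOS = ["aina-platform", "aina-data-engine-room", "aina-academy"]


def registration_token_argv(org, repo):
    """argv that asks GitHub for a repo-level runner registration token."""
    endpoint = f"repos/{org}/{repo}/actions/runners/registration-token"
    # --jq .token prints only the token field of the JSON response.
    return ["gh", "api", "--method", "POST", endpoint, "--jq", ".token"]


commands = [registration_token_argv(ORG, repo) for repo in REPOS]
for argv in commands:
    print(" ".join(argv))
```

Each token is short-lived and would be passed to the runner's own configuration step for that repo; that step is outside this sketch.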