Question
How should 361 orchestrate AI agents on top of the locked GitHub + Render + Neon stack and Neon + S3 data tier? Specifically: (a) what is Multica and is it worth using; (b) how does Claude Managed Agents fit vs the Claude Agent SDK; (c) what does a concrete end-to-end “day in the life of a PR” look like with agents in the loop; (d) what’s the cost envelope?
TL;DR
For Phase 1, run Claude Code locally on each dev’s laptop + anthropics/claude-code-action@v1 in GitHub Actions for PR review and @claude task delegation, backed by Claude Agent SDK for any webhook-driven agents you build. Pair with Neon branch-per-PR (already recommended in the DB analysis) and Render service previews (per Ibrahim’s DevOps note).
Skip Multica and Claude Managed Agents for Phase 1. Multica is real and impressive (multica-ai/multica, 16.7k stars, Apache-style OSS, self-hostable) but it’s a “kanban for agents” layer 361 doesn’t need at 3–5 devs. Managed Agents is Anthropic’s hosted harness — the $0.08/session-hour meter doesn’t pay off until you have genuine long-running (30 min+) agent workloads, and oxFlow’s current workloads are sub-10-minute PR reviews.
Cost envelope for the recommended stack: ~USD 200–500 / month for 3–5 devs doing ~100 PRs/month.
Approach
Delegated an identification + landscape survey to a research agent with web access. Explicitly asked it to find Multica the open-source project (not a marketing proxy name), then compare against Anthropic’s Managed Agents, the Claude Agent SDK, Cursor Background Agents, Devin, Replit Agent, and Factory.ai. Cross-referenced against official vendor docs and pricing pages.
Findings
Multica — what it actually is
Identification: confident match. GitHub search for multica surfaces multica-ai/multica as the clear winner — 16,737 stars, 2,074 forks, TypeScript, homepage multica.ai, description “The open-source managed agents platform. Turn coding agents into real teammates — assign tasks, track progress, compound skills.” No other “Multica” project with plausible overlap exists (second hit is sky-ecosystem/multicall, unrelated Solidity contract aggregator).
What it does. Project-management layer for AI coding agents. Humans and agents share an issue board; you assign tickets to an agent the same way you’d assign them to a teammate; the agent claims the task, executes it on a local or cloud runtime, and posts comments / progress via a real-time WebSocket feed. Solutions codify into skills any agent in the workspace can execute later — the pitch is “day 1 you teach an agent to deploy; day 30 every agent deploys, tests, and reviews.”
Architecture:
- Frontend: Next.js 16 (App Router).
- Backend: Go with Chi, sqlc, WebSocket.
- DB: Postgres 17 + pgvector (skill / memory retrieval).
- Runtime: local daemon that auto-detects installed agent CLIs (Claude Code, Codex, OpenClaw, OpenCode, Hermes, Gemini, Pi, Cursor Agent) and brokers tasks between the central backend and those CLIs. Important — Multica is orchestration glue, not an inference provider.
Workspace / team model. Multi-workspace with workspace-level isolation (own agents, issues, settings). Roles and permissions exist; specific invitation UX isn't documented. Self-hosting is first-class: `docker compose -f docker-compose.selfhost.yml up -d`. See `SELF_HOSTING.md`.
Skills & memory. Skills are persisted reusable execution recipes, stored in Postgres and likely pgvector-indexed. Framed as “compounding skills” rather than generic LLM memory — closer to Letta’s sleep-time-compute model than a raw conversation log.
Maturity. 3 months old, 16k stars, 250 open issues, 43 releases — very active but young. License is Apache-2.0 with modifications (check LICENSE in-repo for exact terms).
Claude Managed Agents
Source: platform.claude.com — managed-agents/overview.
What it is. Anthropic-hosted agent harness. You don’t run the loop, container, or tool executor — Anthropic does. You POST to /v1/agents, /v1/environments, /v1/sessions. Beta header required: managed-agents-2026-04-01. GA not announced as of 2026-04.
Four core objects:
- Agent — model + system prompt + tools + MCP servers + skills. Reusable by ID.
- Environment — cloud container template: packages (Python / Node / Go), network rules, mounted files.
- Session — a running agent instance. Streams events over SSE. Server-persisted history.
- Event — user turn, tool result, status update.
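The four objects compose into a simple create-then-start flow. A hedged TypeScript sketch, assuming `fetch` access to the endpoint paths and beta header named above; everything inside the JSON bodies (field names like `system_prompt` and `agent_id`) is an illustrative assumption, not the documented schema:

```typescript
// Hedged sketch of the Managed Agents beta flow. Endpoint paths and the beta
// header come from the overview doc; all JSON field names are assumptions.
const BASE = "https://api.anthropic.com";
const HEADERS = {
  "content-type": "application/json",
  "x-api-key": process.env.ANTHROPIC_API_KEY ?? "",
  "anthropic-beta": "managed-agents-2026-04-01", // required beta header
};

// Create a reusable Agent (model + prompt + tools), then start a Session.
async function createAgentAndSession() {
  const agent = await fetch(`${BASE}/v1/agents`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({
      model: "claude-sonnet-4-6",            // assumed field names below
      system_prompt: "You review oxFlow PRs.",
      tools: ["bash", "read", "edit"],
    }),
  }).then((r) => r.json());

  const session = await fetch(`${BASE}/v1/sessions`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ agent_id: agent.id }),
  }).then((r) => r.json());

  return session; // session events then stream back over SSE
}
```

The point of the sketch is the shape, not the fields: you never host the loop — you create objects and watch events.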
Repo access. No native “connect GitHub org” button. BYO integration: mount a deploy key / PAT into the environment via mounted files, or have the agent git clone over HTTPS with a token in env vars. MCP servers (GitHub MCP, filesystem MCP) are the recommended path.
Tools. Bash, Read/Write/Edit/Glob/Grep, WebSearch, WebFetch, arbitrary MCP. Research-preview gated features: outcomes, multiagent, memory. Request via claude.com/form/claude-managed-agents.
Pricing. Standard token rates (Opus 4.7 ~$5/$25 per MTok in/out; Sonnet 4.6 ~$3/$15) plus $0.08/session-hour active runtime (idle doesn’t count). Web search: $10/1000 calls. 50% batch discount does NOT apply. See WaveSpeed and Verdent.
Rate limits. 60 create/min, 600 read/min per org.
Claude Agent SDK
Source: code.claude.com — agent-sdk/overview. Renamed from “Claude Code SDK.” Packages: @anthropic-ai/claude-agent-sdk (TS, bundles a native Claude Code binary as optional dep) and claude-agent-sdk (Python). Opus 4.7 support via v0.2.111+.
What you get (parity with Claude Code):
- 10+ built-in tools.
- Hooks: PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd, UserPromptSubmit.
- Subagents via `AgentDefinition`.
- MCP support.
- Persistent, resumable, forkable sessions.
- `allowedTools` / `permissionMode` controls.
- Filesystem config parity: `CLAUDE.md` + `.claude/skills/*/SKILL.md` + `.claude/commands/*.md` + `.claude/settings.json`.
- Auth via Anthropic API key, Bedrock (`CLAUDE_CODE_USE_BEDROCK=1`), Vertex, or Azure Foundry.
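The `allowedTools` / `permissionMode` options are the main control surface. A hedged sketch of the call shape, using option names the SDK documents; the type below is a simplification for illustration, not the SDK's own type, and the tool values are examples:

```typescript
// Hedged sketch: shape of an Agent SDK call. Option names (allowedTools,
// permissionMode) follow the SDK docs; this local type is a simplification.
type ReviewOptions = {
  allowedTools: string[];
  permissionMode: "default" | "acceptEdits" | "bypassPermissions" | "plan";
};

// Read-only profile for automated PR review: the agent can inspect the repo,
// but writes go through a normal PR instead of direct edits.
const reviewOptions: ReviewOptions = {
  allowedTools: ["Read", "Grep", "Glob"],
  permissionMode: "default",
};

// With the package installed, this would drive the loop:
//   import { query } from "@anthropic-ai/claude-agent-sdk";
//   for await (const msg of query({ prompt: "Review this diff", options: reviewOptions })) {
//     // stream messages, collect the final result
//   }
```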
SDK vs Managed Agents:
| Criterion | Agent SDK | Managed Agents |
|---|---|---|
| Who hosts runtime | You | Anthropic |
| Loop control | Full | Abstracted |
| Per-hour runtime fee | No (just tokens + your compute) | Yes ($0.08/h) |
| Best for | CI runners, webhooks, laptops | Long-running async jobs (hours) |
| Data residency | Your choice (Bedrock Sydney available) | Anthropic’s regions |
| Auth isolation | You control secrets | API key + container |
For a 3–5 dev NZ team running Claude as part of GitHub Actions (ephemeral, minutes-long jobs), the SDK wins. You already have GitHub-hosted runners, you don't want a $0.08/h meter ticking for 3-minute work, and the SDK is what `anthropics/claude-code-action` already wraps. Managed Agents pays off when genuine long-running (30+ min) workloads appear — not yet for oxFlow.
End-to-end orchestration pattern for oxFlow
A day in the life of a PR on the locked GitHub + Render + Neon stack, with agents slotted in:
- Local authoring (dev laptop). Dev runs Claude Code CLI with project `CLAUDE.md` (oxFlow conventions: Postgres schemas, NZ construction vocabulary, Oxcon business rules). Skills in `.claude/skills/`. Iterate, commit to a feature branch. Session captured per the memory note's SessionEnd hook.
- Push → PR opens. GitHub Actions fires `pull_request.opened`:
  - Neon Create-Branch Action (neondatabase/preview-branches) creates a copy-on-write branch of the staging DB; branch URL exposed as `NEON_DATABASE_URL`.
  - Render service preview spins up a preview deploy wired to that Neon branch (render.com — preview environments).
  - Test job runs migrations + unit + integration tests against the Neon branch.
  - Agent review job runs `anthropics/claude-code-action@v1` — reads the diff, posts review comments, flags issues. Built on the Agent SDK. Typical cost per PR: $5–25 on Sonnet 4.6.
  - Optionally: security review via `anthropics/claude-code-security-review`.
- Team review. Humans open the Render preview against the masked Neon branch. Comment. Can `@claude` a targeted fix — same action re-fires in write mode on the branch.
- Merge to `main`. `pull_request.closed` cleanup tears down the Neon branch and Render preview. Render auto-deploys `main` to staging (against Neon's staging branch with masked data).
- Promote to production. Manual Render gate (per Ibrahim's DevOps note). Neon production branch is untouched by PR work.
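The PR-open steps can be sketched as one workflow file. A hedged sketch only — the action versions and input/output names (`project_id`, `db_url`, `anthropic_api_key`) are assumptions to verify against each action's README before use:

```yaml
# Hedged sketch of the pull_request.opened flow described above.
name: pr-preview
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  test-against-neon-branch:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Create copy-on-write Neon branch for this PR
        id: neon
        uses: neondatabase/create-branch-action@v5   # version assumed
        with:
          project_id: ${{ secrets.NEON_PROJECT_ID }}
          branch_name: preview/pr-${{ github.event.number }}
          api_key: ${{ secrets.NEON_API_KEY }}
      - name: Migrations + unit + integration tests
        run: npm ci && npm test
        env:
          NEON_DATABASE_URL: ${{ steps.neon.outputs.db_url }}  # output name assumed
  agent-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Claude PR review
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
```

The matching `pull_request.closed` workflow deletes the Neon branch; Render previews are created and torn down by Render itself.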
Where agents plug in:
- Inline with dev: Claude Code CLI locally.
- In CI: `claude-code-action` for PR review + `@claude` delegation — runs Agent SDK under the hood, scoped to the PR branch via `GITHUB_TOKEN`.
- Webhook-triggered: small Render Background Worker using Agent SDK + Express / FastAPI for Linear / Slack / Sentry webhooks (see next section).
- Multica: only if the team specifically wants a kanban board where agents are first-class members. Not required.
Webhook-triggered agents
Real patterns to copy:
- Sentry Seer → Linear. Sentry's Seer posts root-cause analyses into Linear issues via webhook; Linear's Agents API routes `@mention` events back. HMAC-signed. Linear Agents Integration, Sentry docs.
- Linear for Agents — first-party Agents API. Register a webhook URL, get mention events, reply via the Linear API.
- GitHub `@claude` mentions. `claude-code-action` listens on `issue_comment`, `pull_request_review_comment`, `issues.assigned` → triggers agent runs on-demand. Marketplace listing.
- GitHub Agent HQ mission control. Official pattern for routing tasks across Copilot coding agents, custom agents, and sub-agents.
For oxFlow a minimal webhook agent service on Render looks like:
- Render Web Service (Node / Python), Express / FastAPI.
- Endpoint verifies HMAC signature (Linear / Slack / GitHub).
- Spawns an Agent SDK session (`query({ prompt, options: { allowedTools, mcpServers } })`).
- Streams progress back to the source (Linear comment, Slack thread, GitHub PR comment).
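The HMAC step is the piece most worth getting right. A minimal sketch in TypeScript (Node), assuming a hex-encoded HMAC-SHA256 signature header in Linear's style; GitHub prefixes its header value with `sha256=`, so adapt per provider:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a webhook's HMAC-SHA256 signature against the raw request body
// before doing anything else. Assumes a hex-encoded signature header.
function verifySignature(secret: string, rawBody: string, signature: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so guard first
  return a.length === b.length && timingSafeEqual(a, b);
}

// Only after verification would the handler spawn the agent, e.g.
//   query({ prompt, options: { allowedTools, mcpServers } })
```

Verify against the raw body bytes, not the parsed-then-reserialized JSON — reserialization can reorder keys and break the signature.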
Cost signals
Claude Code in GitHub Actions (token spend only):
- Small PR (<200 LOC): $8–12.
- Large PR (~2000 LOC): $30–40.
- Typical monthly: $5–25 for ~10–15 PRs/week per KissAPI guide and earezki.com’s setup.
- Anthropic’s managed “Code Review” product is deeper/more expensive at ~$15–25 per review — the open-source Action is cheaper and usually sufficient.
Claude Managed Agents. $0.08/session-hour + standard token rates. Example envelope: one agent running 4 hours of active session per weekday = 80 hours/month = $6.40 runtime + tokens (dominant). For 3 agents × 80h = ~$20 runtime + tokens.
oxFlow 3–5 dev monthly envelope:
- Local Claude Code (per-dev, Pro or API): 3 × $100 ≈ $300 on Pro; ~$150–300 API-metered.
- CI PR review (100 PRs/month × $1.50 avg on Sonnet): **$150**.
- Webhook agent service on Render ($7–25 compute) + token spend: ~$50.
- Total: ~USD 200–500/month. Well under a junior dev’s day-rate.
Comparison matrix
| Tool | OSS / self-host | Repo integration | Memory / skills | Pricing | Fit for 3–5 dev NZ team |
|---|---|---|---|---|---|
| Multica | Yes / Docker Compose | Via agent CLIs (Claude Code etc.) | pgvector-backed skills, workspace-scoped | Free (self-host) | Overkill; adds a PM layer most small teams don’t need |
| Claude Managed Agents | No / Anthropic-hosted | BYO via MCP or git clone in env | Session state + research-preview memory API | Tokens + $0.08/session-h | Useful for long async jobs; not your default |
| Claude Agent SDK + GitHub Actions | SDK free, you host runtime | Native via GITHUB_TOKEN, claude-code-action | CLAUDE.md + .claude/skills/ | Tokens only (~$150–400/mo) | Best fit. Matches the GitHub-centric stack |
| Cursor Background Agents | No / hosted | GitHub only (clone + push) | Chat history, workspace rules | Pro $20/mo + credit pool | Good for individual devs; less CI-centric |
| Devin (Cognition) | No / hosted | GitHub | Long-horizon plan memory | $20/mo core + $2.25/ACU | Expensive if interactive; shines on “delegate and forget” |
| Replit Agent | No / hosted | Replit-centric | Project memory | $25/mo Core | Weak fit — oxFlow doesn’t deploy to Replit |
| Factory.ai (Droids) | No / hosted | Linear / Jira / GitHub; token-based | Droid workflows | From $20/mo + tokens | Strong for Linear-driven teams; adds lock-in |
Recommendation for 361
Adopt this orchestration stack for Phase 1:
- Dev laptops — Claude Code CLI with project `CLAUDE.md` per repo. One paid Pro seat per active dev, or API-metered if bursty.
- CI — `anthropics/claude-code-action@v1` in GitHub Actions in two modes:
  - Automatic PR review on `pull_request.opened` / `synchronize` — Sonnet 4.6 for speed/cost.
  - On-mention task mode on `issue_comment` containing `@claude` — writes to the PR branch. Scoped to PR branches only. No production access.
- Data tier — Neon branch-per-PR via the Neon GitHub Action. Branch on PR open, drop on PR close. Staging branch is the source for preview branches so agents see realistic (masked) shapes. (See DB analysis.)
- App tier — Render service previews wired to each PR's Neon branch via `renderPreviewsEnabled: true` + env-var overrides. Staging gate is Render's "deploy to staging on merge to main, manual promote to prod." (See DevOps note.)
- Webhook agents — small Render Web Service on Claude Agent SDK (TypeScript). Endpoints for Linear, Slack (`@oxflow-bot`), Sentry. HMAC-verified. MCP servers for GitHub, Neon, Postgres.
- Skip Multica. Revisit in ~12 months if the agent count or task volume grows such that a dedicated board materially helps.
- Skip Managed Agents. Revisit when a use case like "overnight build-the-whole-feature" that genuinely runs for hours appears.
Governance guardrails before go-live:
- `CLAUDE.md` reviewed and version-controlled per repo (already — commit trail).
- `.claude/settings.json` pins `allowedTools`; `permissionMode: auto` only for read tools. Writes require PR.
- GitHub Actions secrets for `ANTHROPIC_API_KEY` scoped to environments. No prod DB access from PR workflows.
- Neon branch-cleanup workflow tested — orphaned branches cost real money.
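The settings pinning might look like this in `.claude/settings.json` — a hedged sketch using the key names this note references; the shipped schema may group these differently (e.g. under a `permissions` block), so verify against the current Claude Code settings reference before committing:

```json
{
  "allowedTools": ["Read", "Grep", "Glob"],
  "permissionMode": "auto"
}
```

Version-control this file alongside `CLAUDE.md` so the guardrail itself has a commit trail.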
Cost envelope: USD 200–500/month for the full agent stack (licenses + tokens + Render compute), 3–5 devs, ~100 PRs/month. Well under one additional hire and directly accelerates the June 27 2026 Phase 1 target.
Alternatives considered
- Adopt Multica. Genuinely good project and we may want it later, but it adds a PM / kanban-for-agents abstraction 3–5 people don't benefit from. Every ticket we run through Multica is a ticket we could have run in GitHub Issues + `@claude`. Revisit if we grow past ~8 devs or start running many simultaneous long-horizon agent tasks.
- Adopt Claude Managed Agents now. Pays off for genuinely long-running jobs (hours); our current workload is minutes. The $0.08/h meter plus no batch discount is material cost overhead. Revisit for overnight workflows.
- Cursor Background Agents. Individual-dev-friendly but less CI-centric than `claude-code-action`. If a dev is already on Cursor, it's fine; not worth switching the team onto Cursor for this.
- Devin. Expensive for interactive work. The "delegate and forget" pitch is compelling but not where we are in Phase 1.
- Factory.ai (Droids). Interesting for Linear-heavy teams; we’re GitHub Issues-heavy. Adds lock-in.
Sources
- `multica-ai/multica`
- `multica-ai` org
- Multica self-hosting guide
- Medevel — Multica writeup
- Arun Baby — Multica as teammates
- Flowtivity — Multica overview
- Claude Managed Agents overview (docs)
- Claude Agent SDK overview (docs)
- `anthropics/claude-agent-sdk-typescript`
- `anthropics/claude-agent-sdk-python`
- `anthropics/claude-code-action`
- Claude Code Action on GitHub Marketplace
- `anthropics/claude-code-security-review`
- Claude Managed Agents pricing — WaveSpeed
- Verdent — Managed Agents pricing guide
- Anthropic engineering — Scaling Managed Agents
- VentureBeat — Managed Agents launch
- Momentic — Managed Agents vs Agent SDK
- BSWEN — Managed Agents vs Agent SDK
- Neon — A database for every preview environment
- Neon CI preview workflows
- Render Preview Environments docs
- Render Service Previews docs
- KissAPI — Claude Code GitHub Actions setup 2026
- Dev|Journal — automating the dev workflow
- Linear Agents Integration
- Linear Sentry Agent
- Sentry Linear Agent docs
- GitHub Blog — Agent HQ mission control
- Cursor Background Agents docs
- Factory.ai pricing
- Factory.ai plans & models
- Cognition Devin vs Cursor 2026
- Claude Code Review blog
See also
- Shared project memory — the memory layer that feeds `CLAUDE.md` + `.claude/skills/` context into the agents orchestrated here.
- Agentic coding and PR review — the sibling note on BMAD, Superpowers, and which PR review agent sits on top of this orchestration.
- Database analysis — Neon + S3 data tier. The “Neon branch-per-PR” pattern above references this recommendation directly.
- DevOps and infrastructure — Ibrahim’s GitHub + Render + Neon baseline that this orchestration plugs into.
- `BRANDING.md` — visual language used in the accompanying HTML.
Linked from
- Daily log 2026-04-17 — originating tasks (Claude Managed Agents + Multica, CI/CD workflows).