Question

How should 361 orchestrate AI agents on top of the locked GitHub + Render + Neon stack and Neon + S3 data tier? Specifically: (a) what is Multica and is it worth using; (b) how does Claude Managed Agents fit vs the Claude Agent SDK; (c) what does a concrete end-to-end “day in the life of a PR” look like with agents in the loop; (d) what’s the cost envelope?

TL;DR

For Phase 1, run Claude Code locally on each dev’s laptop + anthropics/claude-code-action@v1 in GitHub Actions for PR review and @claude task delegation, backed by Claude Agent SDK for any webhook-driven agents you build. Pair with Neon branch-per-PR (already recommended in the DB analysis) and Render service previews (per Ibrahim’s DevOps note).

Skip Multica and Claude Managed Agents for Phase 1. Multica is real and impressive (multica-ai/multica, 16.7k stars, Apache-style OSS, self-hostable) but it’s a “kanban for agents” layer 361 doesn’t need at 3–5 devs. Managed Agents is Anthropic’s hosted harness — the $0.08/session-hour meter doesn’t pay off until you have genuine long-running (30 min+) agent workloads, and oxFlow’s current workloads are sub-10-minute PR reviews.

Cost envelope for the recommended stack: ~USD 200–500 / month for 3–5 devs doing ~100 PRs/month.

Approach

Delegated an identification + landscape survey to a research agent with web access. Explicitly asked it to find Multica the open-source project (not a marketing proxy name), then compare against Anthropic’s Managed Agents, the Claude Agent SDK, Cursor Background Agents, Devin, Replit Agent, and Factory.ai. Cross-referenced against official vendor docs and pricing pages.

Findings

Multica — what it actually is

Identification: confident match. GitHub search for multica surfaces multica-ai/multica as the clear winner — 16,737 stars, 2,074 forks, TypeScript, homepage multica.ai, description “The open-source managed agents platform. Turn coding agents into real teammates — assign tasks, track progress, compound skills.” No other “Multica” project with plausible overlap exists (second hit is sky-ecosystem/multicall, unrelated Solidity contract aggregator).

What it does. Project-management layer for AI coding agents. Humans and agents share an issue board; you assign tickets to an agent the same way you’d assign them to a teammate; the agent claims the task, executes it on a local or cloud runtime, and posts comments / progress via a real-time WebSocket feed. Solutions codify into skills any agent in the workspace can execute later — the pitch is “day 1 you teach an agent to deploy; day 30 every agent deploys, tests, and reviews.”

Architecture:

  • Frontend: Next.js 16 (App Router).
  • Backend: Go with Chi, sqlc, WebSocket.
  • DB: Postgres 17 + pgvector (skill / memory retrieval).
  • Runtime: local daemon that auto-detects installed agent CLIs (Claude Code, Codex, OpenClaw, OpenCode, Hermes, Gemini, Pi, Cursor Agent) and brokers tasks between the central backend and those CLIs. Important — Multica is orchestration glue, not an inference provider.

Workspace / team model. Multi-workspace with workspace-level isolation (own agents, issues, settings). Roles and permissions exist; specific invitation UX isn’t documented. Self-hosting is first-class: docker compose -f docker-compose.selfhost.yml up -d. See SELF_HOSTING.md.

Skills & memory. Skills are persisted reusable execution recipes, stored in Postgres and likely pgvector-indexed. Framed as “compounding skills” rather than generic LLM memory — closer to Letta’s sleep-time-compute model than a raw conversation log.

Maturity. 3 months old, 16k stars, 250 open issues, 43 releases — very active but young. License is Apache-2.0 with modifications (check LICENSE in-repo for exact terms).

Claude Managed Agents

Source: platform.claude.com — managed-agents/overview.

What it is. Anthropic-hosted agent harness. You don’t run the loop, container, or tool executor — Anthropic does. You POST to /v1/agents, /v1/environments, /v1/sessions. Beta header required: managed-agents-2026-04-01. GA not announced as of 2026-04.

Four core objects:

  1. Agent — model + system prompt + tools + MCP servers + skills. Reusable by ID.
  2. Environment — cloud container template: packages (Python / Node / Go), network rules, mounted files.
  3. Session — a running agent instance. Streams events over SSE. Server-persisted history.
  4. Event — user turn, tool result, status update.
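The four objects compose into a plain REST flow. A minimal sketch of creating an agent against the beta surface — the endpoint path and beta header come from the docs above; the payload field names and model string are assumptions, and the actual network call needs a real key plus beta access:

```typescript
// Sketch only: /v1/agents and the beta header are documented; the
// AgentSpec field names below are assumed, not confirmed.
const BASE = "https://api.anthropic.com";
const BETA_HEADER = "managed-agents-2026-04-01";

interface AgentSpec {
  model: string;         // e.g. a Sonnet-class model for cheap review work
  system_prompt: string;
  tools: string[];
}

function buildCreateAgentRequest(apiKey: string, spec: AgentSpec) {
  return {
    url: `${BASE}/v1/agents`,
    method: "POST" as const,
    headers: {
      "x-api-key": apiKey,
      "anthropic-beta": BETA_HEADER, // required while in beta
      "content-type": "application/json",
    },
    body: JSON.stringify(spec),
  };
}

// Usage (fetch call elided — requires a real key and beta access):
const req = buildCreateAgentRequest("sk-ant-...", {
  model: "claude-sonnet-4-6",
  system_prompt: "You are oxFlow's PR triage agent.",
  tools: ["bash", "web_search"],
});
// fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

The same header/payload pattern applies to `/v1/environments` and `/v1/sessions`.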

Repo access. No native “connect GitHub org” button. BYO integration: mount a deploy key / PAT into the environment via mounted files, or have the agent git clone over HTTPS with a token in env vars. MCP servers (GitHub MCP, filesystem MCP) are the recommended path.

Tools. Bash, Read/Write/Edit/Glob/Grep, WebSearch, WebFetch, arbitrary MCP. Research-preview gated features: outcomes, multiagent, memory. Request via claude.com/form/claude-managed-agents.

Pricing. Standard token rates (Opus 4.7 ~$5/$25 per MTok in/out; Sonnet 4.6 ~$3/$15) plus $0.08/session-hour active runtime (idle doesn’t count). Web search: $10/1000 calls. 50% batch discount does NOT apply. See WaveSpeed and Verdent.

Rate limits. 60 create/min, 600 read/min per org.

Claude Agent SDK

Source: code.claude.com — agent-sdk/overview. Renamed from “Claude Code SDK.” Packages: @anthropic-ai/claude-agent-sdk (TS, bundles a native Claude Code binary as optional dep) and claude-agent-sdk (Python). Opus 4.7 support via v0.2.111+.

What you get (parity with Claude Code): 10+ built-in tools, hooks (PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd, UserPromptSubmit), subagents via AgentDefinition, MCP, persistent resumable forkable sessions, allowedTools / permissionMode, filesystem config parity with CLAUDE.md + .claude/skills/*/SKILL.md + .claude/commands/*.md + .claude/settings.json. Auth via Anthropic API key, Bedrock (CLAUDE_CODE_USE_BEDROCK=1), Vertex, or Azure Foundry.
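A minimal read-only invocation sketch. The option names mirror the SDK surface listed above; the specific tool-permission strings are assumptions, and the `query()` call itself is shown commented out because it needs the package installed plus an API key:

```typescript
// Options for a read-only PR-review run: the agent can inspect the
// branch but not write. Tool-permission strings are assumptions.
type AgentOptions = {
  allowedTools: string[];
  permissionMode: "default" | "acceptEdits" | "bypassPermissions" | "plan";
  mcpServers?: Record<string, unknown>;
};

function reviewOptions(): AgentOptions {
  return {
    // read tools only, plus a scoped Bash permission for diffing
    allowedTools: ["Read", "Grep", "Glob", "Bash(git diff:*)"],
    permissionMode: "default",
  };
}

// With @anthropic-ai/claude-agent-sdk installed this would run as:
//
// import { query } from "@anthropic-ai/claude-agent-sdk";
// for await (const message of query({
//   prompt: "Review the diff on this branch for schema changes",
//   options: reviewOptions(),
// })) {
//   if (message.type === "result") console.log(message.result);
// }
```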

SDK vs Managed Agents:

| Criterion | Agent SDK | Managed Agents |
| --- | --- | --- |
| Who hosts runtime | You | Anthropic |
| Loop control | Full | Abstracted |
| Per-hour runtime fee | No (just tokens + your compute) | Yes ($0.08/h) |
| Best for | CI runners, webhooks, laptops | Long-running async jobs (hours) |
| Data residency | Your choice (Bedrock Sydney available) | Anthropic’s regions |
| Auth isolation | You control secrets | API key + container |

For a 3–5 dev NZ team running Claude inside GitHub Actions (ephemeral, minutes-long jobs), the SDK wins. You already have GitHub-hosted runners, you don’t want a $0.08/h meter ticking for 3-minute work, and the SDK is what anthropics/claude-code-action already wraps. Managed Agents pays off when genuine long-running (30+ min) workloads appear — not yet for oxFlow.

End-to-end orchestration pattern for oxFlow

A day in the life of a PR on the locked GitHub + Render + Neon stack, with agents slotted in:

  1. Local authoring (dev laptop). Dev runs Claude Code CLI with project CLAUDE.md (oxFlow conventions: Postgres schemas, NZ construction vocabulary, Oxcon business rules). Skills in .claude/skills/. Iterate, commit to a feature branch. Session captured per the memory note’s SessionEnd hook.
  2. Push → PR opens. GitHub Actions fires on pull_request.opened: the Neon GitHub Action creates the PR’s database branch, Render spins up a service preview wired to it, and claude-code-action posts an automatic review scoped to the diff.
  3. Team review. Humans open the Render preview against the masked Neon branch. Comment. Can @claude a targeted fix — same action re-fires in write mode on the branch.
  4. Merge to main. pull_request.closed cleanup tears down the Neon branch and Render preview. Render auto-deploys main to staging (against Neon’s staging branch with masked data).
  5. Promote to production. Manual Render gate (per Ibrahim’s DevOps note). Neon production branch is untouched by PR work.

Where agents plug in:

  • Inline with dev: Claude Code CLI locally.
  • In CI: claude-code-action for PR review + @claude delegation — runs Agent SDK under the hood, scoped to the PR branch via GITHUB_TOKEN.
  • Webhook-triggered: small Render Background Worker using Agent SDK + Express / FastAPI for Linear / Slack / Sentry webhooks (see next section).
  • Multica: only if the team specifically wants a kanban board where agents are first-class members. Not required.

Webhook-triggered agents

Real patterns to copy:

  • Sentry Seer → Linear. Sentry’s Seer posts root-cause analyses into Linear issues via webhook; Linear’s Agents API routes @mention events back. HMAC-signed. Linear Agents Integration, Sentry docs.
  • Linear for Agents — first-party Agents API. Register a webhook URL, get mention events, reply via the Linear API.
  • GitHub @claude mentions. claude-code-action listens on issue_comment, pull_request_review_comment, issues.assigned → triggers agent runs on-demand. Marketplace listing.
  • GitHub Agent HQ mission control. Official pattern for routing tasks across Copilot coding agents, custom agents, and sub-agents.

For oxFlow a minimal webhook agent service on Render looks like:

  1. Render Web Service (Node / Python), Express / FastAPI.
  2. Endpoint verifies HMAC signature (Linear / Slack / GitHub).
  3. Spawns an Agent SDK session (query({ prompt, options: { allowedTools, mcpServers } })).
  4. Streams progress back to the source (Linear comment, Slack thread, GitHub PR comment).
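The four steps above can be sketched with Node’s built-ins (the note suggests Express / FastAPI; node:http keeps this dependency-free). The x-hub-signature-256 header is GitHub’s HMAC scheme; Linear and Slack use the same HMAC-SHA256 idea under different header names. `spawnAgentSession` is a placeholder for the Agent SDK call in step 3:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { createServer } from "node:http";

// Step 2: verify a GitHub-style signature ("sha256=<hex>" over the raw body).
// timingSafeEqual avoids leaking the comparison via timing; the length guard
// is required because timingSafeEqual throws on unequal-length buffers.
function verifySignature(secret: string, rawBody: string, header: string): boolean {
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(header);
  return a.length === b.length && timingSafeEqual(a, b);
}

// Steps 1, 3, 4: minimal handler. spawnAgentSession stands in for the
// Agent SDK query() call; progress streaming back to the source happens
// asynchronously after the fast 202 ack.
function startWebhookServer(secret: string, spawnAgentSession: (payload: unknown) => void) {
  return createServer((req, res) => {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      const sig = req.headers["x-hub-signature-256"];
      if (typeof sig !== "string" || !verifySignature(secret, body, sig)) {
        res.writeHead(401).end(); // reject unsigned / tampered payloads
        return;
      }
      spawnAgentSession(JSON.parse(body)); // step 3: hand off to the agent
      res.writeHead(202).end();            // step 4 continues out-of-band
    });
  });
}
```

Acking with 202 before the agent finishes matters: webhook senders time out in seconds, while an agent session runs for minutes.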

Cost signals

Claude Code in GitHub Actions (token spend only):

  • Small PR (<200 LOC): $8–12.
  • Large PR (~2000 LOC): $30–40.
  • Typical monthly: $5–25 for ~10–15 PRs/week per KissAPI guide and earezki.com’s setup.
  • Anthropic’s managed “Code Review” product is deeper/more expensive at ~$15–25 per review — the open-source Action is cheaper and usually sufficient.

Claude Managed Agents. $0.08/session-hour + standard token rates. Example envelope: one agent running 4 hours of active session per weekday = 80 hours/month = $6.40 runtime + tokens (dominant). For 3 agents × 80h = ~$20 runtime + tokens.
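The runtime-fee arithmetic above as a sanity check (rate from the pricing section; token spend is excluded since it dominates and scales separately):

```typescript
// $0.08 per active session-hour; idle time is not metered.
const RUNTIME_RATE_USD_PER_HOUR = 0.08;

function monthlyRuntimeFeeUSD(
  agents: number,
  activeHoursPerWeekday: number,
  weekdays = 20, // ~20 working days per month
): number {
  return agents * activeHoursPerWeekday * weekdays * RUNTIME_RATE_USD_PER_HOUR;
}

console.log(monthlyRuntimeFeeUSD(1, 4).toFixed(2)); // 80 h/month → "6.40"
console.log(monthlyRuntimeFeeUSD(3, 4).toFixed(2)); // 3 agents → "19.20", i.e. ~$20
```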

oxFlow 3–5 dev monthly envelope:

  • Local Claude Code (per-dev, Pro or API): 3 × $100 ≈ $300 on Pro, or ~$150–300 if API-metered.
  • CI PR review (100 PRs/month × $1.50 avg on Sonnet): $150.
  • Webhook agent service on Render ($7–25 compute) + token spend: ~$50.
  • Total: ~USD 200–500/month. Well under a junior dev’s day-rate.

Comparison matrix

| Tool | OSS / self-host | Repo integration | Memory / skills | Pricing | Fit for 3–5 dev NZ team |
| --- | --- | --- | --- | --- | --- |
| Multica | Yes / Docker Compose | Via agent CLIs (Claude Code etc.) | pgvector-backed skills, workspace-scoped | Free (self-host) | Overkill; adds a PM layer most small teams don’t need |
| Claude Managed Agents | No / Anthropic-hosted | BYO via MCP or git clone in env | Session state + research-preview memory API | Tokens + $0.08/session-h | Useful for long async jobs; not your default |
| Claude Agent SDK + GitHub Actions | SDK free, you host runtime | Native via GITHUB_TOKEN, claude-code-action | CLAUDE.md + .claude/skills/ | Tokens only (~$150–400/mo) | Best fit. Matches the GitHub-centric stack |
| Cursor Background Agents | No / hosted | GitHub only (clone + push) | Chat history, workspace rules | Pro $20/mo + credit pool | Good for individual devs; less CI-centric |
| Devin (Cognition) | No / hosted | GitHub | Long-horizon plan memory | $20/mo core + $2.25/ACU | Expensive if interactive; shines on “delegate and forget” |
| Replit Agent | No / hosted | Replit-centric | Project memory | $25/mo Core | Weak fit — oxFlow doesn’t deploy to Replit |
| Factory.ai (Droids) | No / hosted | Linear / Jira / GitHub; token-based | Droid workflows | From $20/mo + tokens | Strong for Linear-driven teams; adds lock-in |

Recommendation for 361

Adopt this orchestration stack for Phase 1:

  1. Dev laptops — Claude Code CLI with project CLAUDE.md per repo. One paid Pro seat per active dev, or API-metered if bursty.
  2. CI — anthropics/claude-code-action@v1 in GitHub Actions in two modes:
    • Automatic PR review on pull_request.opened / synchronize — Sonnet 4.6 for speed/cost.
    • On-mention task mode on issue_comment containing @claude — writes to the PR branch. Scoped to PR branches only. No production access.
  3. Data tier — Neon branch-per-PR via the Neon GitHub Action. Branch on PR open, drop on PR close. Staging branch is the source for preview branches so agents see realistic (masked) shapes. (See DB analysis.)
  4. App tier — Render service previews wired to each PR’s Neon branch via renderPreviewsEnabled: true + env-var overrides. Staging gate is Render’s “deploy to staging on merge to main, manual promote to prod.” (See DevOps note.)
  5. Webhook agents — small Render Web Service on Claude Agent SDK (TypeScript). Endpoints for Linear, Slack (@oxflow-bot), Sentry. HMAC-verified. MCP servers for GitHub, Neon, Postgres.
  6. Skip Multica. Revisit in ~12 months if the agent count or task volume grows such that a dedicated board materially helps.
  7. Skip Managed Agents. Revisit when a use case like “overnight build-the-whole-feature” that genuinely runs for hours appears.

Governance guardrails before go-live:

  • CLAUDE.md reviewed and version-controlled per repo (already — commit trail).
  • .claude/settings.json pins allowedTools, permissionMode: auto only for read tools. Writes require PR.
  • GitHub Actions secrets for ANTHROPIC_API_KEY scoped to environments. No prod DB access from PR workflows.
  • Neon branch-cleanup workflow tested — orphaned branches cost real money.

Cost envelope: USD 200–500/month for the full agent stack (licenses + tokens + Render compute), 3–5 devs, ~100 PRs/month. Well under one additional hire and directly accelerates the June 27 2026 Phase 1 target.

Alternatives considered

  • Adopt Multica. Genuinely good project and we may want it later, but it adds a PM / kanban-for-agents abstraction 3–5 people don’t benefit from. Every ticket we run through Multica is a ticket we could have run in GitHub Issues + @claude. Revisit if we grow past ~8 devs or start running many simultaneous long-horizon agent tasks.
  • Adopt Claude Managed Agents now. Pays off for genuinely long-running jobs (hours); our current workload is minutes. The $0.08/h meter plus no batch discount is material cost overhead. Revisit for overnight workflows.
  • Cursor Background Agents. Individual-dev-friendly but less CI-centric than claude-code-action. If a dev is already on Cursor, it’s fine; not worth switching the team onto Cursor for this.
  • Devin. Expensive for interactive work. The “delegate and forget” pitch is compelling but not where we are in Phase 1.
  • Factory.ai (Droids). Interesting for Linear-heavy teams; we’re GitHub Issues-heavy. Adds lock-in.

See also

  • Shared project memory — the memory layer that feeds CLAUDE.md + .claude/skills/ context into the agents orchestrated here.
  • Agentic coding and PR review — the sibling note on BMAD, Superpowers, and which PR review agent sits on top of this orchestration.
  • Database analysis — Neon + S3 data tier. The “Neon branch-per-PR” pattern above references this recommendation directly.
  • DevOps and infrastructure — Ibrahim’s GitHub + Render + Neon baseline that this orchestration plugs into.
  • BRANDING.md — visual language used in the accompanying HTML.
