Note: pending team review.
Plain-English Summary
oxFlow needs a way for the dev team to write code, test it, show it to Oxcon for approval, and then put it live — without stepping on each other’s toes or exposing real client data during testing.
Here’s what we’re proposing:
- Three platforms, that’s it. GitHub (where the code lives and gets checked automatically), Render (runs the app), and Neon (stores the data). No complex AWS/Azure setup. The whole team manages everything from three dashboards.
- A simple code flow. A developer builds a feature on their own branch. When it’s ready, it goes to a “staging” version of the app that Oxcon can log into and review. Once they approve, it goes live. If something breaks in production, we can push an emergency fix directly.
- Real data for testing, but safe. When we deploy to staging, we take an instant snapshot of the live database and automatically swap out sensitive info (supplier names, dollar values, margins) with fake data. This means Oxcon reviews against realistic data without anyone seeing confidential pricing.
- Automatic quality checks. Every time a developer submits code, GitHub automatically runs tests, checks for bugs, and scans for security issues. If anything fails, the code can’t go live.
- Slack notifications. The #oxflow channel gets pinged when staging is updated, when something deploys to production, or when a test fails.
Estimated cost: ~NZ$205–370/mo at early stage. Scales with usage — no big upfront spend.
Question
How should the oxFlow team collaborate on code, deploy to staging and production, handle database snapshots for staging, and what hosting/infrastructure should we use — given enterprise requirements, integration points (Xero, M365, MS Project, Workbench), and a small team (3-5 devs)?
TL;DR
Recommended: GitHub Flow with a staging gate, hosted on Render, database on Neon (for copy-on-write branching), CI/CD via GitHub Actions, notifications to #oxflow Slack channel. Three platforms total. Estimated ~NZ$205–370/mo at early stage.
Tech stack: NestJS + React SPA (Vite + TanStack Router) + TypeScript + PostgreSQL (Neon) + Drizzle ORM. Chosen because TypeScript has first-class SDKs for the main integration points: Xero, MSAL (M365 SSO), and Claude AI. Microsoft publishes no official MSAL library for PHP.
The Kitchen Analogy
Throughout this doc we use a restaurant analogy to explain the architecture. If you only read one section, read this one.
| Restaurant | oxFlow | What it does |
|---|---|---|
| Dining room | React SPA (frontend) | What users see and interact with — screens, buttons, forms |
| Kitchen | NestJS API (backend) | Where the real work happens — calculates costs, enforces rules, processes data |
| Chefs | Services (business logic) | Follow recipes to prepare each order — one chef for estimates, one for adjudications, one for commercials |
| Head chef | API router | Decides which chef handles which order |
| Recipes | Business rules (123 of them) | The instructions chefs follow — “you can’t submit with unpriced items”, “commercial rules apply in sequence” |
| Pantry | Neon PostgreSQL (database) | Where all ingredients are stored — tenders, estimates, resources, price books |
| Prep board | Redis + BullMQ (job queue) | Long orders that can’t be served immediately — PDF exports, Excel imports, batch calculations |
| Reservation book | Auth.js + M365 SSO | Checks who’s allowed in and what role they have — Admin, Lead Estimator, Estimator |
| Test kitchen | Staging environment | An exact replica of the real restaurant where new dishes are tasted before going on the real menu |
1. Tech Stack
TL;DR: TypeScript end-to-end (NestJS backend + React frontend) because every system oxFlow integrates with (Xero, M365, AI) has its best SDK in TypeScript. PHP doesn’t even have an official Microsoft login library.
| Layer | Technology | Kitchen term | Why this choice |
|---|---|---|---|
| Frontend | React + TypeScript + Vite + TanStack Router + Tailwind CSS | The dining room | Interactive dashboards, tree editors, real-time collaboration. SPA gives full control over client state |
| Backend | NestJS + TypeScript | The kitchen | 123 business rules need a structured home. NestJS provides modules (stations), services (chefs), guards (door policy), and interceptors (audit logging) out of the box |
| Database | PostgreSQL on Neon | The pantry | Recursive CTEs for 5-level cost hierarchies, materialized views for roll-ups, JSONB for flexible metadata. Neon adds instant database branching for staging |
| ORM | Drizzle ORM | Pantry labels | Full SQL control for recursive queries. Prisma's query API has no recursive CTE support (open issue since 2020), so those queries would need raw SQL |
| Job queue | BullMQ + Redis | The prep board | PDF generation, Excel import/export, batch cost calculations — all run in the background without blocking users |
| Auth | Auth.js + Microsoft Entra ID (Azure AD) | The reservation book | Oxcon uses M365. Auth.js has a built-in Microsoft provider — SSO in minimal config |
| Real-time | WebSocket gateway (NestJS built-in) | Kitchen bell | Per-item locking, presence indicators, live updates when another estimator edits |
Why This Stack (Integration Reality Check)
oxFlow integrates with Xero, Microsoft 365, MS Project, and Workbench. SDK availability drove the stack decision:
| Integration | TypeScript SDK | PHP SDK | Winner |
|---|---|---|---|
| Xero | xero-node (250 stars, updated daily) | xero-php-oauth2 (105 stars) | TypeScript |
| M365 SSO (MSAL) | MSAL.js (4,044 stars, updated daily) | No official MSAL for PHP | TypeScript |
| MS Project (.MPP) | Via Python microservice (MPXJ) | No option | Neither — separate service |
| Claude AI | anthropic-sdk-typescript (1,862 stars) | No official SDK | TypeScript |
TypeScript has first-class, officially maintained SDKs for Xero, M365 SSO, and Claude AI. MS Project files need a separate service on either stack. PHP does not have an official MSAL library from Microsoft, making M365 SSO a second-class experience.
Stacks Considered and Rejected
| Stack | Why not |
|---|---|
| Next.js (App Router) | No service layer conventions for 123 business rules. Server-first paradigm is a poor fit for highly interactive dashboards (Railway, Documenso, Northflank all moved away from Next.js for this reason). Prisma (common pairing) cannot do recursive queries |
| Laravel + Inertia.js | No official MSAL for PHP (M365 SSO is second-class). Xero and AI SDKs are less maintained in PHP. Strong framework but wrong ecosystem for these integration points |
| NestJS + Next.js (separate) | Two services to deploy and maintain. Added complexity without proportional benefit at this team size |
| Remix | Small ecosystem, documentation gaps, React Router v7 transition uncertainty |
2. Architecture
TL;DR: One app that does everything: the user-facing screens and the backend logic are packaged together and deployed as a single unit. No separate frontend server, and no microservices in the core app.
One deployable application. The kitchen and dining room are in the same building.
┌─────────────────────────────────────────────┐
│                   Render                    │
│                                             │
│  ┌──────────────────────────────────────┐   │
│  │          NestJS Application          │   │
│  │                                      │   │
│  │  ┌─────────┐  ┌──────────────────┐   │   │
│  │  │ Static  │  │   API Routes     │   │   │
│  │  │ React   │  │   /api/v1/*      │   │   │
│  │  │ SPA     │  │                  │   │   │
│  │  │ (dist/) │  │  ┌────────────┐  │   │   │
│  │  │         │  │  │  Services  │  │   │   │
│  │  │ Dining  │  │  │  (chefs)   │  │   │   │
│  │  │ room    │  │  │            │  │   │   │
│  │  │         │  │  │  Kitchen   │  │   │   │
│  │  └─────────┘  │  └────────────┘  │   │   │
│  │               │                  │   │   │
│  │               │  ┌────────────┐  │   │   │
│  │               │  │   BullMQ   │  │   │   │
│  │               │  │  Workers   │  │   │   │
│  │               │  │   (prep    │  │   │   │
│  │               │  │   board)   │  │   │   │
│  │               │  └────────────┘  │   │   │
│  │               └──────────────────┘   │   │
│  └──────────────────────────────────────┘   │
│                                             │
│  ┌──────────┐                               │
│  │  Redis   │  ← job queue storage          │
│  └──────────┘                               │
└─────────────────────────────────────────────┘
         │
         │  SQL queries
         ▼
┌──────────────────┐
│ Neon PostgreSQL  │  ← the pantry
│                  │
│ main branch      │  ← production ingredients
│ staging branch   │  ← test kitchen ingredients
└──────────────────┘
┌──────────────────┐
│ Microsoft Entra  │  ← the reservation book
│ (Azure AD SSO)   │    (Microsoft's servers; we
│                  │     just redirect to them)
└──────────────────┘
How a Request Flows (Ordering a Dish)
- User opens oxflow.3sixtyone.co: Render serves the React SPA (dining room opens)
- User clicks “Open Estimate #42”: SPA sends GET /api/v1/estimates/42 to NestJS
- NestJS auth middleware checks the Microsoft token (reservation book confirms they’re allowed)
- NestJS route calls the Estimate Service (chef picks up the order)
- Service queries Neon for the estimate + items + worksheet (chef goes to the pantry)
- Service computes cost roll-ups following business rules (chef follows the recipe)
- Response sent back to SPA; user sees the estimate (dish served to the table)
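The steps above can be sketched in plain TypeScript. This is only an illustration, not oxFlow's actual code: `EstimateNode`, `rollUpCost`, `handleGetEstimate`, the in-memory `pantry` map, and the sample figures are all hypothetical stand-ins for the NestJS guard, the Estimate Service, and the Neon query.

```typescript
// Hypothetical shapes; the real schema is not defined in this document.
interface EstimateNode {
  label: string;
  cost?: number;             // leaf items carry their own cost
  children?: EstimateNode[]; // headings roll up their children
}

type OxRole = "Admin" | "Lead Estimator" | "Estimator";

interface Caller {
  role: OxRole | null; // null = token missing or rejected
}

// Stand-in for the database (the pantry): estimate id -> cost tree.
const pantry = new Map<number, EstimateNode>([
  [42, {
    label: "Estimate #42",
    children: [
      { label: "Earthworks", children: [
        { label: "Excavation", cost: 1200 },
        { label: "Backfill", cost: 800 },
      ] },
      { label: "Concrete", cost: 3000 },
    ],
  }],
]);

// The service step (the chef): total a node by summing its leaves.
function rollUpCost(node: EstimateNode): number {
  if (!node.children || node.children.length === 0) {
    return node.cost ?? 0;
  }
  return node.children.reduce((sum, child) => sum + rollUpCost(child), 0);
}

// The whole request: auth check, lookup, roll-up, response.
function handleGetEstimate(
  caller: Caller,
  id: number,
): { status: number; body: unknown } {
  if (caller.role === null) return { status: 401, body: "Unauthorized" };
  const estimate = pantry.get(id);
  if (!estimate) return { status: 404, body: "Not found" };
  return { status: 200, body: { id, total: rollUpCost(estimate) } };
}

console.log(handleGetEstimate({ role: "Estimator" }, 42));
```

For the sample tree the roll-up is 1200 + 800 + 3000 = 5000. In the real app the tree would come from a recursive query rather than an in-memory map.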
How a Background Job Flows (A Slow-Cook Order)
- User clicks “Export PDF”: SPA sends POST /api/v1/estimates/42/export
- NestJS adds a job to BullMQ (order pinned to the prep board)
- API responds immediately: “we’re preparing your export” (waiter tells customer it’s coming)
- A BullMQ worker picks up the job, generates the PDF (kitchen hand works on it)
- When done, user gets notified and can download it (dish brought to the table)
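The same pattern, sketched with an in-memory queue so it runs without Redis. `ExportQueue`, `handleExportRequest`, and the 202 status code are assumptions made for illustration; real BullMQ persists jobs in Redis and its workers run asynchronously in the background.

```typescript
// In-memory stand-in for the BullMQ queue (the prep board).
// Hypothetical names throughout.
type ExportState = "queued" | "done";

interface ExportJob {
  jobId: number;
  estimateId: number;
  state: ExportState;
  fileName?: string; // set once the worker finishes
}

class ExportQueue {
  private jobs: ExportJob[] = [];
  private nextId = 1;

  // Called by the API route: record the job and return straight away.
  add(estimateId: number): ExportJob {
    const job: ExportJob = { jobId: this.nextId++, estimateId, state: "queued" };
    this.jobs.push(job);
    return job;
  }

  // Called by a worker: take the oldest queued job and finish it.
  processNext(): ExportJob | undefined {
    const job = this.jobs.find((j) => j.state === "queued");
    if (!job) return undefined;
    job.fileName = `estimate-${job.estimateId}.pdf`; // pretend PDF render
    job.state = "done";
    return job;
  }

  get(jobId: number): ExportJob | undefined {
    return this.jobs.find((j) => j.jobId === jobId);
  }
}

const exportQueue = new ExportQueue();

// POST /api/v1/estimates/42/export: respond before any PDF work happens.
function handleExportRequest(estimateId: number) {
  const job = exportQueue.add(estimateId);
  return {
    status: 202,
    body: { jobId: job.jobId, message: "Preparing your export" },
  };
}

const accepted = handleExportRequest(42);
console.log(accepted.status, exportQueue.get(accepted.body.jobId)?.state); // 202 queued
exportQueue.processNext(); // the worker runs later, outside the request
console.log(exportQueue.get(accepted.body.jobId)?.state); // done
```

The point of the design is that the user's request never waits on PDF generation; it only waits on writing one small job record.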
3. Platforms & Hosting
TL;DR: Three accounts to manage — GitHub (code), Render (runs the app), Neon (database). We control everything, no dependency on Oxcon’s IT. Login with Oxcon’s Microsoft accounts still works from anywhere. Disaster recovery exceeds the spec targets (RPO ~zero, RTO under 1 hour). Code is organised as a Turborepo monorepo with shared TypeScript types.
Three platforms to manage. That’s it.
| Platform | What it hosts | Kitchen term | Estimated cost |
|---|---|---|---|
| GitHub | Code, CI/CD, pull requests | The recipe book + the kitchen inspector | Free (private repos, 2000 CI minutes/mo) |
| Render | NestJS app + React SPA + Redis | The restaurant building | ~NZ$85–170/mo |
| Neon | PostgreSQL database + branches | The pantry (with instant cloning) | ~NZ$120–200/mo |
Why Render?
361 Coders manages multiple client projects. Deploying on a client’s Azure tenant would mean depending on their IT team for permissions, deployments, and debugging. Render gives the 361 team full control from a single dashboard.
SSO with Oxcon’s M365 works regardless of where the app is hosted — it’s just OAuth2 redirects and tokens. The app can be anywhere.
Why Not Azure/AWS?
| Concern | Answer |
|---|---|
| “But Oxcon uses M365” | SSO works from any host. Auth.js redirects to Microsoft’s login page, so it doesn’t matter where the app runs |
| “Azure is more enterprise” | Oxcon cares that data is secure and the app works, not which logo is on the server |
| “We might need Azure later” | NestJS runs anywhere Node.js runs. Migration is straightforward if data residency requirements emerge |
| Complexity | Azure/AWS requires VPCs, security groups, IAM policies, container registries. Weeks of DevOps setup vs hours on Render |
Why Neon Specifically?
Neon’s killer feature is database branching — instant, copy-on-write clones of the production database. This directly solves the staging data requirement (see Section 6). No other managed PostgreSQL service offers this at the same level.
| Database platform | Branching | Staging snapshot story |
|---|---|---|
| Neon | Yes (instant, milliseconds) | One command, instant clone, pay only for diffs |
| Supabase | Schema only (no data) | Must seed data manually or script pg_dump/restore |
| AWS RDS | No | Script pg_dump → anonymize → pg_restore (minutes to hours) |
| Azure PostgreSQL | No | Same manual dump/restore workflow |
See also: 2026-04-17-database-analysis.md for the full database comparison.
Disaster Recovery
The NFR specifies RPO <24h and RTO <4h. The selected platforms exceed both targets:
| Target | Requirement | Platform capability |
|---|---|---|
| RPO (data loss window) | < 24 hours | Neon provides continuous point-in-time recovery (PITR) — RPO is effectively zero. Any committed transaction can be recovered |
| RTO (time to restore) | < 4 hours | Render redeploys from git in under 10 minutes. Neon database branches restore in seconds. Combined RTO is well under 1 hour |
Specific Plans
| Platform | Plan | Why this tier |
|---|---|---|
| Neon | Scale ($69 USD base) | Includes autoscaling compute, 10 branches, connection pooling, PITR. Free tier has connection limits that would break production |
| Render | Standard ($25 USD/service) | Includes zero-downtime deploys, health checks, persistent disk, managed Redis. Starter tier lacks production reliability features |
Monorepo Structure
NestJS (backend) and React (frontend) share TypeScript types and validation schemas. The repo uses Turborepo + pnpm workspaces to manage this:
oxflow/
├── apps/
│   ├── api/          ← NestJS backend
│   └── web/          ← React SPA
├── packages/
│   └── shared/       ← TypeScript types, validation schemas, constants
├── turbo.json
├── pnpm-workspace.yaml
└── package.json
Turborepo gives selective builds (only rebuild what changed), shared type checking across apps, and parallel task execution in CI.
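As a rough idea of what `packages/shared` might export, here is a minimal sketch. The role names come from the kitchen analogy table; `NewEstimateItem`, `validateNewEstimateItem`, and `isRole` are hypothetical names, and the real package may well use a schema library instead of hand-written checks.

```typescript
// Sketch of a shared module. All names are hypothetical.
export const ROLES = ["Admin", "Lead Estimator", "Estimator"] as const;
export type Role = (typeof ROLES)[number];

export interface NewEstimateItem {
  description: string;
  quantity: number;
  unit: string;
}

// One validator, imported by both apps/api (request validation)
// and apps/web (form validation), so the rules cannot drift apart.
export function validateNewEstimateItem(input: NewEstimateItem): string[] {
  const errors: string[] = [];
  if (input.description.trim() === "") errors.push("description is required");
  if (!Number.isFinite(input.quantity) || input.quantity <= 0) {
    errors.push("quantity must be a positive number");
  }
  if (input.unit.trim() === "") errors.push("unit is required");
  return errors;
}

// Type guard for role strings arriving from tokens or forms.
export function isRole(value: string): value is Role {
  return (ROLES as readonly string[]).includes(value);
}
```

Because both apps import the same function, a rule change is made once and type-checked everywhere it is used.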
4. Git Branching Strategy
TL;DR: Developers build features on their own branch, merge to staging for Oxcon to review, then merge to main to go live. Emergency fixes skip staging and go straight to production.
Two permanent branches: main (the real restaurant) and staging (the test kitchen).
main (production — the real restaurant, live customers)
│
├── staging (test kitchen — Oxcon reviews new dishes here)
│
├── feature/OXF-42-adjudication-workflow
├── feature/OXF-58-recipe-builder
├── bugfix/OXF-71-rate-rollup-rounding
└── hotfix/OXF-89-login-crash
The Flow
Think of it as: a chef develops a new dish → tests it in the test kitchen → client tastes it → if approved, it goes on the real menu.
Developer branches from main (learns from the real menu)
│
▼
feature/OXF-42 ── PR ──▶ staging (code review + auto-deploy to test kitchen)
│
Oxcon reviews staging.oxflow.3sixtyone.co
│
client approves the new dish
│
▼
staging ── merge ──▶ main (goes on the real menu)
│
auto-deploy to oxflow.3sixtyone.co
Rules
| Rule | Kitchen analogy | Why |
|---|---|---|
| Always branch from main | Learn from the real menu, not the experimental one | Prevents inheriting half-finished work from other developers |
| PRs target staging | New dishes go to the test kitchen first | Team code review + client review before production |
| CI must pass before merge | Kitchen inspector checks the dish | Lint, types, tests — catches problems before they reach staging |
| Short-lived branches (1-5 days) | Don’t hog a station for weeks | Reduces merge conflicts, keeps work small and reviewable |
| Staging → main after client approval | Approved dish goes on the real menu | Client has seen and signed off on what’s being shipped |
| Hotfixes go directly to main | Fire in the real kitchen — fix it now | Branch from main, PR to main, then merge main → staging to sync |
| If a feature is rejected, revert it on staging | Remove the failed dish from the test kitchen | Keep staging clean before merging to main |
Branch Naming
feature/OXF-{ticket}-{short-description} e.g. feature/OXF-42-adjudication-workflow
bugfix/OXF-{ticket}-{short-description} e.g. bugfix/OXF-71-rate-rollup-rounding
hotfix/OXF-{ticket}-{short-description} e.g. hotfix/OXF-89-login-crash
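A naming convention is easiest to keep when CI checks it. The sketch below is one possible check: the three prefixes and the `OXF-` key come from the convention above, while the exact regex (numeric ticket, lowercase kebab-case description) and the names `BRANCH_PATTERN` and `parseBranchName` are assumptions. The ticket format may change once the tracking tool is chosen.

```typescript
// Prefixes and the OXF- key follow the convention above;
// the numeric ticket and kebab-case rule are assumptions.
const BRANCH_PATTERN =
  /^(feature|bugfix|hotfix)\/OXF-(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)$/;

interface ParsedBranch {
  kind: "feature" | "bugfix" | "hotfix";
  ticket: number;
  description: string;
}

// Returns the parsed parts, or null when the name breaks the convention.
function parseBranchName(branch: string): ParsedBranch | null {
  const match = BRANCH_PATTERN.exec(branch);
  if (!match) return null;
  return {
    kind: match[1] as ParsedBranch["kind"],
    ticket: Number(match[2]),
    description: match[3],
  };
}

console.log(parseBranchName("feature/OXF-42-adjudication-workflow"));
console.log(parseBranchName("my-quick-fix")); // null
```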
Why Not GitFlow?
GitFlow uses five branch types (main, develop, release/*, hotfix/*, feature/*) and requires dual merges on every release and hotfix. It was designed for quarterly releases — not continuous deployment. For a 3-5 person team shipping a web app, the ceremony adds merge conflicts without proportional value.
5. CI/CD Pipeline
TL;DR: Every time a developer submits code, GitHub automatically checks it for bugs, security issues, and style problems. If anything fails, the code can’t go live. Deployments and test results get posted to #oxflow in Slack.
All CI/CD runs on GitHub Actions. Notifications go to the #oxflow Slack channel.
On Every PR (Targeting Staging)
The kitchen inspector checks the dish before it enters the test kitchen.
| Step | What it checks | Runs in |
|---|---|---|
| Lint + Format | Code style (ESLint + Prettier) | ~30s |
| Type check | TypeScript compiler (tsc --noEmit) | ~30s |
| Unit tests | Business logic (Vitest) | ~1-2min |
| Integration tests | API + database (Vitest + Neon dev branch) | ~2-3min |
| Build check | Both API and SPA compile successfully | ~1-2min |
| Security scan | Known vulnerabilities (CodeQL or Trivy) | ~1-2min |
All steps run in parallel where possible. PR cannot merge unless all pass.
On Merge to Staging
New dish enters the test kitchen.
1. Build API + SPA
2. Create fresh Neon branch from production (instant clone of the pantry)
3. Run anonymization on the new branch (swap real labels for fake ones)
4. Run pending database migrations
5. Deploy to Render staging environment
6. Post to #oxflow Slack: "Staging updated — OXF-42 adjudication workflow ready for review"
On Merge Staging → Main (Production Deploy)
Approved dish goes on the real menu.
1. Build API + SPA
2. Run migrations on Neon main branch
3. Deploy to Render production environment
4. Run smoke tests against production
5. Post to #oxflow Slack: "Production deployed — OXF-42 adjudication workflow is live"
Hotfix Flow
Fire in the real kitchen — fix it now, don’t go through the test kitchen.
1. Branch from main
2. PR targets main directly
3. Same CI checks run
4. On merge: deploy to production immediately
5. Post to #oxflow Slack: "HOTFIX deployed — OXF-89 login crash fixed"
6. Merge main → staging to keep test kitchen current
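The three deploy paths above differ only in where the merge lands and where it came from. A minimal sketch of that decision follows; the step lists mirror the sections above, but `planDeploy` and the type names are hypothetical, and reusing the full production step list for hotfixes is an assumption.

```typescript
// Maps a merge into a permanent branch to its deploy plan.
type PermanentBranch = "staging" | "main";

interface DeployPlan {
  environment: "staging" | "production";
  steps: string[];
  followUp?: string; // action to take once the deploy has finished
}

const STAGING_STEPS = [
  "build API + SPA",
  "create fresh Neon branch from production",
  "run anonymization on the new branch",
  "run pending migrations",
  "deploy to Render staging",
  "post to #oxflow",
];

const PRODUCTION_STEPS = [
  "build API + SPA",
  "run migrations on Neon main",
  "deploy to Render production",
  "run smoke tests",
  "post to #oxflow",
];

function planDeploy(mergedInto: PermanentBranch, mergedFrom: string): DeployPlan {
  if (mergedInto === "staging") {
    return { environment: "staging", steps: [...STAGING_STEPS] };
  }
  const plan: DeployPlan = {
    environment: "production",
    steps: [...PRODUCTION_STEPS],
  };
  if (mergedFrom.startsWith("hotfix/")) {
    // Hotfixes bypass staging, so staging must be synced afterwards.
    plan.followUp = "merge main into staging";
  }
  return plan;
}

console.log(planDeploy("main", "hotfix/OXF-89-login-crash").followUp);
```

In practice each path would live in its own GitHub Actions workflow; the function just makes the branching rule explicit.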
6. Database Snapshots & Staging Data
TL;DR: When we deploy to staging, we take an instant copy of the live database and automatically replace sensitive info (supplier names, dollar values, margins) with fake data. Audit logs are excluded from masking to preserve immutability. Oxcon reviews against realistic data without anyone seeing confidential pricing. Database migrations are forward-only — if we need to undo a change, we write a new migration rather than rolling back.
This is the “test kitchen pantry” — how staging gets realistic data without exposing real commercial information.
The Problem
Staging needs realistic data to test against. But production data contains sensitive information — supplier rates, tender values, commercial margins. We can’t just copy it raw.
The Solution: Neon Branch + Anonymization
Think of it as: clone the real pantry, then swap all the labels so nobody can tell which supplier provided which ingredients.
Production DB (Neon main) — real pantry
│
│ neonctl branches create --name staging-20260420
│ (instant copy-on-write — milliseconds)
│
▼
Raw staging branch (exact copy of prod data)
│
│ Anonymization script runs:
│ ── Supplier names → "Supplier A", "Supplier B", "Supplier C"
│ ── Tender values → randomized within ±15%
│ ── Commercial margins → randomized
│ ── User emails → user-{hash}@test.oxflow.3sixtyone.co
│ ── Referential integrity preserved (same masked ID everywhere)
│
▼
Anonymized staging branch
│
│ Pending migrations applied (if any new schema changes)
│
▼
Staging database ready — API connects to this branch
What Gets Anonymized
| Data type | Treatment | Why |
|---|---|---|
| Company names (suppliers, subcontractors) | Deterministic pseudonyms | Commercial sensitivity |
| Tender dollar values | Randomized within ±15% | Pricing confidentiality |
| Commercial rules (margins, markups) | Randomized | Competitive advantage |
| Resource rates (from Price Book) | Randomized within ±15% | Supplier pricing |
| User emails | Masked to user-{hash}@test.oxflow.3sixtyone.co | Privacy |
| Estimate items, quantities, units | Kept as-is | Needed for realistic testing |
| Headings, hierarchy structure | Kept as-is | Needed for realistic testing |
| Codes, categories, units of measure | Kept as-is | Reference data, not sensitive |
Tooling
PostgreSQL Anonymizer 2.0 — a Postgres extension that runs inside the database. Uses deterministic hash-based masking with a secret salt, meaning the same input always produces the same masked output. This preserves joins and relationships across tables.
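To show why a salted, deterministic hash preserves joins, here is a minimal sketch of the technique in TypeScript. It is not the PostgreSQL Anonymizer API: `maskToken`, `pseudonymizeSupplier`, `maskEmail`, `jitter`, the salt value, and the sample names are all hypothetical, and the pseudonyms come out as hash fragments rather than "Supplier A" style labels.

```typescript
import { createHmac } from "node:crypto";

// Illustration only; in staging the masking runs inside Postgres.
const SALT = "replace-with-a-secret-salt"; // hypothetical value

// Keyed hash: same input + same salt gives the same output in every table.
function maskToken(value: string, salt: string = SALT): string {
  return createHmac("sha256", salt).update(value).digest("hex").slice(0, 8);
}

function pseudonymizeSupplier(name: string): string {
  return `Supplier ${maskToken(name).toUpperCase()}`;
}

function maskEmail(email: string): string {
  return `user-${maskToken(email.toLowerCase())}@test.oxflow.3sixtyone.co`;
}

// Randomize a dollar value within +/- fraction (0.15 = 15%).
// `random` is injectable so the bounds can be checked in tests.
function jitter(
  value: number,
  fraction = 0.15,
  random: () => number = Math.random,
): number {
  const factor = 1 + (random() * 2 - 1) * fraction;
  return Math.round(value * factor * 100) / 100; // round to cents
}

// The same supplier masks to the same pseudonym wherever it appears.
console.log(pseudonymizeSupplier("Acme Concrete Ltd"));
console.log(pseudonymizeSupplier("Acme Concrete Ltd"));
console.log(maskEmail("jane@example.com"));
console.log(jitter(1000)); // somewhere between 850 and 1150
```

Names and emails are masked deterministically so relationships survive. Dollar values are randomized instead, because a repeatable transform on a number would be easier to reverse.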
Audit Log Policy
Audit tables are excluded from anonymization on staging branches. The 7-year immutable audit log requirement means these records must never be mutated. On staging, audit data either retains original values (acceptable for internal staging since only the 361 team accesses it) or is truncated entirely for client-facing demos.
Migration Strategy
Database migrations are forward-only with compensating migrations for rollback. For a financial system where cost calculations, snapshots, and audit logs depend on schema integrity, reversible migrations risk data corruption. If a migration needs to be undone, a new forward migration is written that compensates for the change.
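A small sketch of what forward-only looks like in practice. The table and column names, the `Migration` shape, and `pendingMigrations` are hypothetical; with Drizzle the equivalent would be a folder of generated SQL files applied in order.

```typescript
// Forward-only migration list: entries are appended, never edited or reversed.
interface Migration {
  id: number;
  name: string;
  up: string; // SQL to apply; there is deliberately no `down`
}

const migrations: Migration[] = [
  {
    id: 1,
    name: "add_contingency_pct",
    up: "ALTER TABLE estimates ADD COLUMN contingency_pct numeric",
  },
  // Undoing migration 1 means writing a NEW migration that compensates for it.
  {
    id: 2,
    name: "drop_contingency_pct",
    up: "ALTER TABLE estimates DROP COLUMN contingency_pct",
  },
];

// Returns the migrations still to run, in id order, given the applied ids.
function pendingMigrations(all: Migration[], appliedIds: number[]): Migration[] {
  const applied = new Set(appliedIds);
  return [...all]
    .sort((a, b) => a.id - b.id)
    .filter((m) => !applied.has(m.id));
}

// A database that already ran migration 1 only needs the compensating one.
console.log(pendingMigrations(migrations, [1]).map((m) => m.name));
```

Every environment replays the same history in the same order, so staging and production cannot end up on different schemas because someone rolled back by hand.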
Lifecycle
- Fresh branch created on every merge to staging
- Previous staging branch is automatically deleted by CI
- Branches are cheap — copy-on-write, you only pay for the data that changes after branching
- Each developer can also create personal Neon branches for local development
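The create-then-delete cycle needs two small decisions: what to call the new branch, and which old ones to remove. A sketch of both follows. The `staging-YYYYMMDD` format follows the `neonctl` example earlier in this section; `stagingBranchName`, `staleStagingBranches`, and the sample branch list are hypothetical.

```typescript
// Builds the snapshot branch name from a date (UTC).
function stagingBranchName(date: Date): string {
  const yyyymmdd = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `staging-${yyyymmdd}`;
}

// Given all Neon branch names, pick the old staging snapshots to delete.
// The branch just created, main, and personal dev branches are left alone.
function staleStagingBranches(allBranches: string[], current: string): string[] {
  return allBranches.filter((b) => /^staging-\d{8}$/.test(b) && b !== current);
}

const todaysBranch = stagingBranchName(new Date("2026-04-20T09:00:00Z"));
console.log(todaysBranch); // staging-20260420
console.log(
  staleStagingBranches(
    ["main", "staging-20260418", "staging-20260420", "dev-ibrahim"],
    todaysBranch,
  ),
); // [ 'staging-20260418' ]
```

A date-only name collides if staging is refreshed twice in one day, so a real pipeline would probably append a time or commit SHA.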
7. Environments Summary
TL;DR: Three versions of the app — production (real users), staging (client review with fake data), and local dev (each developer’s machine).
Three environments, like three versions of the restaurant.
| Environment | URL | Kitchen analogy | Deploys from | Database | Who uses it |
|---|---|---|---|---|---|
| Production | oxflow.3sixtyone.co | The real restaurant | main branch | Neon main | Oxcon estimators (real work) |
| Staging | staging.oxflow.3sixtyone.co | The test kitchen | staging branch | Neon branch (snapshot + anonymized) | Oxcon reviewers + 361 team |
| Local dev | localhost:3000 | A chef’s home kitchen | Feature branch | Neon dev branch or local Docker PG | Individual developers |
8. Notifications (Slack #oxflow)
TL;DR: A dedicated #oxflow Slack channel gets automatic updates whenever code is deployed, tests fail, or staging is ready for client review.
All automated notifications go to a dedicated #oxflow Slack channel.
| Event | Message |
|---|---|
| PR opened | “PR #42 opened: Adjudication workflow, ready for code review” |
| CI failed | “CI failed on PR #42: unit tests, 3 failures in adjudication.service.spec.ts” |
| Staging deployed | “Staging updated with PR #42: review at staging.oxflow.3sixtyone.co” |
| Production deployed | “Production deployed: OXF-42 adjudication workflow is live” |
| Hotfix deployed | “HOTFIX deployed: OXF-89 login crash fixed” |
| Staging review requested | “OXF-42 ready for client review @ryan @greg @matt” |
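One way the pipeline could build and send these messages is sketched below. Slack incoming webhooks accept a JSON body with a `text` field; everything else here (`PipelineEvent`, `slackText`, `postToSlack`, and the message punctuation) is a hypothetical illustration adapted from the table above.

```typescript
// Events the pipeline can report. Shapes are hypothetical.
type PipelineEvent =
  | { kind: "pr_opened"; pr: number; title: string }
  | { kind: "ci_failed"; pr: number; detail: string }
  | { kind: "staging_deployed"; pr: number }
  | { kind: "production_deployed"; summary: string }
  | { kind: "hotfix_deployed"; summary: string };

// Turns an event into the one-line message shown in #oxflow.
function slackText(ev: PipelineEvent): string {
  switch (ev.kind) {
    case "pr_opened":
      return `PR #${ev.pr} opened: ${ev.title}`;
    case "ci_failed":
      return `CI failed on PR #${ev.pr}: ${ev.detail}`;
    case "staging_deployed":
      return `Staging updated with PR #${ev.pr}: review at staging.oxflow.3sixtyone.co`;
    case "production_deployed":
      return `Production deployed: ${ev.summary}`;
    case "hotfix_deployed":
      return `HOTFIX deployed: ${ev.summary}`;
  }
}

// Sends the message to a Slack incoming webhook.
// The webhook URL would come from a CI secret (assumed, not defined here).
async function postToSlack(webhookUrl: string, ev: PipelineEvent): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: slackText(ev) }),
  });
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
}

console.log(slackText({ kind: "hotfix_deployed", summary: "OXF-89 login crash fixed" }));
```

Keeping the wording in one function means the messages stay consistent across the staging, production, and hotfix workflows.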
9. Open Questions
TL;DR: Five things need team/client input before we can finalise — domain, data residency, ticket tracking tool, staging review timeline, and budget approval.
Items that need team or client input before finalising.
| Question | Who decides | Impact |
|---|---|---|
| Subdomain confirmation (oxflow.3sixtyone.co) | 361 team | URLs for production and staging environments |
| Oxcon data residency requirements | Oxcon | May require hosting migration to Azure if NZ data residency is mandated |
| Ticket tracking tool (Asana / Linear / other) | 361 team | Branch naming convention (OXF-{ticket}) depends on this |
| Staging review SLA | Oxcon + 361 | How long does client have to review before staging is refreshed |
| Budget approval for Render + Neon | 361 / Oxcon | Estimated ~NZ$205–370/mo for hosting + database |
10. Cost Estimate
TL;DR: ~NZ$205–370/mo to start on production-grade plans (Neon Scale + Render Standard). Scales with usage, no big upfront spend. All pay-as-you-grow.
Estimated monthly costs at early stage (small team, moderate data).
| Service | Plan | Estimated monthly | Notes |
|---|---|---|---|
| GitHub | Free (for organizations) | $0 | 2000 CI minutes/mo included |
| Render | Standard (web service + managed Redis) | ~NZ$85–170 | Zero-downtime deploys, health checks, persistent disk |
| Neon | Scale ($69 USD base) | ~NZ$120–200 | Autoscaling compute, 10 branches, PITR, connection pooling |
| Domain | n/a | ~NZ$25/yr | Only if purchasing a new domain; a subdomain of an existing domain costs nothing extra |
| Total | | ~NZ$205–370/mo | Grows with usage, not upfront |
All prices converted from USD at ~1.70 NZD/USD. Actual exchange rate will vary.
These are conservative (high-side) estimates. Actual costs will likely come in lower, especially in early stages with low traffic and small data volumes. All platforms are pay-as-you-grow — no large upfront commitments.
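The conversion and the total can be checked with a few lines of arithmetic. The rate, the USD base prices, and the NZ$ ranges are the ones quoted in this document; the function and variable names are just for illustration.

```typescript
const NZD_PER_USD = 1.7; // approximate rate stated above

function usdToNzd(usd: number, rate: number = NZD_PER_USD): number {
  return Math.round(usd * rate * 100) / 100; // round to cents
}

// Base plan prices quoted in this document, in USD.
console.log(usdToNzd(69)); // Neon Scale base: 117.3
console.log(usdToNzd(25)); // Render Standard, per service: 42.5

// The monthly total is the sum of the two estimated ranges (NZ$).
const renderRange = { low: 85, high: 170 };
const neonRange = { low: 120, high: 200 };
const monthlyTotal = {
  low: renderRange.low + neonRange.low,
  high: renderRange.high + neonRange.high,
};
console.log(monthlyTotal); // { low: 205, high: 370 }
```

The Neon base price alone converts to about NZ$117, which is why the Neon range starts near NZ$120.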
See also
These three research notes extend Ibrahim’s DevOps foundation with the agentic / memory layers that sit on top — they do not replace any decision here.
- Shared project memory: CLAUDE.md, session capture, wiki-style memory patterns for the team.
- Agentic coding and PR review: which PR review agent wires into the GitHub Actions pipeline described above (CodeRabbit + claude-code-action), and how Claude / Codex / BMAD / Superpowers divide the work.
- Multica and Claude Managed Agents: how agents get orchestrated on top of GitHub + Render + Neon, and why Multica / Managed Agents are not yet the right fit for Phase 1.
- Database analysis: the Neon + S3 recommendation feeding the per-PR Neon branch pattern above.
- BRANDING.md: visual language used in the accompanying HTML companions.
Last updated: 2026-04-20
Author: Ibrahim Hussain, 361 Coders NZ