Question

How should 361 capture, share, and reuse the knowledge generated when team members run Claude Code against oxflow? Specifically: (a) should conversation history and prompts be committed to the repo, and in what shape; (b) what’s the right pattern for a shared “project brain” across a 3–5 dev team; (c) how does this fit alongside the existing DevOps stack and the DB recommendation?

TL;DR

Commit a curated, human-readable project brain to oxflow-docs. Auto-generate session TL;DRs via a Claude Code SessionEnd hook into status/sessions/YYYY-MM-DD-slug-author.md — YAML frontmatter + TL;DR + key steps, nothing heavier. Mirror each session as a static HTML render (daaain/claude-code-log) into status/sessions/html/ so the full chat is one click away for humans who want to read the whole conversation. Commit a hand-written team/devs/{handle}.md per dev so MCPs, hooks, and skill picks are visible to humans and agents. Archive raw .jsonl transcripts to S3 (s3://361-transcripts-archive/…) after a privacy scrub — out of git, still team-accessible. Adopt Andrej Karpathy’s LLM-wiki pattern as the compounding layer, using Garry Tan’s open-source gbrain as the reference implementation. Skip managed memory services (Mem0, Zep, Letta) in Phase 1 — they add cost, privacy exposure, and are unnecessary at 3–5 devs. Privacy via a PreToolUse regex scrub + gitleaks CI guard + a <private>…</private> tag convention. BMAD review agents read session TL;DRs before reviewing a PR rather than re-deriving intent from the diff — session pipeline and BMAD are two ends of one workflow, composed via status/sessions/ and status/reviews/ (see the composition subsection).

Approach

Delegated the landscape survey to a research agent with web access. Briefed with the oxFlow context, the existing DevOps stack, and the DB recommendation. Output cross-referenced against Claude Code’s official docs, the Karpathy LLM-wiki gist, active open-source memory projects, and team patterns at Vercel / DoorDash / Anthropic.

Findings

Claude Code’s built-in memory

Four primitives exist today (docs.claude.com — memory, hooks):

  • CLAUDE.md hierarchy. Loaded automatically at session start. Precedence (lowest → highest): enterprise policy → ~/.claude/CLAUDE.md (user global) → parent-dir CLAUDE.md → project-root CLAUDE.md → subdirectory CLAUDE.md → CLAUDE.local.md (gitignored per-dev overrides). Deeper wins. Import other files via @path/to/file.md.
  • Auto memory (MEMORY.md). Claude writes its own notes based on corrections and preferences; stored alongside CLAUDE.md, editable by the assistant.
  • Hooks. Shell commands fired at lifecycle events. Ones that matter for capture:
    • SessionStart — stdout is injected into the model’s context. Perfect for piping in git branch, last 10 commits, open PRs, or the latest status/sessions/ entry.
    • UserPromptSubmit — once per turn; see the prompt before the model does.
    • PreToolUse / PostToolUse — per tool-call; ideal for secret redaction and audit logs.
    • PreCompact — fires before Claude’s own auto-summary. Receives the content about to be dropped. Under-used but perfect for “save this before we forget”.
    • Stop / SubagentStop — per-turn finish.
    • SessionEnd — terminal event; receives transcript_path and cost data. Cannot block, ideal for archival.
  • Transcripts on disk. Stored as JSONL at ~/.claude/projects/<slugified-cwd>/<session-id>.jsonl — one JSON object per event. A sibling sessions-index.json holds auto-summaries and timestamps. This is the raw substrate for any session-to-markdown pipeline.
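The JSONL layout makes the extraction step small. A minimal parsing sketch in Python, assuming the one-object-per-line shape described above with `type` and `message.content` fields — the transcript schema is not a stable public API, so verify field names against a real file before relying on this:

```python
import json

def extract_turns(jsonl_text):
    """Pull user/assistant text turns out of a Claude Code transcript.

    Assumes one JSON object per line with a `type` field and a
    `message.content` payload (string for user turns, a list of content
    blocks for assistant turns) -- an assumption, not a guaranteed schema.
    """
    turns = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") not in ("user", "assistant"):
            continue  # skip tool results, summaries, index entries, etc.
        content = event.get("message", {}).get("content", "")
        if isinstance(content, list):  # assistant content arrives as blocks
            content = " ".join(
                block.get("text", "") for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        if content:
            turns.append((event["type"], content))
    return turns
```

This is the raw-substrate read step any session-to-markdown pipeline starts from; the summariser in the pipeline section below would feed its output to the template.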

Andrej Karpathy’s LLM-wiki pattern

In April 2026, Karpathy tweeted and published a gist describing his personal use of LLMs — primarily as a second-brain wiki compiler, not for code. Three operational layers:

  1. Raw sources — immutable PDFs, transcripts, bookmarks (append-only).
  2. The wiki — LLM-generated markdown: entity pages, summaries, cross-references, contradiction flags. A compounding artefact, unlike RAG which re-assembles from scratch every query.
  3. Schema — a CLAUDE.md-style file telling the LLM how to maintain the wiki (naming, linking, lint rules).

Three operations: ingest (new source → update wiki pages), query (search wiki, file valuable findings back), lint (detect contradictions, orphans, stale claims). Karpathy’s own wiki on one topic is ~100 pages / 400k words.

Open-source derivatives shipping now:

“gbrain” — what Bilal was thinking of

Confident match: garrytan/gbrain — Y Combinator CEO Garry Tan’s opinionated open-source “knowledge brain” for AI agents (VibeSparking overview). Properties that matter for oxFlow:

  • Markdown-in-git storage. MECE directories (people/, companies/, deals/, concepts/), each with a README.md “resolver” telling the agent how to place new info.
  • Compiled-truth + timeline. Each page has a top section the agent rewrites as evidence arrives, and an append-only timeline underneath. Exactly the pattern you’d want for ADRs.
  • Tech stack. PGLite (embedded Postgres 17.5 via WASM, ~2 s cold start, no Docker) + pgvector HNSW, TypeScript/Bun, hybrid search with reciprocal rank fusion.
  • Automation. 21 cron jobs, nightly enrichment, MCP server for Claude/Cursor, 26 “skills” (markdown instructions).
  • Scale proof. Production deployment of 17,888 pages / 4,383 people / 723 companies, built in 12 days.
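Reciprocal rank fusion, the merge step in gbrain's hybrid search, is compact enough to sketch. The k=60 constant below is the value from the original RRF paper, not necessarily what gbrain ships:

```python
def rrf_merge(rankings, k=60):
    """Fuse ranked result lists: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked id lists, e.g. one from pgvector HNSW
    search and one from full-text search. k=60 follows the original RRF
    paper; implementations tune it.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that ranks well in both lists beats one that tops only one list, which is why RRF works without score normalisation across heterogeneous retrievers.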

Adjacent but different: Graphbrain is an academic semantic-hypergraph DB, not related to gbrain. huytieu/COG-second-brain is an explicit community derivative of gbrain + Garry Tan’s gstack.

Agent memory systems — managed options

| System | Model | Team sharing | Cost | Fit for oxFlow |
| --- | --- | --- | --- | --- |
| Mem0 | Three-tier (user/session/agent), hybrid vector+graph+KV | Managed; per-user scope | Paid SaaS or self-host | Overkill for 3–5 devs; external service = another Privacy Act surface |
| Zep | Temporal knowledge graph (Graphiti). 63.8% LongMemEval vs Mem0 49% | Groups / sessions | Paid SaaS | Best for temporal reasoning; not needed for coding sessions |
| Letta / MemGPT | Tiered (core / archival), self-editing system prompt | Agent-per-user | OSS | Good for long-running stateful agents; Claude Code isn’t that |
| LangMem | JSON memory in LangGraph store | Via LangGraph scoping | OSS | Requires LangGraph; poor fit with NestJS |
| Cognee | Graph-based, extract entities | Workspaces | OSS + paid | Closer to Karpathy wiki; still a service |
| thedotmack/claude-mem | SQLite + Chroma + local ONNX embeddings | Local per-dev | Free, local | Strong fit — hooks-native, 46k stars, <private> tag scrubbing |
| memsearch (Milvus) | Milvus-backed | Local | Free | Alternative if you want a Milvus dependency |

Privacy rule of thumb: anything leaving the laptop into a managed service needs a DPA. Local-first (claude-mem, llm-wiki) sidesteps this.

Git-as-memory patterns

Real teams are converging on “the repo IS the memory”:

  • AGENTS.md / CLAUDE.md symlinks. Vercel’s next.js repo has CLAUDE.md as a symlink to AGENTS.md. Uses <!-- BEGIN:nextjs-agent-rules --> markers so agents can own a section without stepping on hand-written content. create-next-app now generates both automatically (Next.js AI agents guide).
  • Committed .claude/ directory. Mature setups commit .claude/commands/, .claude/agents/, .claude/skills/, and .claude/settings.json. Gitignore .claude/settings.local.json (personal overrides) and transcripts.
  • MADR decision records. Numbered markdown ADRs in docs/adr/NNNN-slug.md; Martin Fowler on ADRs, MADR. oxflow-docs’s existing research/ + decisions/ folders already match this shape.
  • Status/daily rituals. Hannah Stulberg’s DoorDash “Team OS” commits a daily engineering log that Claude reads at SessionStart and appends to at SessionEnd.
  • Multi-agent observability. disler/claude-code-hooks-multi-agent-observability is the reference implementation for streaming hook events to a dashboard, tagging sessions with session_id + git branch.

Conversation-history capture for Claude Code — concrete pipeline

End-to-end recipe matching Bilal’s brief:

  • (a) Where transcripts live: ~/.claude/projects/-Users-<user>-Desktop-oxflow/<session-id>.jsonl — one JSON line per event (fazm.ai walkthrough).

  • (b) SessionEnd hook to extract — in .claude/settings.json:

    {
      "hooks": {
        "SessionEnd": [
          { "hooks": [
              { "type": "command",
                "command": "bash .claude/scripts/session-capture.sh \"$CLAUDE_TRANSCRIPT_PATH\"" }
          ] }
        ]
      }
    }
  • (c) Auto-summarise via a headless subagent. The script pipes the transcript into claude -p "@.claude/templates/tldr.md" (headless mode) using a template that produces: one-line summary, 3–5 decisions, files touched, follow-ups, knowledge-base delta. One call per session — cheap.

  • (d) Privacy scrub. Regex pass for sk-ant-*, sk-*, ghp_*, AWS_*, postgres://…, DATABASE_URL, 40+-char hex strings, anything wrapped in <private>…</private> (convention claude-mem already supports). Drop .env* reads from the transcript entirely. detect-secrets or gitleaks as a pre-commit + CI guard (aitmpl.com — secrets hooks).

  • (e) Where to commit. One file per session at status/sessions/YYYY-MM-DD-<slug>-<handle>.md. Locked-in shape — YAML frontmatter is machine-parseable (agents grep by branch, author, files); body is human-scannable in ~10 seconds. Same file serves both audiences:

    ---
    author: bilaljawadi45
    session_id: 2026-04-20-a7b3
    branch: feat/ltree-migration
    duration_min: 47
    cost_usd: 1.82
    ---
     
    # TL;DR
    <1–2 sentences — what the session delivered>
     
    # Decisions
    - <decision + one-line rationale>
     
    # Key steps
    1. <pivot point — not every tool call>
    2. <pivot point>
     
    # Files touched
    <comma-list of paths>
     
    # Follow-ups
    - <open item>
     
    # Knowledge delta
    <new ADR / research / template paths emitted this session, or "none">

    The Key steps layer is deliberate — it captures the pivot points of the session (what the author decided, where they changed direction) rather than a replay of every tool call. That is what makes the file useful to a future agent reading the PR and to a teammate picking up the branch.

  • (f) Also mirror HTML. Run daaain/claude-code-log over the same transcript and commit to status/sessions/html/{session-id}.html. The TL;DR .md is what humans scan and agents parse; the HTML is what a human opens when they need to read the whole conversation — reviewing a teammate’s session, auditing a decision, onboarding a new hire. Different surfaces, different readers, same underlying transcript. The raw JSONL still goes to S3 (see next subsection) for forensic use.

  • (g) Discovery. status/sessions/INDEX.md regenerated by a small CI step (20-line script reading frontmatter → grouped table). Same JSON can later power an in-app “what the team’s been working on” pane if we add one to oxflow itself.
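The scrub pass in (d) is a regex sweep plus the <private> convention. A minimal sketch in Python using the patterns listed above (tighten and extend before trusting it on real transcripts; any production version should log every hit):

```python
import re

# Patterns from the scrub list in (d) -- extend as new credential shapes appear.
SECRET_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]+"),      # Anthropic keys
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),    # generic sk-* keys
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),       # GitHub PATs
    re.compile(r"\bAWS_[A-Z_]+=\S+"),          # AWS env assignments
    re.compile(r"postgres(?:ql)?://\S+"),      # connection strings
    re.compile(r"\bDATABASE_URL=\S+"),
    re.compile(r"\b[0-9a-f]{40,}\b"),          # 40+-char hex strings
]
PRIVATE_TAG = re.compile(r"<private>.*?</private>", re.DOTALL)

def scrub(text):
    """Redact secrets and drop <private> spans; returns (clean_text, hit_count)."""
    hits = 0
    text, n = PRIVATE_TAG.subn("[private-dropped]", text)
    hits += n
    for pattern in SECRET_PATTERNS:
        text, n = pattern.subn("[redacted]", text)
        hits += n
    return text, hits
```

The same function runs in three places: before the summariser sees the transcript, before the TL;DR lands in status/sessions/, and before the JSONL ships to S3 (next subsection).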

Raw transcripts: S3, not git, not a DB

The full JSONL conversation still needs to live somewhere team-accessible — agents and humans occasionally want to drop back into the raw record to recover a tool call or re-hydrate a deleted thought. Locking in:

  • Destination. s3://361-transcripts-archive/{handle}/{YYYY-MM}/{session-id}.jsonl. Already part of 361’s stack for oxFlow file uploads, so no new vendor surface.
  • Scrub before upload. Same regex pass the SessionEnd summariser uses — sk-ant-*, sk-*, ghp_*, AWS_*, postgres://…, DATABASE_URL, 40+-char hex strings, plus anything wrapped in <private>…</private>. Any flagged match → redact, log, continue.
  • Lifecycle. Standard for 30 days (recent sessions occasionally queried), then Glacier via an S3 lifecycle rule. Pennies per month at the projected team volume.
  • Access. Team IAM role. Read-only for agents; write-only from the SessionEnd hook runner.
  • No git commit. Even scrubbed, JSONL grows fast and nothing a human wants to read belongs in a docs repo.
  • Rejected: separate Neon DB. Cleaner if we ever needed SQL across all transcripts — but that query use-case does not exist today. Another DB to maintain is not free.
  • Rejected: shared table in oxFlow’s product Neon DB. Couples team-ops data to product data. Don’t.
  • Escape hatch. If cross-transcript search ever matters, load the S3 archive on demand into DuckDB or a Neon branch — do not stand up a permanent store pre-emptively.
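The key scheme and lifecycle rule are mechanical. A sketch, with the lifecycle written as the dict shape boto3's put_bucket_lifecycle_configuration expects (worth verifying against current AWS docs before applying):

```python
def transcript_key(handle, session_id, when):
    """Build the archive key: {handle}/{YYYY-MM}/{session-id}.jsonl."""
    return f"{handle}/{when:%Y-%m}/{session_id}.jsonl"

# Standard for 30 days, then Glacier -- the rule locked in above. This dict
# mirrors the LifecycleConfiguration argument boto3 takes; an assumption to
# check against the S3 API reference, not a tested deployment.
LIFECYCLE = {
    "Rules": [{
        "ID": "transcripts-to-glacier",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
    }]
}
```

The SessionEnd runner only needs the key builder; the lifecycle rule is applied once at bucket setup and never touched again.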

Recommendation for 361

Minimal pragmatic stack for Phase 1 (now → June 27):

Committed to oxflow-docs:

  • CLAUDE.md (root) — stays as-is; add a pointer to status/sessions/INDEX.md and research/ as the compounding knowledge base. (Already partially done — see commit history.)
  • AGENTS.md — symlink to CLAUDE.md, Vercel convention. Future-proofs for Cursor / Codex picking up the same instructions.
  • team/devs/{handle}.md — human-authored per-dev profile. One file per teammate (bilaljawadi45.md, ibrahim-h361.md, bilelragued.md, candymachineater.md) plus a TEMPLATE.md for onboarding. Covers: MCPs connected (name + one-line why), SessionStart/SessionEnd hooks, custom slash commands, skills installed, style preferences (terse vs narrated, approval mode), anything a teammate or agent should know before picking up your branch. Hand-maintained, not auto-generated from JSON — an auto-dump would leak secrets and strip the “why” that makes the file worth reading. A paragraph saying “I have the Linear MCP because I live in tickets, and my SessionStart hook pipes in my current sprint” is worth more than a config blob.
  • .claude/commands/ — shared slash commands (/new-adr, /session-summary, /research-note).
  • .claude/agents/ — reusable subagents (summariser, secret-scrubber, ADR-writer).
  • .claude/settings.json — hooks + permissions. Committed.
  • status/sessions/YYYY-MM-DD-<slug>-<handle>.md — auto-generated TL;DR in the shape locked in above (frontmatter + TL;DR + Decisions + Key steps + Files touched + Follow-ups + Knowledge delta). One file per session.
  • status/sessions/html/{session-id}.html — auto-generated full rendered transcript via daaain/claude-code-log. Browser-readable, one click from the TL;DR .md. Raw JSONL still goes to S3.
  • status/sessions/INDEX.md — auto-regenerated table of contents.
  • status/reviews/{pr-number}-<slug>.md — auto-generated BMAD review output (see “How this composes with BMAD” below). Same frontmatter family as session files; different folder.
  • status/daily/YYYY-MM-DD.md — human-authored rollup (already in use).
  • research/YYYY-MM-DD-topic.md — human-curated deep-dives (existing pattern).
  • decisions/NNNN-slug.md — MADR format, human-authored, one per architectural decision.
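The INDEX.md regeneration mentioned under (g) is the 20-line script. A sketch in Python, grouping by the branch and author frontmatter keys from the locked-in session shape (a real CI step might group by month instead):

```python
import re

def parse_frontmatter(text):
    """Extract the YAML-ish key: value block between the leading --- fences."""
    match = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def build_index(files):
    """files: {filename: file_text} -> markdown table of sessions."""
    rows = []
    for name, text in sorted(files.items()):
        fm = parse_frontmatter(text)
        rows.append((fm.get("branch", "?"), fm.get("author", "?"), name))
    lines = ["| Branch | Author | Session |", "| --- | --- | --- |"]
    lines += [f"| {branch} | {author} | [{name}]({name}) |"
              for branch, author, name in rows]
    return "\n".join(lines)
```

Because the frontmatter is the contract, the same parse step can later emit the JSON that would power an in-app “what the team’s been working on” pane.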

Not committed (gitignored):

  • Raw ~/.claude/projects/*.jsonl transcripts stay on disk during the session, then ship to the S3 archive (scrubbed) — never committed to git.
  • .claude/settings.local.json (personal overrides). The hand-written team/devs/{handle}.md profile is the committed counterpart — captures the why, leaks no secrets.
  • claude-mem local SQLite / Chroma stores (per-dev retrieval layer).

Archived (not git, not local-only):

  • s3://361-transcripts-archive/{handle}/{YYYY-MM}/{session-id}.jsonl — scrubbed JSONL transcripts. Standard → Glacier after 30 days.

Auto vs human:

  • Auto: per-session TL;DR .md (frontmatter + key-steps shape), per-session HTML render in status/sessions/html/, INDEX.md, BMAD review files in status/reviews/, entity back-links.
  • Human: per-dev profile in team/devs/{handle}.md, daily rollup, ADRs, research deep-dives, root CLAUDE.md and schema docs.

Privacy rules:

  1. permissions.deny in committed settings blocks Claude from reading .env*, secrets/**, *.pem.
  2. A PreToolUse hook runs a regex scrub on tool output before the model sees it; SessionEnd runs it again before writing to status/sessions/.
  3. Gitleaks as pre-commit + CI guard (GitHub Actions).
  4. <private>…</private> tag convention — anything wrapped is dropped from summaries.
  5. Any secret accidentally committed → immediate key rotation. History stays (rotating is cheaper than rewriting history shared across 5 devs).
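Rule 1 is a few lines in the committed .claude/settings.json. A sketch of the shape, assuming the Tool(specifier) deny syntax from the Claude Code permissions docs (exact glob semantics are worth checking against the current release):

```json
{
  "permissions": {
    "deny": [
      "Read(./.env*)",
      "Read(./secrets/**)",
      "Read(./**/*.pem)"
    ]
  }
}
```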

Rollout order:

  1. Week 1 — ship the SessionEnd → markdown pipeline (~100 LOC bash + template) writing the locked-in frontmatter + key-steps shape into status/sessions/.
  2. Week 1 — add daaain/claude-code-log in the SessionEnd runner; commits status/sessions/html/*.html next to each TL;DR for humans who want the whole chat.
  3. Week 1 — stand up the S3 transcript archive (bucket, IAM role, lifecycle rule, scrub step). Wire the SessionEnd runner to upload post-scrub.
  4. Week 1 — each dev fills in their team/devs/{handle}.md from TEMPLATE.md (async, not blocking).
  5. Week 2 — adopt MADR for ADRs. Migrate the two existing research docs as ADRs or keep as research.
  6. Week 2 — per-dev opt-in claude-mem for local retrieval (not required).
  7. Phase 2 (following the BMAD planning-trio adoption in 2026-04-20-agentic-coding-and-pr-review.md) — wire the BMAD review agent to read status/sessions/*.md filtered by PR branch, write status/reviews/{pr}-<slug>.md, and trigger /new-adr on architectural finds.
  8. Month 2+ — only if session logs start repeating themselves, stand up a gbrain-style compiled knowledge/ folder.

Everything committed stays in the repo, in markdown, in git — reviewable via PR, diffable, grep-searchable. Transcripts go to S3. No other external services on the critical path.

How this composes with BMAD

BMAD already logs what happens during code review. This session pipeline logs what happens during code authoring. Two ends of one workflow — they layer, not duplicate. Locking in the integration now, even though BMAD’s review agent ships in Phase 2:

  1. Dev works in Claude Code → SessionEnd hook writes status/sessions/YYYY-MM-DD-<slug>-<handle>.md
  2. Dev opens a PR on oxflow
  3. BMAD review agent is triggered on the PR
  4. Agent reads oxflow-docs/status/sessions/*.md filtered by branch: frontmatter matching the PR branch, plus any decisions/NNN-*.md files listed in the session’s Knowledge delta field
  5. Agent writes oxflow-docs/status/reviews/{pr-number}-<slug>.md in the same frontmatter family as session files (author, pr_number, reviewer_agent, branch), with a sibling body: # TL;DR / # Concerns / # Decisions validated / # Follow-ups / # Knowledge delta
  6. If the PR locks in something architectural, the review agent triggers /new-adr → decisions/NNN-*.md

The point of integration is step 4. BMAD consumes session intent instead of re-deriving it from the diff. Author intent is already recorded — there is no good reason to make the reviewer guess. Every PR ends up with a chain of what was built + why + how it was reviewed + what it locked in, all in the same repo, all readable by whatever agent starts the next session.

Shape family across the pipeline:

status/sessions/YYYY-MM-DD-<slug>-<handle>.md   ← author intent (auto)
status/reviews/{pr}-<slug>.md                    ← reviewer findings (auto, BMAD)
decisions/NNN-<slug>.md                          ← architectural lock-in (auto on trigger, human-edited)
team/devs/{handle}.md                            ← dev setup context (human)

All four are markdown with matching frontmatter schemas — an agent can walk the chain with one parser.
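A sketch in Python of that one-parser walk, matching sessions to a PR by the branch frontmatter key (this is the step-4 lookup the BMAD review agent performs):

```python
import re

def frontmatter(text):
    """Parse the YAML-ish key: value block between the leading --- fences."""
    match = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    fields = {}
    for line in (match.group(1).splitlines() if match else []):
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def sessions_for_branch(session_files, pr_branch):
    """Step 4: every session whose frontmatter branch matches the PR branch.

    session_files: {filename: file_text}, e.g. the contents of status/sessions/.
    """
    return [name for name, text in sorted(session_files.items())
            if frontmatter(text).get("branch") == pr_branch]
```

The review file, the ADR, and the dev profile parse with the same `frontmatter` call, which is the whole point of keeping one schema family.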

A wiki on top — what humans would actually see

The files above are organised to be machine-first: easy for Claude Code / BMAD / grep / an LLM link-walker to traverse. Humans can read them directly, but once there are a hundred recaps, a dozen reviews, and four epics in flight, a flat folder listing is not how any of us actually browse. A small wiki layered on the same files projects the repo into pages humans can read — without changing what the agents consume. It is a projection, not a new source of truth: delete the viewer tomorrow and the AI workflow is unaffected, because authoring stays in the repo.

Why this matters once the project gets bigger:

  • Volume. By month 3 there are hundreds of session recaps, dozens of reviews, a growing list of decisions, and the sprint’s epics and stories. A flat folder listing stops being easy to browse; a wiki puts the same files behind tidy landing pages.
  • Following the links. A folder is a list; real questions (“every chat that touched Epic 8”, “every decision Ibrahim authored”) are a graph. Machines answer those with one search; humans need pages that already have the hops done for them.
  • Onboarding. A new contractor opens the page for their buddy, reads two recent chats, skims the last decision — context in minutes, no handover meeting.
  • Stakeholder read-out. Project leads answer “what’s the team working on this sprint?” in one screen, built from the same recaps the team is already writing. No re-reading raw files, no slide-deck construction.

What the viewer reads. Nothing is authored inside the viewer — it is a read-only window on the same files:

  1. Sprint planning artefacts (epics, stories, PRDs) — produced by the BMAD planning workflow, organised into one flat folder. Auto.
  2. Chat recaps + full chat renders — the short recap is the default view; a “Full chat ↗” link opens the whole thing. Auto.
  3. Review-bot notes — one file per PR, automatically cross-linked to the chat recap that preceded it. Auto.
  4. Formal decision records — numbered ADRs; each page shows which chat led to it and which reviews confirmed it. Human.
  5. Deep-dive research notes — this one and its siblings. Human.
  6. Per-teammate “how I’m set up” pages — each person’s profile, layered with that person’s recent chats, reviews, and decisions. Human.
  7. Formal spec docs — the product’s written specification, already rendered by oxflow-docs/app/app.js. Human.

For the engineers — what to add to the existing doc viewer:

The existing viewer at oxflow-docs/app/app.js already handles most of this. The new work is an extension, not a rewrite.

| Capability | Already in app.js? | What to add |
| --- | --- | --- |
| Markdown rendering (marked.js + DOMPurify) | Yes | No change |
| Collapsible sidebar with sections | Yes | No change |
| Hash routing (#/slug) | Yes | New route handlers: #/epic/N, #/story/N.M, #/dev/{handle}, #/session/{slug}, #/review/{pr}, #/decision/NNN. ~80 LOC total. |
| Cross-link rewriting + status dots | Yes | Extend rewriter to match Epic 8, Story 8.2, ADR-003. |
| BRANDING.md tokens + typography | Yes | No change. |
| Frontmatter parser | No | ~30 LOC inline JS. Split --- block, parse YAML-ish k/v. No new runtime dep. |
| Auto-registered docs | No | Build-time scan → registry.json from frontmatter across status/, decisions/, research/, team/devs/, BMAD planning-artifacts/. ~40 LOC. |
| Backlinks index | No | CI step scans for Epic N / Story N.M / ADR-NNN / session slugs across all markdown; emits backlinks.json. ~60 LOC. |
| Search | No | lunr.js build-time index over frontmatter + H1/H2/H3 + field filters (author:, branch:). ~20 LOC wiring. |
| Per-dev aggregation | No | Group by author: frontmatter; render profile + session/review/decision list. ~40 LOC. |

Rollout — Phase 3. Build this once the recap files are landing (Phase 1) and the BMAD planning workflow has produced epics and stories for the wiki to show (Phase 2). No point building a viewer over empty folders.

Tradeoff worth flagging

team/devs/*.md and session TL;DRs both live in oxflow-docs alongside product documentation. At 4 devs, everything is discoverable together and that’s the right call — cheap, one repo to clone, one search surface. If the team grows past ~10, split team-ops out into its own repo so product docs aren’t buried in session logs. Easy migration when the day comes; not worth pre-empting.

Alternatives considered

  • Managed memory service (Mem0 / Zep / Letta / Cognee). Rejected for Phase 1 — at 3–5 devs the admin overhead and privacy exposure outweigh the retrieval benefit. Revisit if session-log volume exceeds what grep can handle (probably ~2k sessions, many months away).
  • Dedicated wiki tool (Obsidian / Logseq / Notion). Rejected because it fragments the source of truth away from the repo. The git-backed approach satisfies the same needs while staying reviewable and diff-able.
  • Raw transcript commits (.jsonl into the repo). Rejected because transcripts can contain tokens, are large, and aren’t human-readable. The locked-in TL;DR + key-steps file covers the human + agent use case; the committed HTML render covers “read the whole chat”; the S3 archive covers forensic recovery.
  • Managed transcript DB (separate Neon, or a table in oxFlow’s product Neon). Rejected for Phase 1 — no current query use-case justifies another DB, and coupling team-ops to product data is worse. S3 + a DuckDB-on-demand escape hatch wins on cost and simplicity.
  • Auto-generating team/devs/{handle}.md from each dev’s .claude/settings.json. Rejected because it leaks secrets and strips the why. Hand-written profiles are the point — a paragraph of context beats a config dump.
  • No capture at all — CLAUDE.md only. Rejected because knowledge currently lives in individual devs’ session histories and disappears when the session ends. The SessionEnd pipeline is the cheapest way to stop bleeding that context.

Sources

Claude Code docs & hooks

Transcript tooling

Memory systems

gbrain

Team / git-as-memory patterns
