Building a Portable Knowledge Graph from AI Conversations
I wanted Claude to remember what we talked about yesterday. That turned into building a portable knowledge graph in a single SQLite file.
The Problem
I use Claude Code for everything: building an AI runtime (Arachne, 46K lines of code), writing a book proposal, planning a conference, evaluating freelance projects, drafting LinkedIn posts. Each conversation is rich with decisions, architecture discussions, and context that evaporates the moment I start a new session.
Claude Code stores conversation transcripts as JSONL files, but a new session starts with zero memory of prior work. I was spending the first 10 minutes of every session re-explaining what we’d already decided. That’s not a workflow; it’s a tax.
The Idea
What if every new Claude instance could query a database of past conversations before answering? Not a summary. Not a CLAUDE.md file with bullet points. The actual conversations, searchable by meaning.
What I Built (in one session)
A session memory system in about 400 lines of Python that:
- Ingests Claude’s JSONL transcripts into SQLite
- Embeds every message using a local Ollama model (nomic-embed-text, 768 dimensions)
- Indexes for both keyword search (FTS5) and semantic search (sqlite-vec)
- Merges results with Reciprocal Rank Fusion
- Walks the conversation DAG (Claude’s messages form a directed acyclic graph because of tool call branching) to find the context that makes each hit meaningful
- Traverses semantic links across sessions and projects to surface related discussions
The last point is the one that surprised me. Once you have embeddings for every chunk, you can compute pairwise similarity across your entire conversation history and materialize those connections as graph edges. A discussion about Klarna’s AI failure in a YouTube video transcript automatically links to the pattern catalog entry about Semantic Invariants in my book project, which links to the MCP server spec discussion about agent contracts. The graph finds connections I didn’t know existed.
The Architecture
Three search dimensions:
Keyword (FTS5): Porter-stemmed full-text search. Fast, good for exact terms. When I search for “OpenSRS registrar,” I want the exact conversation where we evaluated domain registrars.
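As a rough illustration of the keyword dimension, here is a minimal sketch using Python’s built-in sqlite3 module with an FTS5 table and the porter tokenizer. The table name chunks_fts, the single content column, and both function names are assumptions made for this example, not the actual schema. It also assumes the local SQLite build has FTS5 compiled in.

```python
import sqlite3


def build_keyword_index(conn: sqlite3.Connection) -> None:
    # Hypothetical table name. The porter tokenizer stems terms, so
    # "registrars" and "registrar" index to the same token.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts "
        "USING fts5(content, tokenize='porter')"
    )


def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 10):
    # bm25() returns smaller values for better matches, so ascending
    # order puts the most relevant chunks first.
    return conn.execute(
        "SELECT rowid, content FROM chunks_fts "
        "WHERE chunks_fts MATCH ? "
        "ORDER BY bm25(chunks_fts) LIMIT ?",
        (query, limit),
    ).fetchall()
```

Multiple bare terms in an FTS5 query are combined with an implicit AND, which suits the exact-term lookups described above. Raw user input containing FTS5 operators or punctuation would need quoting before being passed to MATCH.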
Semantic (sqlite-vec KNN): Vector similarity search over embeddings. When I search for “how should agents enforce correctness,” it finds conversations about Semantic Invariants, Agent Contracts, and Design by Contract even though none of those words appear in my query.
RRF Fusion: Reciprocal Rank Fusion merges both ranked lists. Chunks that rank well in both keyword and semantic search get boosted. In my use, this has consistently produced better results than either search alone.
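The fusion step can be sketched in a few lines. Each chunk earns 1 / (k + rank) from every list it appears in, and the contributions are summed. The function name rrf_fuse is hypothetical, and k = 60 is the conventional default for RRF rather than a value taken from this system.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Merge ranked lists of chunk ids with Reciprocal Rank Fusion.

    k=60 is the commonly used constant; it is an assumption here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk present in several lists accumulates score from each.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

For example, with keyword results [1, 2, 3] and semantic results [3, 1, 4], chunks 1 and 3 appear in both lists and rise above chunks 2 and 4, which appear in only one.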
Three context expansion strategies:
Window expansion: For each hit, include the surrounding messages (configurable, default ±2). The question that prompted an answer is often more informative than the answer itself.
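A minimal sketch of window expansion, assuming each message has a zero-based position within its session (an assumption for this example). Overlapping or adjacent windows are merged so the same message is not returned twice when hits cluster together. The function name is hypothetical.

```python
def expand_windows(hit_positions, session_length, radius=2):
    """Return merged (start, end) ranges, inclusive, around each hit."""
    ranges = []
    for pos in sorted(hit_positions):
        # Clamp the window to the bounds of the session.
        start = max(0, pos - radius)
        end = min(session_length - 1, pos + radius)
        if ranges and start <= ranges[-1][1] + 1:
            # Overlaps or touches the previous window: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges
```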
Ancestor backtracking: Claude’s conversation transcripts include a parentUuid field that forms a DAG. Walking the parent chain backward from a hit surfaces the reasoning thread that led to that point.
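The parent walk might look like the sketch below. It assumes messages are loaded into a dict keyed by a uuid field; parentUuid comes from the transcripts as described, while the uuid key, the function name, and the max_depth cap are assumptions for illustration.

```python
def ancestor_chain(messages_by_uuid, hit_uuid, max_depth=5):
    """Walk parentUuid links backward from a hit.

    Returns ancestors ordered oldest first, excluding the hit itself.
    """
    chain = []
    seen = {hit_uuid}
    current = messages_by_uuid.get(hit_uuid)
    while current is not None and len(chain) < max_depth:
        parent_uuid = current.get("parentUuid")
        # Stop at the root, or if malformed data would cause a loop.
        if parent_uuid is None or parent_uuid in seen:
            break
        parent = messages_by_uuid.get(parent_uuid)
        if parent is None:
            # The parent was never ingested; keep what we have.
            break
        chain.append(parent)
        seen.add(parent_uuid)
        current = parent
    chain.reverse()
    return chain
```

Because each message has a single parent, the walk follows one path back toward the root even when the conversation branches forward from a shared ancestor.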
Semantic link traversal: Cross-session, cross-project connections. This is the knowledge graph layer. After each ingest, a background job computes pairwise similarity between new chunks and the existing corpus, materializing high-similarity connections as edges in a semantic_links table.
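A simplified sketch of the linking pass follows. It compares new chunk embeddings against existing ones with plain-Python cosine similarity and stores pairs above a threshold in semantic_links. The column layout, the 0.8 threshold, and the function names are assumptions for this example, and the classification of links into types is left out because its rules are not described here.

```python
import math
import sqlite3


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_new_chunks(conn, new_chunks, existing_chunks, threshold=0.8):
    """Materialize high-similarity pairs as rows in semantic_links.

    new_chunks and existing_chunks map chunk id -> embedding (list of floats).
    Returns the number of rows actually inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS semantic_links ("
        "source_id INTEGER, target_id INTEGER, similarity REAL, "
        "PRIMARY KEY (source_id, target_id))"
    )
    before = conn.total_changes
    for new_id, new_vec in new_chunks.items():
        for old_id, old_vec in existing_chunks.items():
            if new_id == old_id:
                continue
            sim = cosine(new_vec, old_vec)
            if sim >= threshold:
                # OR IGNORE makes re-running the job idempotent.
                conn.execute(
                    "INSERT OR IGNORE INTO semantic_links VALUES (?, ?, ?)",
                    (new_id, old_id, sim),
                )
    conn.commit()
    return conn.total_changes - before
```

This nested loop is the brute-force form. At larger corpus sizes the inner loop would be replaced by a KNN query against the vector index, with the exact cosine check applied only to the returned candidates.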
Project Scoping
Each ingested session gets a project tag. My CLAUDE.md tells Claude to scope its memory queries to the current project:
python3 retrieve.py "your query here" --project arachne
A query from inside the Arachne project only searches Arachne conversations. A query from my universal chat searches everything. Any project can opt into global search with --global.
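One way the scoping flags could be wired up is sketched below. Only the --project and --global flags and the retrieve.py name come from the command above; the project column, the helper names, and the choice to search everything when no project is given are assumptions.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="retrieve.py")
    parser.add_argument("query")
    parser.add_argument("--project", default=None)
    # "global" is a Python keyword, so store the flag under another name.
    parser.add_argument("--global", dest="search_all", action="store_true")
    return parser.parse_args(argv)


def scope_clause(project, search_all):
    """Return a SQL WHERE fragment and its parameters for the chosen scope."""
    if search_all or project is None:
        # Global search: no project filter at all.
        return "", ()
    return "WHERE project = ?", (project,)
```

Keeping the project value as a bound parameter rather than formatting it into the SQL string avoids injection problems if project names ever come from untrusted input.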
This maps naturally to the concept of partitions: hierarchical namespaces that scope resources. A project inherits from its parent, and a resource can be promoted upward to share across projects.
The Numbers
As of today:
- 25,004 chunks across 6 projects (43 sessions)
- 24,230 vector embeddings
- 52,222 semantic links (6,454 continuations, 19,945 references, 25,823 related)
- 215MB single SQLite file
- KNN search: ~70ms across 24K+ vectors
- Over 150 archived text conversations (2,400+ messages) staged for import
- Auto-ingest running every 15 minutes via cron
The entire knowledge graph is one file. Copy it to another machine and it just works. No server, no connection strings, no infrastructure.
The Discovery That Changed Everything
Midway through building this, I realized I was building Chapter 10 of my book.
I’m writing “Implementing DDD with LLMs,” a book about applying Domain-Driven Design to AI systems. The book describes a runtime called Arachne with a formal type system for agentic systems: entities, roles, events, transactions, descriptors, partitions, principals, knowledge bases, and semantic memory.
The session memory system maps directly onto those primitives:
- Chunks are entities with identity and lifecycle (Ingested → Embedded → Linked)
- Project scoping is the partition model with resource inheritance
- The embedding + linking pipeline is a causal chain: you can’t link a chunk that hasn’t been embedded, you can’t embed a chunk that hasn’t been ingested
- The Prompt Decorator chain in Arachne’s hydration sequence is exactly what happens when retrieval assembles context from keyword hits + semantic matches + ancestor chains + cross-session links
- The semantic linker is an agent that subscribes to ConversationIngested events on the Service Bus
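The chunk lifecycle and its causal ordering can be expressed as a small state machine. This is plain Python for illustration only, not the Arachne DSL or runtime API; the class and function names are hypothetical.

```python
from enum import Enum


class ChunkState(Enum):
    INGESTED = "ingested"
    EMBEDDED = "embedded"
    LINKED = "linked"


# Each state has exactly one legal successor: Ingested -> Embedded -> Linked.
_NEXT_STATE = {
    ChunkState.INGESTED: ChunkState.EMBEDDED,
    ChunkState.EMBEDDED: ChunkState.LINKED,
}


def advance(state: ChunkState, target: ChunkState) -> ChunkState:
    """Move a chunk to the target state, rejecting skipped steps."""
    if _NEXT_STATE.get(state) is not target:
        raise ValueError(f"illegal transition: {state.name} -> {target.name}")
    return target
```

Here the check happens at runtime: a chunk cannot jump from Ingested to Linked without passing through Embedded.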
I built the dogfood implementation of my own spec without planning to. The book’s argument (the domain model has escaped the codebase and entered the runtime) demonstrated itself in the act of writing the tooling.
From Prototype to Product
The progression:
Level 0 (where I am now): Python scripts, SQLite, Ollama. Works on my laptop. No DSL, no compiler, no deployment. Pure runtime API.
Level 1: Declare the stable parts (chunk types, project scoping rules, link classification) in the Arachne DSL while keeping dynamic parts flexible.
Level 2: Fully woven. The compiler validates the causal chain (can’t link without embedding), the Role Transition Graph (chunk lifecycle), and the partition hierarchy (resource inheritance rules). Compile-time guarantees on a knowledge management system.
That progression is the Rigor Spectrum from the Arachne Runtime Spec: start where you are, add structure when the pain of not having it exceeds the cost of adding it. I started with a Python script. The spec wrote itself.
How We Built It: Collaboration, Not Dictation
This is the part I want to highlight, because it changed how I think about working with AI.
I didn’t spec this system and hand it to Claude to implement. We built it together, iteratively, in a single long-running session. I described the problem, Claude proposed an architecture, I pushed back on decisions, Claude adapted. The conversation was the design process.
The most instructive moment: importing conversations from Claude’s web chat interface. Claude’s first instinct was to write a Python script with regex patterns and heuristics to detect where user messages ended and assistant messages began. It built import_chat.py with pattern matching for timestamps, tool usage indicators, Q&A formatting, assistant sentence starters. It was clever code.
It was also wrong. The parser missed turn boundaries, merged messages that should have been separate, and misattributed roles. When I pointed this out, Claude’s second instinct was to iterate on the heuristics: add more patterns, handle more edge cases, tune the detection.
I stopped it and said: you parse the conversations. Don’t write a script to do it. You read the raw text, you identify each turn, you output structured JSON. I’ll import the JSON.
This is the insight: Claude is better at understanding conversation structure than any regex pattern could be. It can read a messy block of text and understand “this is the user asking a question, this is the assistant giving a detailed technical response, this is tool output, this is the user correcting the assistant.” That’s a comprehension task, not a pattern matching task. The heuristic approach was trying to programmatically solve something that requires understanding meaning.
So we split the work: Claude parses conversations into structured JSON (using its language understanding), and a simple import_parsed.py script handles the mechanical parts (inserting into SQLite, generating embeddings, computing links). The right tool for each job.
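The mechanical half of that split might look like the sketch below. The JSON shape (a list of objects with role and content), the chunks table layout, and the injected embed callable are all assumptions for this example; the real import_parsed.py may differ. Passing the embedder in as a function keeps the sketch runnable without Ollama.

```python
import json
import sqlite3

ALLOWED_ROLES = {"user", "assistant", "tool"}


def import_parsed(conn, parsed_json, project, embed=None):
    """Insert structured turns into SQLite. Returns the number of rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, project TEXT NOT NULL, "
        "role TEXT NOT NULL, content TEXT NOT NULL, embedding TEXT)"
    )
    inserted = 0
    for turn in json.loads(parsed_json):
        role = turn["role"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {role!r}")
        content = turn["content"].strip()
        if not content:
            # Skip empty turns rather than embedding whitespace.
            continue
        vector = embed(content) if embed is not None else None
        conn.execute(
            "INSERT INTO chunks (project, role, content, embedding) "
            "VALUES (?, ?, ?, ?)",
            (project, role, content,
             json.dumps(vector) if vector is not None else None),
        )
        inserted += 1
    conn.commit()
    return inserted
```

Storing the embedding as JSON text keeps the example dependency-free; a real pipeline would more likely write it to the vector index.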
This pattern repeated throughout the session as we added support for other import formats, such as video transcripts and chat logs. Each time, Claude cleaned up and structured the raw turns, and the same local script handled the rest.
The collaboration model that emerged: I provide direction, constraints, and corrections. Claude provides implementation, exploration, and the grunt work of processing large volumes of text. Neither of us could have built this alone in a single session. The system is better because we argued about it.
What I’d Do Differently
Start with sqlite-vec from day one. I initially used brute-force cosine similarity in Python (load all embeddings, compute pairwise). It worked at 5K chunks but the semantic linker was grinding at 21K. sqlite-vec made KNN search ~100x faster with zero infrastructure change. Same SQLite file, just an extension.
Normalize the vectors. nomic-embed-text produces unnormalized embeddings, which means sqlite-vec’s L2 distance doesn’t map cleanly to cosine similarity. I had to compute actual cosine similarity for threshold checks, which adds overhead. Normalizing at embed time would have simplified everything.
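The reason normalization helps: for unit-length vectors, squared L2 distance and cosine similarity are tied together by d² = 2 − 2·cos, so a distance returned by the index converts directly to a cosine value. A minimal sketch, with hypothetical function names:

```python
import math


def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_from_l2(distance):
    """Convert an L2 distance between two unit vectors to cosine similarity.

    Valid only if both vectors were normalized: d^2 = 2 - 2*cos.
    """
    return 1.0 - (distance ** 2) / 2.0
```

With normalized embeddings, a cosine threshold can also be converted once into an equivalent distance cutoff, sqrt(2 − 2·threshold), instead of recomputing cosine similarity for every candidate.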
Design the importer for multiple formats upfront. I built a JSONL importer for Claude Code transcripts, then a separate parser for Claude chat conversations (pasted as markdown), then another for YouTube transcripts. A common intermediate format (the structured JSON that import_parsed.py consumes) would have saved time. In retrospect, “source-specific converters output standardized JSON, a single script imports it” should have been the architecture from the start. That’s where it ended up, but only after unnecessary detours through ad-hoc parsers.
The Stack
- Python 3.13
- SQLite (FTS5 for keyword search)
- sqlite-vec (KNN vector search)
- Ollama + nomic-embed-text (local embeddings, 768-dim)
- apsw (SQLite wrapper with extension support)
- Zero cloud dependencies
Everything runs locally. The embeddings are generated by Ollama on my machine. The entire knowledge graph lives in a file I control. No API keys, no cloud storage, no third-party access to my conversation history.
What’s Next
The session memory system is the foundation. Here is what has been added since the initial build and what is coming next.
Done:
- Static archive importers: decoupled importer scripts that parse local static archives of chat history, video transcripts, and code logs into the common schema.
- Auto-ingest pipeline: a cron job runs every 15 minutes, ingesting new sessions, generating embeddings, rebuilding the vector index, and computing semantic links. The knowledge graph stays current without manual intervention.
- Staging workflow: imported conversations land in a staging folder organized by source project, ready for review before committing to the graph. Prevents garbage from polluting your knowledge base.
- Conversation management spec: a full spec for a web UI (FastAPI + SQLite) that lets you browse, tag, search, and manage conversations across all sources. Multi-project tagging, staging review, bulk import.
Coming:
- Web UI for conversation management: browse the knowledge graph, review staged imports, assign projects and tags, visualize semantic connections
- Asset capture: images, documents, and code files generated in conversations, saved alongside the messages that created them
- Agent knowledge tools: agents that can write to the knowledge graph, not just read from it. The agent decides what’s worth remembering and calls kb.store() or kb.promote().
- Cross-project semantic graph visualization: seeing the connections across all conversations in one view
- Book import: ingest entire books as chunked knowledge bases, linked to conversation context. The graph doesn’t care about the source format.
The portable knowledge graph pattern generalizes beyond AI conversations. Any corpus of text can be chunked, embedded, linked, and made queryable. The session memory system just happens to be the first implementation, built out of necessity, refined into architecture.
The code is at arachne-ai.com. The book is in development.
Michael Brown is the founder of Synaptic Weave, Inc., building Arachne, an AI agent runtime. He is currently writing “Implementing DDD with LLMs” about applying Domain-Driven Design to semantic systems.