the next sessions

Apr 24, 2026
#ai
#memory
#agents
#architecture

why ai memory is a compounding problem, not a retrieval problem. files beat databases.

every ai you use forgets you the moment you close the tab.

open a new conversation and the model has no idea who you are. no idea what you worked on yesterday. no idea that you told it three times to stop using em-dashes. you re-explain yourself, it rediscovers you, and the cycle repeats. every session starts from zero.

the standard response is to build a memory system. something that stores what the model learned and retrieves it when relevant. the industry has spent two years on this. vector databases, embedding pipelines, knowledge graphs, entity resolution, semantic search. the retrieval accuracy numbers are impressive. the products still forget you.

why retrieval is the wrong frame

the retrieval framing assumes the hard part is finding the right fact at the right time. it is not. the hard part is everything that happens between sessions.

retrieval is a scavenger hunt. every time your agent needs context, it searches your tools from scratch. no prior understanding, no accumulated knowledge, no sense of what has changed since the last time it looked. starting from zero, every single time. imagine hiring a new employee every morning, giving them full access to every system, and asking them to make decisions by lunch. they will find things. they will be confidently wrong about half of it.

as your history grows, the memory system must decide what to capture, how to represent it, and what to surface on any given turn. every one of those decisions is lossy, opinionated, and non-deterministic. over time, either the corpus gets too large to search reliably or what it remembers drifts from what was actually said.

there are only two ways to preserve information from a conversation. raw, which is original messages stored verbatim. derived, which is summaries, narratives, structured extractions. neither extreme works.

raw is lossless but inert. a pile of transcripts is not understanding. nothing is connected, prioritized, or interpreted. derived is compact and usable, but repeated derivation drifts from the source the way a photocopy of a photocopy degrades. you lose fidelity gradually and you cannot tell exactly when it stopped being accurate.

and no, infinite context does not solve this. two reasons. cost, because you would be paying to process your entire history on every single turn, and the economics scale linearly with history length. degradation, because models get measurably worse as context fills, with a 30% accuracy drop when relevant information sits in the middle of long sequences. you are paying more for worse performance.
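the linear-cost claim is easy to see with back-of-the-envelope arithmetic. the prices and turn sizes below are made up for illustration, not real model pricing:

```python
PRICE_PER_1K_TOKENS = 0.003   # hypothetical input price
TOKENS_PER_TURN = 500         # hypothetical average turn size

def cost_of_turn(history_turns: int) -> float:
    """cost of one turn when the whole history rides along as context."""
    context_tokens = history_turns * TOKENS_PER_TURN
    return (context_tokens + TOKENS_PER_TURN) / 1000 * PRICE_PER_1K_TOKENS

def cost_of_session(total_turns: int) -> float:
    """total session cost: per-turn cost is linear in history,
    so the sum over a session grows quadratically with its length."""
    return sum(cost_of_turn(t) for t in range(total_turns))
```

turn zero costs fractions of a cent. turn one hundred costs a hundred times more, for the same question.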

the two camps

survey over 450 open-source memory repos and a pattern emerges. they split into exactly two camps.

camp 1 solves fact recall. conversation goes in, facts come out, facts get stored, facts get retrieved on query. this is the standard vector-database pipeline. mem0 leads this camp at 53,000 stars. their latest algorithm achieves competitive accuracy at under 7,000 tokens per retrieval call while most systems need 25,000 or more. the trick is an add-only extraction pass that collapses two llm calls into one, entity linking that creates a separate lookup layer, and multi-signal retrieval that fuses semantic, keyword, and entity scoring in parallel. the engineering is genuinely impressive. mempalace hits 96.6% retrieval recall on longmemeval. these are serious systems solving the retrieval problem with real rigor.

camp 2 solves compounding context. the agent reads structured context at session start, works within it, writes back what it learned, and the next session starts richer. this is not a retrieve-on-query pattern. it is a read-on-start pattern. the context is always there. it compounds.
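the read-on-start pattern is small enough to sketch in full. a minimal version, assuming a single context file; the file name and the append-merge rule are hypothetical, not any particular project's code:

```python
from pathlib import Path

def start_session(context_file: Path) -> str:
    """load the accumulated context before the first token is generated."""
    return context_file.read_text() if context_file.exists() else ""

def end_session(context_file: Path, new_learnings: str) -> None:
    """write back what the session learned; the next session starts richer."""
    prior = start_session(context_file)
    merged = (prior + "\n" + new_learnings).strip()
    context_file.parent.mkdir(parents=True, exist_ok=True)
    context_file.write_text(merged)
```

note what is absent: no query, no embedding, no ranking. the context is read unconditionally at start and grows monotonically at end.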

the distinction sounds subtle. in practice it changes everything. retrieval says, find the answer when someone asks the question. compounding says, maintain a continuously updated representation of reality so the answer already exists before anyone asks. retrieval gives you fragments. compounding gives you a worldview.

the most-starred projects of 2026 are camp 2. openclaw at 358,000 stars stores memory as plain markdown files with a dreaming process that runs overnight. light sleep, rem, deep sleep with weighted scoring. the files get pruned, consolidated, and strengthened while the user sleeps. gbrain, which garry tan open-sourced from his production system at 17,888 pages, uses the same architecture. git-backed markdown corpus, retrieval layer for search, skill files for behavior. his summary of the philosophy is five words. memory is markdown. brain is a git repo.

the clearest signal came from an unexpected direction. memsearch, built by zilliz, a vector database company, ships a memory tool that concedes markdown files are the source of truth. their own vector database is positioned as a "shadow index" derived from the files.

what compounding looks like

the mechanism that makes compounding work is not retrieval. it is consolidation.

human memory research identifies three types of long-term memory. episodic, which is specific past events. semantic, which is facts and concepts. procedural, which is skills and workflows. the transition between them is where memory becomes intelligence. an agent that notices "the user consistently prefers executive summaries" across dozens of interactions should turn that observation into a reusable rule. that is episodic data consolidating into semantic knowledge. the same way a doctor who has seen a thousand patients does not recall each individual case but carries an intuition built from accumulated pattern recognition.
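the episodic-to-semantic transition can be sketched as a counting rule: an observation that recurs often enough gets promoted to a rule. the threshold and the string representation are assumptions for illustration, not how a real system decides:

```python
from collections import Counter

CONSOLIDATION_THRESHOLD = 3  # hypothetical: how many sightings make a rule

def consolidate(episodes: list[str]) -> list[str]:
    """promote repeated episodic observations into semantic rules."""
    counts = Counter(episodes)
    return [obs for obs, n in counts.items() if n >= CONSOLIDATION_THRESHOLD]
```

a production system would use an llm to recognize that differently-worded episodes describe the same pattern. the structure is the same: many specific events in, one durable rule out.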

the systems that compound best implement this transition explicitly. hermes by @NousResearch hit 50,000 stars in two months. the reason is the learning loop. every fifteen tool calls, the agent pauses. it reviews what worked, what failed, what took too long. then it writes a skill, a markdown file that turns what it learned into a reusable workflow. day one produces generic output. day thirty produces output formatted exactly the way you want it, following procedures the agent wrote for itself.
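the learning loop is a counter plus a callback. the fifteen-call cadence comes from the description above; the class shape and the skill format are hypothetical:

```python
REVIEW_INTERVAL = 15  # review cadence described above

class LearningLoop:
    """pause every N tool calls and distill the trace into a skill file."""

    def __init__(self, write_skill):
        self.write_skill = write_skill  # callback that persists a markdown skill
        self.trace: list[str] = []

    def record_tool_call(self, outcome: str) -> None:
        self.trace.append(outcome)
        if len(self.trace) % REVIEW_INTERVAL == 0:
            self.review()

    def review(self) -> None:
        # in a real system an llm reviews what worked and what failed;
        # here the "skill" is just the last window of outcomes joined together
        window = self.trace[-REVIEW_INTERVAL:]
        self.write_skill("## learned workflow\n" + "\n".join(window))
```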

the difference from standard memory is fundamental. standard memory stores facts about your preferences. compounding memory stores executable procedures. it does not just remember that you like bullet points. it remembers the entire research-filter-format workflow that produces bullet points the way you want them.

@garrytan took this further with a practice he calls skillify. every time his agent makes a mistake, the failure gets promoted into a skill with a ten-step checklist. markdown contract, deterministic script, unit tests, integration tests, llm evals, resolver trigger, resolver eval, reachability audit, end-to-end smoke test, brain filing rules. a feature that does not pass all ten is not a skill, it is just code that happens to work today. the first time he ran a reachability audit on his forty skills, fifteen percent of them were dark. capable but unreachable because no trigger pointed at them. his framing is blunt. the agent i work with a year from now will be shaped by every mistake it made in the year before. that is the whole thesis.

avid's agentic stack formalized the layering into four memory tiers. working memory for the current task. session memory that persists within a conversation. project memory that persists across sessions. permanent memory that spans everything. each layer has different retention rules and a nightly dream cycle that compresses observations from the lower layers into durable knowledge at the higher ones. the files get pruned, merged, and strengthened automatically. growth stays bounded. signal stays high.
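a toy version of the four-tier layout. the tier names come from the description above; the merge rule is a crude stand-in for real consolidation, which would prune and rewrite rather than concatenate:

```python
TIERS = ["working", "session", "project", "permanent"]

class TieredMemory:
    def __init__(self):
        self.store = {tier: [] for tier in TIERS}

    def observe(self, note: str) -> None:
        self.store["working"].append(note)

    def dream(self) -> None:
        """nightly cycle: compress each tier one level upward, then clear it.
        processing higher pairs first keeps a note from cascading through
        every tier in a single night."""
        for lower, higher in reversed(list(zip(TIERS, TIERS[1:]))):
            notes = self.store[lower]
            if notes:
                self.store[higher].append(" | ".join(notes))
                self.store[lower].clear()
```

each dream cycle, an observation climbs one tier. only what survives repeated consolidation reaches permanent memory, which is exactly why growth stays bounded.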

the architecture of forgetting

remembering is easy. forgetting is structurally hard. and how a system forgets reveals whether its architecture is sound.

if you stored raw conversation turns and derived summaries from them, deleting the raw turns does not delete the summaries. if you extracted facts into a knowledge graph, deleting the source conversation leaves the facts orphaned with no provenance. real forgetting requires tracking every derivation back to its source so you can cascade deletes. almost no retrieval-based system does this.
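cascade deletion only works if every derivation records its source. a minimal sketch of that bookkeeping, with all names hypothetical:

```python
class ProvenanceStore:
    def __init__(self):
        self.raw: dict[str, str] = {}       # conversation id -> transcript
        self.derived: dict[str, str] = {}   # fact id -> fact text
        self.lineage: dict[str, str] = {}   # fact id -> source conversation id

    def add_raw(self, conv_id: str, transcript: str) -> None:
        self.raw[conv_id] = transcript

    def derive(self, fact_id: str, fact: str, conv_id: str) -> None:
        """every derived fact is stored with a pointer back to its source."""
        self.derived[fact_id] = fact
        self.lineage[fact_id] = conv_id

    def forget(self, conv_id: str) -> None:
        """real forgetting: delete the source, then cascade to everything
        that was derived from it. no orphaned facts survive."""
        self.raw.pop(conv_id, None)
        orphans = [f for f, src in self.lineage.items() if src == conv_id]
        for f in orphans:
            self.derived.pop(f, None)
            self.lineage.pop(f, None)
```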

compounding context handles forgetting differently. because the context is in files, you can version it. git gives you a full history of what was added, changed, and removed. when a fact becomes stale, you edit the file. the old version is in the commit history if you ever need it. no orphaned graph node, no untraceable derivation. the file is the source of truth and git is the provenance layer.

the lock-in you are not thinking about

there is a strategic dimension to memory that most people miss. harrison chase put it plainly. without memory, your agents are easily replicable by anyone who has access to the same tools. with memory, you build up a proprietary dataset. the competitive advantage is in accumulated context, not model access.

every proprietary agent system captures memory in formats only their system can use. encrypted compaction summaries that cannot be exported. proprietary session logs. managed memory apis. if your memory dies when your harness dies, you built the harness too thick.

the antidote is files on disk. markdown, version-controlled, readable by any agent. switch from claude code to cursor to codex to hermes and bring your entire brain with you. the memory is not in the tool. it is in the repo.

what i use

i built this pattern into my own daily workflow. an obsidian vault with over two hundred documents, synced to a postgres database with vector and keyword search across a thousand indexed pages. every new session starts by loading a single file that contains the rules, the project context, the cross-references, everything the agent needs to be useful from the first token.

when i tell the agent to stop using em-dashes, it writes to a memory file. next session, that file is loaded before anything happens. no query fires. no embedding is compared. the correction is simply there, in the context, from the start. when i research a topic, the findings go into wiki pages that link to each other and back to the raw sources they came from. when a source is wrong or outdated, i know which wiki pages depend on it because the links are explicit and bidirectional.

the raw sources live in one folder, immutable. the derived wiki pages live in another. the wiki owns interpretation. the raw folder owns truth. this is the same separation that makes git work. the working tree is the current state. the history is the provenance layer. both are files on disk.

the architecture follows a progressive disclosure pattern that keeps context tight. the agent does not load everything. it loads a fifty-line schema file first, which points to an index of thirty lines, which points to the two or three pages that matter for the current task. total context loaded per session is about a hundred to two hundred lines, not the full two hundred documents. the rest exists on disk, reachable but not loaded until the agent decides it needs it. this is the same principle that makes camp 2 work at scale. the cheap channel is what loads at start. the expensive channel is what gets read on demand.
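the disclosure chain is three reads, two of them unconditional. a simplified sketch, not my actual vault code; the file names and the index format are assumptions:

```python
from pathlib import Path

def load_session_context(root: Path, task_keywords: list[str]) -> str:
    """load schema and index always; topic pages only when the task needs them."""
    schema = (root / "schema.md").read_text()            # always loaded
    index_lines = (root / "index.md").read_text().splitlines()  # always loaded
    # index lines are assumed to look like "topic: pages/topic.md"
    selected = []
    for line in index_lines:
        topic, _, path = line.partition(": ")
        if any(kw in topic for kw in task_keywords):
            selected.append((root / path).read_text())
    return "\n\n".join([schema, *selected])
```

the agent pays for the schema and the index on every session. everything below the index costs nothing until a task actually reaches for it.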

the system already has a retrieval layer, but it works the way zilliz described. the files are the source of truth. the search is a shadow index. every time i push a commit, a sync daemon on a remote server pulls the changes within 60 seconds, diffs them against the existing corpus, computes vector embeddings for the modified pages, and writes the results to an on-prem supabase instance with postgres and pgvector. over a thousand pages are indexed this way. the search layer is useful when i need to find something across the full corpus that i cannot locate by folder structure alone. but the search layer does not own the data. it is derived from the files. if the database disappeared tomorrow, i would rebuild it from the repo in an afternoon. if the repo disappeared, everything would be gone. the files are the system. search is a convenience.
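the shadow-index property is the key invariant: the index is a pure function of the files, so losing it costs one pass over the repo. a toy version, with substring matching standing in for vector search:

```python
from pathlib import Path

def rebuild_shadow_index(repo: Path) -> dict[str, str]:
    """derive the search index entirely from the files.
    the files own the data; the index is disposable."""
    return {
        str(page.relative_to(repo)): page.read_text()
        for page in repo.rglob("*.md")
    }

def search(index: dict[str, str], query: str) -> list[str]:
    """trivial stand-in for semantic search: pages mentioning the query."""
    return [path for path, text in index.items() if query in text]
```

the asymmetry matters: `rebuild_shadow_index` goes from files to index in one call, but nothing goes the other way. that is what makes the files the system and the search a convenience.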

closing

the general shape mirrors a pattern i described in a previous article about multimodal agents. default to the cheap channel. use the expensive one only when the cheap one admits it cannot answer.

for video, the cheap channel is audio transcription. for memory, the cheap channel is files loaded at session start. the expensive channel for video is vision. the expensive channel for memory is retrieval, vector search, graph traversal, entity resolution. the architecture is the same at every layer. trust the cheap channel. let it tell you when it needs help.
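the routing rule fits in a few lines. a generic sketch, with the cheap and expensive channels passed in as functions and `None` standing in for "the cheap channel admits it cannot answer":

```python
def answer(question: str, cheap, expensive):
    """trust the cheap channel; escalate only when it admits defeat."""
    result = cheap(question)
    return result if result is not None else expensive(question)
```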

the evidence for this keeps getting stronger. langchain tested the same model on the same benchmark with two different harnesses. the score moved from 52.8% to 66.5%. same weights, different environment. the harness is the memory system. the files are the harness. everything else is optimization on top.

a system that starts every session richer than the last one is not retrieving memories. it is compounding them.