The four layers of agent memory

People treat agent memory as a single choice and compare the options as if they were rivals. They answer different questions, so here's the map.

Say “agent memory” and you’ll get a dozen frameworks thrown back: working and episodic, MemGPT, Mem0, vector stores, knowledge graphs, context engineering — pitched as if you have to pick one. You don’t. They answer different questions, and most of the confusion comes from comparing across layers that were never rivals.

There are four layers, and the whole thing gets easier once you keep them straight:

Layer	The question it answers
Kinds	What kinds of memory exist
Management	How to manage a finite context budget
Storage	Where the memory physically lives
Product	Who already built it

What kinds — the cognitive taxonomy

The taxonomy is borrowed from how psychologists carve up human memory, and it transfers cleanly:

Working — the scratchpad for the current task; in practice, whatever is in the context window right now.
Episodic — what happened, and when. The layer that lets an agent say “we already tried that.”
Semantic — durable facts about the user and the world. This one is your knowledge base.
Procedural — how to do things; learned skills and routines.

Most “my agent has no memory” complaints are really a missing episodic layer — I’ve written before about why a bigger context window doesn’t fix it. And look at the third entry: a knowledge base is the semantic layer of memory. That’s why the two topics keep collapsing into each other.
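To make the taxonomy concrete, here is a minimal sketch of the four kinds sitting side by side in one structure. The `AgentMemory` and `Episode` names, and the substring matching, are assumptions made up for illustration; no real framework is being quoted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    """One thing that happened, with a timestamp (episodic memory)."""
    what: str
    when: datetime


@dataclass
class AgentMemory:
    """Hypothetical container holding the four kinds side by side."""
    working: list[str] = field(default_factory=list)         # what is in context now
    episodic: list[Episode] = field(default_factory=list)    # what happened, and when
    semantic: dict[str, str] = field(default_factory=dict)   # durable facts
    procedural: dict[str, list[str]] = field(default_factory=dict)  # named routines

    def record(self, what: str) -> None:
        """Append an event to the episodic log with the current UTC time."""
        self.episodic.append(Episode(what, datetime.now(timezone.utc)))

    def already_tried(self, action: str) -> bool:
        """The 'we already tried that' check: naive substring match on past episodes."""
        needle = action.lower()
        return any(needle in ep.what.lower() for ep in self.episodic)


memory = AgentMemory()
memory.semantic["user.timezone"] = "Europe/Berlin"
memory.procedural["deploy"] = ["run tests", "build image", "push"]
memory.record("Tried restarting the worker; the error persisted")

print(memory.already_tried("restarting the worker"))     # True
print(memory.already_tried("rolling back the release"))  # False
```

An agent with only `working` and `semantic` filled in can look facts up but cannot answer the second question, which is the missing episodic piece described above.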

How to manage it — the budget problem

Context is a finite attention budget, so the real job is curating what goes in. The framework here is context engineering, and it’s four moves: offload (push state out to files or a store), retrieve (pull back only what’s relevant), compress (summarise the old), isolate (give each sub-task only what it needs). MemGPT — now Letta — made this concrete with an operating-system metaphor: treat the context window as RAM and external storage as disk, page memory in and out, and let the agent edit its own memory.
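The four moves can be sketched as four small functions. This is a toy under stated assumptions: the function names, the dict-backed store, and the word-overlap relevance score are all invented for illustration, and a real system would use a model to summarise and embeddings to rank.

```python
def offload(store: dict[str, str], key: str, text: str) -> str:
    """Push state out of the context; only a short pointer stays behind."""
    store[key] = text
    return f"[offloaded:{key}]"


def retrieve(store: dict[str, str], query: str, limit: int = 2) -> list[str]:
    """Pull back only entries that share words with the query (toy relevance)."""
    words = set(query.lower().split())
    scored = [(len(words & set(text.lower().split())), text) for text in store.values()]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in relevant[:limit]]


def compress(messages: list[str], keep_last: int = 2) -> list[str]:
    """Swap old messages for a one-line stand-in; the summary here is a placeholder."""
    split = len(messages) - keep_last
    if split <= 0:
        return list(messages)
    return [f"[summary of {split} earlier messages]"] + messages[split:]


def isolate(context: dict[str, str], needed: set[str]) -> dict[str, str]:
    """Give a sub-task only the keys it needs."""
    return {key: value for key, value in context.items() if key in needed}


store: dict[str, str] = {}
pointer = offload(store, "build-log", "build failed on step 3 missing dependency libfoo")
offload(store, "notes", "user prefers short answers")

hits = retrieve(store, "why did the build fail")
history = compress(["m1", "m2", "m3", "m4"])
subtask = isolate({"repo": "app", "secrets": "xyz", "ticket": "T-1"}, {"repo", "ticket"})

print(pointer)   # [offloaded:build-log]
print(hits)      # ['build failed on step 3 missing dependency libfoo']
print(history)   # ['[summary of 2 earlier messages]', 'm3', 'm4']
print(subtask)   # {'repo': 'app', 'ticket': 'T-1'}
```

Read through the operating-system metaphor, `offload` is writing to disk and `retrieve` is paging back into RAM; the part this sketch leaves out is the agent deciding for itself when to call them.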

Where it lives — storage

The same memory can sit in three kinds of store, and serious systems mix them:

Vector — embeddings and similarity search; the default.
Graph — a temporal knowledge graph (who relates to whom, and when); strong on relationships and how they change.
Files — Markdown in git; human-readable, auditable, zero lock-in. Anthropic’s own Claude memory is a folder of Markdown files.
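To show the same memory landing in two different stores, here is a small sketch with a vector-style store and a Markdown file store. Everything in it is an assumption for illustration: the class names are hypothetical, `embed` is a bag-of-words stand-in for a real embedding model, and the graph store is left out to keep the example short.

```python
import math
import tempfile
from collections import Counter
from pathlib import Path
from typing import Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Similarity search over embedded memories, held in a plain list."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str) -> Optional[str]:
        """Return the single most similar memory, or None if the store is empty."""
        if not self.items:
            return None
        q = embed(query)
        return max(self.items, key=lambda item: cosine(q, item[0]))[1]


class MarkdownStore:
    """One bullet per memory in a Markdown file a human can read and diff."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def add(self, text: str) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(f"- {text}\n")

    def read_all(self) -> list[str]:
        if not self.path.exists():
            return []
        lines = self.path.read_text(encoding="utf-8").splitlines()
        return [line[2:] for line in lines if line.startswith("- ")]


memories = ["User deploys on Fridays", "User prefers Python over Go"]
vectors = VectorStore()
notes = MarkdownStore(Path(tempfile.mkdtemp()) / "memory.md")
for m in memories:
    vectors.add(m)   # searchable by similarity
    notes.add(m)     # readable and auditable as plain text

best = vectors.search("which language the user prefers")
print(best)              # User prefers Python over Go
print(notes.read_all())  # ['User deploys on Fridays', 'User prefers Python over Go']
```

The two stores hold identical content and differ only in what they are good at: one ranks by similarity, the other can be opened in an editor and committed to git.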

Who built it — products

If you’d rather not assemble it yourself: Mem0 (multi-level, auto-extracted), Letta/MemGPT (the OS-style tiered memory), Zep (temporal knowledge graph), Anthropic’s Memory Files (file-based), and LangMem / Cognee in the LangChain orbit.

The honest version

None of this is a single decision. A real agent uses several kinds at once, managed by context engineering, sitting in some mix of stores, reached through a product or your own glue. The trap is treating “memory” as a single switch, and the tell that you’ve fallen for it is comparing a kind (episodic) against a store (vectors) against a product (Mem0) as if they were the same decision.

And the punchline I keep landing on: a knowledge base is just the semantic layer with the write turned off. Let the agent write back to it, and it stops being a knowledge base and starts being a memory.
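That last point fits in a few lines of code. The sketch below is illustrative only: `SemanticStore` and its `writable` flag are hypothetical names, chosen to show that the same store behaves as a knowledge base or as a memory depending on one switch.

```python
from typing import Optional


class SemanticStore:
    """Hypothetical fact store; `writable` is the switch between the two roles."""

    def __init__(self, facts: dict[str, str], writable: bool = False) -> None:
        self._facts = dict(facts)
        self.writable = writable

    def lookup(self, key: str) -> Optional[str]:
        """Reading works the same in both roles."""
        return self._facts.get(key)

    def remember(self, key: str, value: str) -> None:
        """Write-back is what turns a knowledge base into a memory."""
        if not self.writable:
            raise PermissionError("read-only store: acting as a knowledge base")
        self._facts[key] = value


seed = {"refund_window": "30 days"}

knowledge_base = SemanticStore(seed)            # write turned off
memory = SemanticStore(seed, writable=True)     # agent may write back

try:
    knowledge_base.remember("user.plan", "pro")
    rejected = False
except PermissionError:
    rejected = True

memory.remember("user.plan", "pro")

print(rejected)                             # True
print(memory.lookup("user.plan"))           # pro
print(knowledge_base.lookup("user.plan"))   # None
```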