Why agents need memory, not just context

A bigger context window buys you a better goldfish. Persistence is a different problem — and most of the work happens between the prompts.

The easiest way to make an agent look smart is to give it a longer context window. Paste in the docs, the history, the tool outputs, and let the model sort it out. It works — right up until the conversation outgrows the window, or the cost makes every turn painful, or the model quietly forgets the one constraint that mattered three messages ago.

Context is what the model can see right now. Memory is what it chooses to carry forward. We keep conflating the two, and it costs us. A bigger window is a bigger desk; it is not a filing system. The interesting engineering — the part that makes an agent feel reliable instead of merely capable — happens in the gap between the prompts.

Three kinds of remembering

It helps to stop treating “memory” as one thing. In practice an agent needs at least three, and they fail in different ways:

Working memory — the scratchpad for the current task. Cheap to build, easy to overflow.
Episodic memory — what happened, and when. The thing that lets an agent say “we already tried that.”
Semantic memory — durable facts about the user and the world, distilled from many episodes.

Most “agent has no memory” complaints are really a missing episodic layer. The model can reason fine; it just has no record of its own past, so it loops, repeats, and re-apologizes.

An agent without episodic memory isn’t forgetful — it’s amnesiac. Every turn is its first day on the job.

Write less, decide more

The naïve move is to log everything and embed it. But retrieval quality collapses when the store is full of near-duplicates and dead ends. The real work is in what you choose to remember — a small, opinionated write step that runs after each task:

// after each completed task
const note = await summarize(episode, {
  keep: ["decisions", "failures", "user prefs"],
  drop: ["raw tool output"],
});
if (note.salience > 0.6) memory.write(note);

A salience threshold sounds crude, and it is — but a crude filter that runs every turn beats a perfect one that never ships. The store stays small, retrieval stays sharp, and the agent stops drowning in its own transcript. You can always raise the bar later; you can’t un-pollute a memory store after the fact.

The honest version

None of this makes the model smarter. It makes the system coherent — which, from the outside, is indistinguishable. A user doesn’t care whether continuity comes from a 200k window or a 2kb note. They care that the thing remembered. Build for that, and the rest is plumbing worth getting right.