Infrastructure

Pattern 12 of 26

Memory Patterns

Giving agents a past and a future

By default, agents start each session with no idea what happened last time. Memory patterns are how you fix that. They give the agent a way to store and retrieve past interactions, learned preferences, and accumulated facts. The tricky part is not storing things. It is deciding what to keep, what to discard, and whether you can actually retrieve the right thing when you need it.

Why it matters

Without memory, every conversation is a first conversation. That is fine for simple tasks and completely wrong for anything spanning multiple sessions. Memory is also where most teams underestimate the engineering work. Storing is easy. Retrieval that actually works is not.

Deep Dive

By default, agents are stateless. Every session begins fresh: no knowledge of prior work, no recollection of what the user prefers, no awareness that this exact question was asked last Tuesday. Memory patterns address this by giving the agent access to information stored outside the context window. There are four types worth distinguishing: in-context memory (what is in the current prompt right now), external memory (retrieved from a database or vector store), episodic memory (records of specific past events or sessions), and procedural memory (skills, preferences, and learned behaviors the agent has accumulated). Each type has a different cost structure, a different latency profile, and a different failure mode.
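The four types can be made concrete with a small sketch. This is not any library's API; the class and field names are invented for illustration. The key point it encodes is that only in-context memory (plus whatever gets retrieved) ever reaches the model.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Illustrative separation of the four memory types (names are invented)."""
    in_context: list[str] = field(default_factory=list)       # lives in the prompt now
    external: dict[str, str] = field(default_factory=dict)    # database / vector store
    episodic: list[dict] = field(default_factory=list)        # records of past sessions
    procedural: dict[str, str] = field(default_factory=dict)  # learned preferences, skills

    def build_prompt(self, query: str, retrieved: list[str]) -> str:
        # Only in-context memory plus explicitly retrieved snippets reach
        # the model; external, episodic, and procedural stores stay outside
        # the context window until something pulls them in.
        return "\n".join(self.in_context + retrieved + [query])
```

The different cost structures fall out of this split: `in_context` is paid for on every call, while the other three cost a retrieval step but nothing per token until retrieved.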

MemGPT, introduced in 2023 and now called Letta, took an unusual angle on this. Rather than just adding a retrieval layer, the authors applied operating system memory management concepts directly to LLMs. The model has a main context, which functions like RAM, and explicit memory management functions it can call to page content in and out of external storage, which functions like disk. The model knows when its context is getting full and takes deliberate action to make room. The paper demonstrated that this enables multi-session task continuity that simpler approaches cannot handle, though the overhead of managing memory explicitly is real and not always worth it for shorter tasks.
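The paging idea can be sketched in a few lines. This is a toy illustration of the RAM/disk analogy, not Letta's actual API: in the real system the model itself decides when to call memory-management functions, whereas here eviction is automatic and retrieval is keyword match.

```python
class PagedContext:
    """Toy sketch of MemGPT-style paging: a bounded main context ("RAM")
    backed by external storage ("disk"). Names and policy are invented."""

    def __init__(self, max_items: int = 4):
        self.max_items = max_items
        self.main: list[str] = []     # what the model sees
        self.archive: list[str] = []  # external storage

    def add(self, item: str) -> None:
        self.main.append(item)
        # When the context is "full", page the oldest items out to disk.
        # In MemGPT this is a deliberate function call made by the model.
        while len(self.main) > self.max_items:
            self.archive.append(self.main.pop(0))

    def page_in(self, keyword: str) -> None:
        # Bring matching archived items back into the main context,
        # evicting older material if needed to make room.
        for item in [x for x in self.archive if keyword in x]:
            self.archive.remove(item)
            self.add(item)
```

The cost the Deep Dive mentions is visible even in the toy: every `page_in` is an extra round trip the agent must decide to make, which only pays off when the task outlives a single context window.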

Mem0 is the production-oriented approach. It builds an adaptive memory layer that automatically extracts entities, preferences, and facts from conversations without requiring the model to manage memory explicitly. Those extracted pieces get stored with vector embeddings and retrieved in future sessions based on semantic similarity. Their benchmarks report 91% lower latency compared to naive full-history approaches. The practical lesson from working with any of these systems is the same: storage is not the constraint. Retrieval is. A memory system that retrieves the wrong thing with confidence makes an agent worse, not better. You end up with an agent that is certain about incorrect context.
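The extract-then-retrieve shape can be sketched as follows. This is not Mem0's API: the real pipeline uses an LLM to extract facts and a dense embedding model for similarity, both of which are simulated here with bag-of-words vectors so the sketch stays self-contained.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Toy extract-then-retrieve memory layer. Extraction (deciding what
    counts as a fact) is assumed to have already happened upstream."""

    def __init__(self):
        self.facts: list[tuple[str, Counter]] = []

    def add_fact(self, fact: str) -> None:
        self.facts.append((fact, embed(fact)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        qv = embed(query)
        ranked = sorted(self.facts, key=lambda f: cosine(qv, f[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

The retrieval failure mode described above lives in `retrieve`: it always returns its top `k` results, with no notion of "nothing relevant stored", so a bad match is delivered with the same confidence as a good one. Production systems need a similarity threshold or an abstain path here.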

In the Wild

Mem0 (91% lower latency)
Letta (formerly MemGPT)
Zep
Cloudflare Durable Objects

Go Deeper

PAPER: MemGPT: Towards LLMs as Operating Systems
PAPER: Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
ARTICLE: Memory for Agents
ARTICLE: Context Engineering: LLM Memory and Retrieval for AI Agents
ARTICLE: Agent Memory: How to Build Agents that Learn and Remember

Related Patterns