Every Agent Memory Architecture Fails Differently

I’ve been running multi-agent teams for months now. Agents writing code, generating images, managing deployments, drafting content, reviewing each other’s work. It’s the most productive workflow I’ve found — and the most fragile. Not because the models are bad or the tools are missing. The thing that keeps breaking is memory. How do agents remember what matters across conversations, across tools, and across each other? I’ve tried four different approaches. Each one solved one problem and created two more.


Illustration: Multiple AI agents in a dark workspace, each surrounded by fragmented glowing memory shards that don’t connect

The Problem I Didn’t Expect

When I started building multi-agent workflows, I obsessed over the usual things. Model selection, tool access, prompt engineering, workspace configuration. I wrote about most of those. But the problem that actually stalled me wasn’t any of them.

It was this: every morning, my agents woke up with amnesia.

Agent B would repeat a mistake that Agent A had already solved. An insight from Tuesday’s coding session evaporated by Thursday. I’d spend twenty minutes re-explaining a decision I’d already made — in a different conversation, with a different agent, on a different tool. The “team” wasn’t compounding knowledge. It was running in parallel with no shared learning curve.

I realized the ceiling on my multi-agent productivity wasn’t intelligence. It was memory. And I had no idea how hard that would be to fix.

I Tried Letting Agents Decide What to Remember

My first instinct was simple: give the agent a memory store and let it decide what’s worth saving. “If you learn something important, write it to memory. If you need context later, read from memory.” Sounded elegant.
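In code terms, the setup was roughly this. It's a minimal sketch, not a real API: `MemoryStore`, `save`, and `recall` are invented names, and naive keyword matching stands in for whatever semantic retrieval you'd actually wire up. The point is what's missing, since there is no filtering, scoring, or dedup anywhere; curation is entirely the agent's call.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    notes: list[str] = field(default_factory=list)

    def save(self, note: str) -> None:
        # Whatever the agent decides is "important" goes straight in,
        # unfiltered. This is the whole curation strategy.
        self.notes.append(note)

    def recall(self, query: str) -> list[str]:
        # Naive keyword match standing in for semantic search.
        q = query.lower()
        return [n for n in self.notes if q in n.lower()]

store = MemoryStore()
store.save("User prefers Inter font for UI mockups")
store.save("DB schema: orders table keyed by (user_id, created_at)")
print(store.recall("schema"))  # only the schema note matches
```

The font-preference note and the schema decision sit side by side with equal weight, which is exactly the failure mode described below.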

What I got was a mess.

Some agents saved everything — pages of verbose summaries after every conversation. Others saved almost nothing. The ones that did save things wrote vague notes that lost all the critical details. I found one agent had memorized my font preferences but completely forgotten a database schema decision that took two hours to work out.

The deeper issue hit me a few weeks in: agents can’t predict what will matter later. Something that feels routine in one conversation becomes the linchpin of a different conversation three days from now. The agent had no way to know that when it decided what to save. I was getting noise masquerading as knowledge — a memory store full of confident-sounding notes that were either too vague to be useful or too specific to be relevant.

I tried tuning the prompts. “Save architectural decisions.” “Prioritize technical context.” It helped marginally, but the fundamental problem remained: self-directed memory is self-directed curation, and agents are bad curators.

I Tried Recency Memory

Next, I tried a simpler approach: just keep a rolling buffer of recent context. The last N conversations, recent tool outputs, the latest decisions. No curation needed — just a sliding window of “what happened recently.”
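As a sketch, assuming the unit of memory is a per-session summary and N is a fixed window size (both assumptions, not anything from a real framework), `collections.deque` gives you the eviction behavior for free:

```python
from collections import deque

# Rolling buffer of the last N session summaries; anything older
# falls off the back automatically. N = 3 is an arbitrary choice.
N = 3
recent = deque(maxlen=N)

for session in ["fix auth bug",
                "refactor /orders endpoint",
                "add rate limiting",
                "write deploy config"]:
    recent.append(session)

# Only the last N survive; "fix auth bug" has already been evicted.
print(list(recent))
```

No curation logic at all, which is the appeal, and the eviction of that first session is the sieve problem in miniature.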

This was genuinely useful. For short chains of work — a feature that spans two or three conversations — recency memory kept the thread alive. The agent remembered what we’d just been doing and could pick up where we left off.

But I started noticing gaps. A bug fix from the previous week? Gone. An architectural decision from January? Evaporated. A pattern I’d noticed emerging across five separate conversations? Never aggregated, because no single conversation contained the whole picture.

Recency memory is a sieve. It captures what just happened at the cost of what happened three weeks ago. For single-session work, that’s fine. For multi-agent teams running across days and weeks — building on decisions that compound over time — it wasn’t enough. I kept re-explaining things that should have been institutional knowledge by now.

I Tried Shared Memory

The logical next step: if agents can’t remember individually, give them a shared pool. Every discovery, every decision, every preference — visible to all agents. Maximum information sharing. Collective intelligence.

I was excited about this one. It felt like the right answer.

It wasn’t.

What I experienced was something I started calling “squirrel brain.” My deployment agent would start reasoning about content strategy because it had access to the content agent’s memories. The code reviewer would reference design discussions it had no business considering. Agents would burn tokens chasing context from other agents’ domains — context that was technically available but completely irrelevant to their current task.

I watched my coding agent spend half its context window processing memories about blog post formatting preferences before it even got to the code change I’d asked for. The shared pool was so full of everyone’s knowledge that no individual agent could find its own signal in the noise.

This matched what I later found in the research — teams building multi-agent systems report “perspective inconsistency, forgetting key requirements, and procedural drift” when agents share undifferentiated memory. I wasn’t alone. The accumulated context was actively making my agents worse.

I Tried Partitioned Memory

So I went the other direction: strict separation. Each agent gets its own memory space. The code agent remembers code decisions. The deployment agent remembers infrastructure patterns. The content agent remembers editorial preferences. Clean boundaries. No cross-contamination.
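A minimal sketch of the partitioned version, again with invented names: each agent reads and writes only its own namespace, and the blind spot this creates is visible in a few lines.

```python
from collections import defaultdict

# Strict partitioning: one memory namespace per agent, no sharing.
partitions: dict[str, list[str]] = defaultdict(list)

def remember(agent: str, fact: str) -> None:
    partitions[agent].append(fact)

def load_context(agent: str) -> list[str]:
    # Clean boundaries: an agent only ever sees its own memories.
    return partitions[agent]

remember("code", "Renamed /orders endpoint to /v2/orders")
remember("deploy", "Config still points at /orders")

# The deploy agent never learns about the rename.
print(load_context("deploy"))
```

The code agent's rename exists in the system, but no path carries it across the partition boundary, so the deploy agent ships a config pointing at a dead URL.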

This worked better than anything else. Each agent stayed focused. No squirrel brain. Clear domain expertise.

Until the agents needed to learn from each other.

The code agent refactored an API endpoint. The deployment agent — in a completely separate conversation, on a completely separate tool — had no idea the endpoint names had changed. It deployed a config pointing to URLs that no longer existed. The content agent referenced a feature that had been redesigned two conversations ago. Each agent was an expert in its own silo and completely blind to everything outside it.

I’d built an organization with no meetings, no email, and no shared documents. Every team member was brilliant and totally unaware of what the others were doing. If you’ve ever worked at a company with that problem, you know exactly how it feels — except agents don’t bump into each other at the coffee machine and accidentally share context.

Infographic: The Memory Bus Is You — a human wireframe figure at the center routing all knowledge between specialized agent nodes, because no agent memory architecture connects them directly

What I’ve Learned So Far

After cycling through all four approaches, I think I’ve at least identified what the actual problem is. It’s not storage. Vector databases, knowledge graphs, semantic search — the infrastructure for memory exists. The unsolved problem is curation and routing: what to remember, when to surface it, and who should see it.

What I actually need is an architecture that can:

  • Remember what matters without being told (because agents are bad curators)
  • Retain long-term insights without drowning in recency (because compounding requires history)
  • Share knowledge across agents selectively (because everything-to-everyone is squirrel brain)
  • Maintain agent specialization while enabling cross-pollination (because silos are blind)

I haven’t found anything that does all four. The research is pointing toward structured, hierarchical memory — episodic, semantic, and procedural layers with access controls. Some frameworks show promising benchmark results. But benchmarks aren’t production multi-agent teams running for weeks across messy, real-world projects.
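For concreteness, here's a toy version of that direction. The layer names come from the literature; the access-control scheme and everything else is invented, so treat it as a thought experiment rather than any framework's actual design.

```python
from collections import defaultdict

# Three memory layers with per-layer read access. Which agents belong
# on which allowlist is exactly the hard routing question.
LAYERS = ("episodic", "semantic", "procedural")
memory = {layer: [] for layer in LAYERS}
acl: dict[str, set] = defaultdict(set)

acl["episodic"] |= {"code"}                          # raw session history stays private
acl["semantic"] |= {"code", "deploy"}                # durable facts shared selectively
acl["procedural"] |= {"code", "deploy", "content"}   # how-to knowledge for everyone

memory["semantic"].append("/orders endpoint renamed to /v2/orders")

def load_context(agent: str) -> list[str]:
    return [fact
            for layer in LAYERS if agent in acl[layer]
            for fact in memory[layer]]

print(load_context("deploy"))   # sees the rename via the semantic layer
print(load_context("content"))  # sees nothing; no semantic access
```

This gets selective sharing, but note what it doesn't solve: something still has to decide which layer a memory belongs in and who goes on each allowlist.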

Where I Am Now

Right now, I’m the router. I use partitioned memory — rules files, skills, workspace context — and when one agent discovers something another needs to know, I carry the message myself. I’m the memory bus. It works, but it doesn’t scale. The bottleneck in my multi-agent team is a human doing manual knowledge transfer.

The next experiment is structured communication protocols between agents. Not shared memory, but declared interfaces — explicit contracts for what knowledge crosses boundaries and when. Less like a shared brain, more like a well-designed API between services.
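One way that could look, as a hedged sketch where every name is hypothetical: a typed event plus an explicit subscription table, so knowledge crosses an agent boundary only when a declared contract says it should.

```python
from dataclasses import dataclass

# The "declared interface": a typed event an agent must publish when it
# changes something others depend on, plus a table of who subscribes.
@dataclass(frozen=True)
class EndpointRenamed:
    old_path: str
    new_path: str

SUBSCRIPTIONS = {EndpointRenamed: ["deploy", "content"]}

inboxes: dict[str, list] = {"code": [], "deploy": [], "content": []}

def publish(event) -> None:
    # Only declared subscribers receive the event; there is no shared pool.
    for agent in SUBSCRIPTIONS.get(type(event), []):
        inboxes[agent].append(event)

publish(EndpointRenamed("/orders", "/v2/orders"))
print(len(inboxes["deploy"]))  # the deploy agent learns of the rename
print(len(inboxes["code"]))    # the publisher's own inbox stays empty
```

The subscription table is the contract: it makes the cross-agent knowledge flow explicit, auditable, and narrow, instead of ambient and unbounded.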

I don’t know if that’ll work either. But four failed approaches have given me a pretty clear map of the failure modes, and in my experience, that’s usually where the breakthrough starts.

If you’re building with multi-agent teams and hitting similar walls, I’d love to hear what you’ve tried. This one’s wide open.