
Hand-Rolling Agent Memory vs. AWS AgentCore: A Developer's Take

A practitioner's comparison of building agent memory yourself — conversation buffers, summarization, vector retrieval, consolidation — against handing it to AWS Bedrock AgentCore Memory. Where each one earns its keep, with code.

Every agent is stateless underneath. The model does not remember your last conversation; it remembers nothing. “Memory” is an engineering layer you build on top of a stateless inference call — a layer that decides what to carry forward into the next prompt, what to persist across sessions, and what to throw away.

For the first year of building agents, most of us hand-rolled that layer. It is not hard to start: a list of messages, a token budget, maybe a vector store bolted on for “long-term” recall. It gets hard later — at consolidation, at retrieval quality, at multi-session identity, at the operational tax of running it in production.

AWS Bedrock AgentCore Memory is Amazon’s bet that this layer should be a managed service, the same way you stopped running your own message queue. I have built memory both ways, and this is the honest engineering comparison: what you actually write, what you actually own, and where each approach breaks.

The two halves of agent memory

Before comparing tools, get the mental model straight. Memory is two different problems that people conflate:

  • Short-term (working) memory — the context of the current session. The running transcript, scratchpad state, the last tool result. It needs to fit a token budget, so past a certain length you summarize or window it.
  • Long-term memory — facts that should survive across sessions. User preferences (“always deploys to eu-west-1”), durable decisions, a semantic record of what happened. This is not a transcript; it is extracted knowledge, retrieved by relevance, not by recency.
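The “window it” option is the simplest of these: walk the transcript from newest to oldest and keep turns until the budget runs out. A minimal sketch, assuming a whitespace word count as a stand-in for a real tokenizer; approx_tokens and window are hypothetical names:

```python
def approx_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages whose combined size fits within max_tokens."""
    kept: list[dict] = []
    used = 0
    for msg in reversed(messages):  # newest first
        cost = approx_tokens(msg["content"])
        if used + cost > max_tokens:
            break  # this message and everything older is dropped
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Anything that falls off the old end is simply gone, which is the loss rolling summarization exists to soften.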

The hard part is never short-term memory. It is long-term: deciding what is worth keeping, extracting it cleanly, storing it so it is retrievable, and injecting the right three facts into a prompt instead of the wrong thirty. Keep that distinction in mind — it is exactly where the two approaches diverge.
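In code, the distinction shows up in how a prompt gets assembled: long-term facts are picked by relevance and capped at a small k, while the short-term transcript is carried over by recency. A sketch, assuming the facts arrive already ranked from whatever retrieval layer you use; build_prompt is a hypothetical helper, not part of any library:

```python
def build_prompt(ranked_facts: list[str], recent_turns: list[dict], k: int = 3) -> list[dict]:
    """Assemble chat messages: top-k long-term facts first, then the live session."""
    messages: list[dict] = []
    top = ranked_facts[:k]  # relevance-ranked upstream; inject a few, not thirty
    if top:
        lines = "\n".join(f"- {fact}" for fact in top)
        messages.append({"role": "system", "content": "Known about this user:\n" + lines})
    messages.extend(recent_turns)  # short-term: the running transcript, in order
    return messages
```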

Approach 1: Hand-rolling it

Short-term memory yourself

This part is genuinely easy and you should not overthink it. Short-term memory is a list and a budget.

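class WorkingMemory:
    def __init__(self, max_tokens=6000, summarizer=None, keep_recent=6):
        self.messages = []
        self.max_tokens = max_tokens
        self.summarizer = summarizer    # an LLM call: list of messages -> summary string
        self.keep_recent = keep_recent  # most recent turns, always kept verbatim

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        self._compact_if_needed()

    def _compact_if_needed(self):
        # count_tokens is your tokenizer-backed counter (not defined here)
        if count_tokens(self.messages) <= self.max_tokens:
            return
        # Summarize everything older than the recent tail; keep the tail verbatim
        head = self.messages[:-self.keep_recent]
        tail = self.messages[-self.keep_recent:]
        if not head or self.summarizer is None:
            return  # nothing old enough to fold, or no summarizer configured
        summary = self.summarizer(head)
        self.messages = [{"role": "system", "content": f"Earlier: {summary}"}, *tail]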

That is a working windowed buffer with rolling summarization. It is maybe an afternoon of work, and for a single-session chatbot it is all you need. Do not reach for a managed service to solve this.
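WorkingMemory leans on a count_tokens helper that the snippet does not define. A crude stand-in, assuming roughly four characters per token for English text plus a small fixed overhead per message (both are rules of thumb, not properties of any specific model), is enough to exercise the compaction logic before you wire in your model’s real tokenizer:

```python
import math


def count_tokens(messages: list[dict], chars_per_token: float = 4.0,
                 per_message_overhead: int = 4) -> int:
    """Rough token estimate for a list of chat messages.

    Assumption: ~4 characters per token plus a fixed per-message overhead
    for role and formatting. Replace with your model's tokenizer.
    """
    total = 0
    for msg in messages:
        total += math.ceil(len(msg["content"]) / chars_per_token) + per_message_overhead
    return total
```

Undercounting is the dangerous direction, so leave headroom below the model’s real context limit when budgeting with an estimate like this.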

Long-term memory yourself

Here is where the afternoon turns into a quarter. A real DIY long-term memory has at least four moving parts:

  1. A store. Usually Postgres with pgvector, or a dedicated vector DB. You own the schema, the embedding model, the index, and the migrations.
  2. An extraction step. After (or during) a session, you run an LLM pass to pull out durable facts — “user prefers Terraform over CDK” — rather than dumping raw transcript. Skip this and your retrieval returns noisy chat instead of knowledge.
  3. A consolidation step. New facts contradict old ones. The user used to prefer CDK. You need logic to update, dedupe, and expire — or your memory slowly fills with stale, conflicting claims.
  4. Retrieval, scoped to the right person. Embed the query, search, filter by actor_id / tenant, and inject the top-k into the prompt. Get the scoping wrong and you leak one user’s memory into another’s session — a genuine security incident, not a bug.
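Item 4 is easiest to get right by making an unscoped query impossible to express. A sketch of that API shape, using an in-process dict as a stand-in for the real store and substring matching in place of vector search; ScopedMemoryStore is hypothetical:

```python
from collections import defaultdict


class ScopedMemoryStore:
    """Toy store where every read and write is partitioned by actor_id."""

    def __init__(self) -> None:
        self._rows: dict[str, list[str]] = defaultdict(list)

    @staticmethod
    def _require(actor_id: str) -> None:
        if not actor_id:
            raise ValueError("actor_id is required; refusing an unscoped operation")

    def insert(self, actor_id: str, text: str) -> None:
        self._require(actor_id)
        self._rows[actor_id].append(text)

    def search(self, actor_id: str, contains: str) -> list[str]:
        self._require(actor_id)
        # Only this actor's partition is ever scanned; .get avoids creating empty entries.
        return [t for t in self._rows.get(actor_id, []) if contains.lower() in t.lower()]
```

In Postgres, row-level security keyed on the tenant is one way to get a similar property at the database layer, so a query that forgets its filter does not silently return everyone’s rows.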

A stripped-down version of the write-and-read path:

def remember(actor_id, transcript):
    # llm_extract_facts, embed, and db stand in for your own LLM, embedding model, and store.
    # 1. Extract durable facts from the raw session
    facts = llm_extract_facts(transcript)        # returns ["prefers Terraform", ...]
    for fact in facts:
        emb = embed(fact)
        # 2. Consolidate: a near-duplicate (similarity >= 0.9) is updated in place,
        #    anything else is inserted as a new fact
        existing = db.nearest(actor_id, emb, threshold=0.9)
        if existing:
            db.update(existing.id, fact, emb)
        else:
            db.insert(actor_id=actor_id, text=fact, embedding=emb)

def recall(actor_id, query, k=5):
    # 3. Retrieve: embed the query and search only this actor's rows
    emb = embed(query)
    rows = db.search(actor_id=actor_id, embedding=emb, limit=k)  # MUST filter actor_id
    return [r.text for r in rows]
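To watch the write-and-read path run end to end without a database or a model, here is a self-contained toy version. Everything in it is a stand-in: the “embedding” is a bag-of-words count vector, a linear scan with cosine similarity replaces the vector index, and facts are passed in already extracted rather than produced by an LLM. FactStore is a hypothetical name.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts. A real system calls an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class FactStore:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # the consolidation knob from the prose
        self.rows: list[dict] = []  # each row: actor_id, text, emb

    def remember(self, actor_id: str, facts: list[str]) -> None:
        for fact in facts:
            emb = embed(fact)
            mine = [r for r in self.rows if r["actor_id"] == actor_id]  # scope first
            best = max(mine, key=lambda r: cosine(r["emb"], emb), default=None)
            if best is not None and cosine(best["emb"], emb) >= self.threshold:
                best["text"], best["emb"] = fact, emb  # near-duplicate: update in place
            else:
                self.rows.append({"actor_id": actor_id, "text": fact, "emb": emb})

    def recall(self, actor_id: str, query: str, k: int = 5) -> list[str]:
        q = embed(query)
        mine = [r for r in self.rows if r["actor_id"] == actor_id]  # scope first
        mine.sort(key=lambda r: cosine(r["emb"], q), reverse=True)
        return [r["text"] for r in mine[:k]]
```

The toy also shows why the threshold needs tuning: “prefers CDK over Terraform” has exactly the same bag of words as “prefers Terraform over CDK”, so it replaces it (by luck, not understanding), while “switched to CDK” scores about 0.29 against the older fact and would be stored next to it, leaving two contradictory claims.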

None of this is exotic. But notice everything you now own in production: the embedding model version (re-embed everything when you upgrade it), the consolidation heuristics (the 0.9 threshold is a knob you will tune for weeks), the extraction prompt quality, the per-tenant isolation, the index performance as rows grow, and the cost of an extra LLM call on every session for extraction. This is a system, and it needs an owner.
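The re-embedding chore is much easier if every row records which model produced its vector, because vectors from two different embedding models are not comparable. A sketch, with the row layout and reembed_stale as hypothetical names and embed as whatever function calls your current model:

```python
from typing import Callable


def reembed_stale(rows: list[dict], current_model: str,
                  embed: Callable[[str], list[float]]) -> int:
    """Re-embed rows whose vector came from a different model; return how many changed."""
    updated = 0
    for row in rows:
        if row["embedding_model"] != current_model:
            row["embedding"] = embed(row["text"])  # recompute from the stored text
            row["embedding_model"] = current_model
            updated += 1
    return updated
```

In production this is a batch backfill; until it finishes, filtering searches on embedding_model keeps a half-migrated index from comparing vectors across models.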

What you get for that cost: total control. Your schema, your retrieval logic, your data, in your account, on your model of choice. For regulated workloads where the memory store holds PHI or PCI-scoped data, “it is my Postgres in my VPC with my audit logging” is not a nice-to-have — it is the difference between a compliant architecture and a finding.

Approach 2: AWS Bedrock AgentCore Memory

AgentCore Memory is a managed service that handles both halves. You stop running the store, the extraction pipeline, and the consolidation logic; you call an API.

The model is a clean split between events and memory records:

  • Short-term: you write raw conversational turns as events into a memory resource. They are scoped to a session and actor and are available immediately for the rest of that session.
  • Long-term: you attach one or more memory strategies to the resource. A strategy is a configuration that tells AgentCore how to process events into durable knowledge — extract semantic facts, capture user preferences, or roll up session summaries — asynchronously, without you writing the extraction or consolidation code.

You configure the strategies once on the control plane:

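import boto3
cp = boto3.client("bedrock-agentcore-control")

memory = cp.create_memory(
    name="support_agent_memory",
    eventExpiryDuration=30,  # days to keep raw short-term events
    memoryStrategies=[
        # Explicit namespaces so the read path can target them; {actorId} and
        # {sessionId} are filled in from each event.
        {"semanticMemoryStrategy": {        # durable facts
            "name": "facts",
            "namespaces": ["/strategies/facts/actors/{actorId}"]}},
        {"userPreferenceMemoryStrategy": {  # stable preferences
            "name": "preferences",
            "namespaces": ["/strategies/preferences/actors/{actorId}"]}},
        {"summaryMemoryStrategy": {         # per-session summary
            "name": "session_recap",
            "namespaces": ["/strategies/session_recap/actors/{actorId}/sessions/{sessionId}"]}},
    ],
)
MEMORY_ID = memory["memory"]["id"]  # usable once the resource reports ACTIVE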
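A new memory resource is not usable the moment create_memory returns. A polling sketch, assuming get_memory(memoryId=...) on the same control-plane client returns the resource under a "memory" key with a "status" such as CREATING, ACTIVE, or FAILED (confirm the exact shape in the current boto3 reference). The client is passed in so the loop can be tested with a fake, and wait_until_active is a hypothetical helper:

```python
import time


def wait_until_active(client, memory_id: str, timeout_s: float = 300, poll_s: float = 5) -> None:
    """Poll the memory resource until it is ACTIVE; raise if it fails or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.get_memory(memoryId=memory_id)["memory"]["status"]
        if status == "ACTIVE":
            return
        if status == "FAILED":
            raise RuntimeError(f"memory {memory_id} failed to provision")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"memory {memory_id} still {status} after {timeout_s}s")
        time.sleep(poll_s)
```

Call it once at deploy time, for example wait_until_active(cp, MEMORY_ID), rather than on the request path.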

Then, at runtime, the write path is one call — no extraction prompt, no embedding, no consolidation threshold to tune:

from bedrock_agentcore.memory import MemoryClient
mem = MemoryClient()

# Short-term: persist the turn. Long-term extraction runs asynchronously afterwards.
mem.create_event(
    memory_id=MEMORY_ID,
    actor_id="user-1234",
    session_id="session-abc",
    messages=[("I always deploy to eu-west-1", "USER")],
)
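If short-term state already lives in role/content dicts like WorkingMemory’s, a small adapter can turn it into the (text, role) tuples that create_event takes. The "USER" role comes from the snippet; "ASSISTANT" and "TOOL" are assumptions to check against the SDK. System messages, such as a local rolling summary, are skipped so they are not stored as conversation. to_event_messages is hypothetical:

```python
# Assumed mapping from local role names to the SDK's uppercase role strings.
ROLE_MAP = {"user": "USER", "assistant": "ASSISTANT", "tool": "TOOL"}


def to_event_messages(messages: list[dict]) -> list[tuple[str, str]]:
    """Convert role/content dicts into (text, ROLE) tuples for create_event."""
    out: list[tuple[str, str]] = []
    for msg in messages:
        role = ROLE_MAP.get(msg["role"])
        if role is None:
            continue  # e.g. local "system" summaries are not conversation turns
        out.append((msg["content"], role))
    return out
```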

And the read path retrieves extracted knowledge, semantically, scoped by a namespace that encodes the actor and strategy:

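records = mem.retrieve_memories(
    memory_id=MEMORY_ID,
    namespace="/strategies/preferences/actors/user-1234",
    query="which region does this user deploy to?",
    top_k=3,
)
# Each record is a dict; the extracted text sits under content.text
facts = [r["content"]["text"] for r in records]
# e.g. ["User prefers to deploy to eu-west-1"], once async extraction has run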

The extraction, the dedupe-against-old-preferences, the storage, and the index are gone from your codebase. That is the entire pitch: the four moving parts from Approach 1 collapse into create_event and retrieve_memories, and the consolidation that took you weeks to tune is a strategy name.
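Because the namespace string is the isolation boundary on the read path, it is worth building it in one place rather than formatting it inline at every call site. A defensive sketch, assuming the /strategies/<name>/actors/<actorId> layout from the example (it has to match the namespaces the strategy was configured with) and rejecting IDs containing slashes or other characters that could change which path is queried; actor_namespace is hypothetical:

```python
import re

# Assumption: restrict strategy names and actor IDs to path-safe characters.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def actor_namespace(strategy: str, actor_id: str) -> str:
    """Build the per-actor namespace for one strategy, refusing unsafe IDs."""
    for label, value in (("strategy", strategy), ("actor_id", actor_id)):
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"unsafe {label}: {value!r}")
    return f"/strategies/{strategy}/actors/{actor_id}"
```

The retrieval call then passes namespace=actor_namespace("preferences", "user-1234").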

The honest tradeoffs

The marketing framing is “managed vs. DIY.” The real engineering decision is sharper than that.

What AgentCore genuinely removes: the long-term pipeline. Extraction quality, consolidation logic, embedding lifecycle, and the index are AWS’s problem now. If your team was about to spend a quarter building and tuning that, this is a real shortcut, and the namespace model for actor scoping is sensible.

What it costs you:

  • Control over retrieval. When recall returns the wrong facts — and it will sometimes — you tune a strategy and a query, not your own ranking. You are debugging a black box. With a hand-rolled pgvector setup, the failure is inspectable end to end.
  • Lock-in. Events, strategies, and namespaces are an AWS-shaped API. Your memory layer is now Bedrock-shaped. Migrating off is a rewrite, not a config change.
  • Data residency and compliance. Your conversational data and extracted facts live in an AWS managed store, processed by AWS-side extraction. For HIPAA or PCI work, that is a vendor and a data-flow you must diligence — BAA, region, encryption, retention, and what the extraction step sees. It can absolutely be done compliantly, but “managed” means you are now reasoning about someone else’s handling of regulated data, not just your own VPC.
  • Async semantics. Long-term extraction is not instant. A preference stated in this turn may not be retrievable as a consolidated long-term record until the background extraction has processed the event. Your UX cannot assume read-your-writes on long-term memory.
  • Cost shape. You trade engineering time and your own infra bill for per-event and per-retrieval service pricing. At low volume that is a bargain; model the curve before you are at high volume.
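Modeling the curve can start as plain arithmetic: a managed bill that grows linearly with sessions versus a DIY bill that is mostly fixed (database, hosting, and the engineering time to own it) plus a per-session extraction call. A simplified sketch; every price is a placeholder parameter rather than real AWS pricing, the real bill may carry more line items such as storage, and the function names are hypothetical:

```python
def monthly_cost_managed(sessions: int, events_per_session: int, retrievals_per_session: int,
                         price_per_event: float, price_per_retrieval: float) -> float:
    # Scales linearly with volume: every event and retrieval is billed.
    return sessions * (events_per_session * price_per_event
                       + retrievals_per_session * price_per_retrieval)


def monthly_cost_diy(sessions: int, fixed_monthly: float,
                     extraction_cost_per_session: float) -> float:
    # Mostly fixed (infra plus ownership), plus your own extraction LLM call per session.
    return fixed_monthly + sessions * extraction_cost_per_session


def break_even_sessions(events_per_session: int, retrievals_per_session: int,
                        price_per_event: float, price_per_retrieval: float,
                        fixed_monthly: float, extraction_cost_per_session: float):
    """Sessions per month above which DIY is cheaper; None if it never is."""
    managed_marginal = (events_per_session * price_per_event
                        + retrievals_per_session * price_per_retrieval)
    saving_per_session = managed_marginal - extraction_cost_per_session
    if saving_per_session <= 0:
        return None  # managed is cheaper per session, so DIY never catches up
    return fixed_monthly / saving_per_session
```

At the break-even volume both cost functions return the same monthly total; below it the managed service is cheaper, and above it the fixed DIY cost starts paying for itself.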

How I’d actually decide

A few rules of thumb from building both:

  • Always hand-roll short-term memory. A windowed buffer with summarization is an afternoon. No managed service earns its keep here.
  • Reach for AgentCore when the long-term pipeline is the thing you don’t want to own — you want cross-session preferences and semantic recall, you are already on Bedrock, and your data is not so sensitive that an external managed store is a compliance fight.
  • Hand-roll long-term memory when control or data residency is non-negotiable — regulated data that must stay in your VPC, a non-AWS stack, or retrieval behavior you need to inspect and tune yourself. The quarter you spend buys you a system you fully own.
  • Either way, treat memory as a system with an owner. The failure mode that kills agents in production is not “no memory” — it is bad memory: stale facts, cross-tenant leakage, retrieval that surfaces the wrong context. A managed service moves where that risk lives; it does not delete it.

The honest read

Short-term memory is an afternoon — never outsource it. The real question is whether you want to own the long-term pipeline: extraction, consolidation, retrieval quality, and the embedding lifecycle. AgentCore is a legitimate shortcut past that quarter of work if you are already on Bedrock and your data isn’t fighting a compliance battle; hand-rolling it on pgvector is the right call when control and data residency are non-negotiable, which in regulated industries they usually are. What you cannot do is skip the design — memory is a system, and bad memory ships worse outcomes than none.

If you are designing an agent’s memory architecture — or trying to figure out why your agent keeps surfacing the wrong context in production — request a consultation.
