Tutorial Claude Anthropic May 4, 2026 · 9 min read

Persistent Memory for Claude Agent SDK

Anthropic's Claude is outstanding at reasoning, tool use, and following complex instructions — but like every LLM API, it has no memory across sessions. This guide shows how to pair the Anthropic Python SDK with Kronvex to give your Claude agents a persistent, semantically searchable memory that survives across every conversation.

In this article
  1. Why Claude agents forget — and why CLAUDE.md isn't enough
  2. CLAUDE.md vs Kronvex: a practical comparison
  3. Installing Kronvex alongside the Anthropic SDK
  4. The basic pattern: recall before, store after
  5. Injecting memories into the Claude system prompt
  6. Full working agent with persistent memory
  7. Tool-augmented agents: memory as a Claude tool
  8. Async usage with AsyncAnthropic
  9. Best practices

Why Claude agents forget — and why CLAUDE.md isn't enough

The Anthropic API is stateless. Each client.messages.create() call is independent — there's no server-side session, no persistent thread, no memory of what Claude said yesterday. To maintain conversation context, you pass the full message history in every request. This works within a session but fails completely across sessions.
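To make the statelessness concrete, here is a minimal sketch of in-session context handling. The class name `InSessionChat` is hypothetical, and `client` is assumed to be an `anthropic.Anthropic()` instance (or anything exposing the same `messages.create()` call). The history lives only in process memory, so it disappears when the process ends, which is exactly the gap this guide addresses.

```python
from typing import Any


class InSessionChat:
    """Keeps the message history in process memory for one session only."""

    def __init__(self, client: Any, model: str, max_tokens: int = 1024) -> None:
        self.client = client  # assumed: anthropic.Anthropic() or a compatible object
        self.model = model
        self.max_tokens = max_tokens
        self.history: list[dict[str, str]] = []

    def send(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        response = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=list(self.history),  # the full history is resent on every call
        )
        reply = response.content[0].text
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

Each call sends a longer `messages` list than the previous one, and a new `InSessionChat` instance starts from an empty history.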

Many developers discover CLAUDE.md — a markdown file that Claude Code reads at startup to understand project context. While useful for developer tooling, CLAUDE.md has critical limitations as a memory system for production AI agents:

Capability        | CLAUDE.md                 | Kronvex
Per-user memories | No (shared file)          | Yes (session_id scoped)
Dynamic updates   | Manual file edits         | API calls at runtime
Semantic search   | Linear scan               | pgvector cosine similarity
Scale             | Grows context unboundedly | Retrieves top-K relevant
Multi-user        | Not possible              | Unlimited users
GDPR compliance   | Depends on hosting        | EU-hosted, deletable

Kronvex is designed for production multi-user agents. It stores memories per user (via session_id), retrieves only the semantically relevant ones for each conversation, and scales to millions of memories without bloating Claude's context window.
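For readers unfamiliar with top-K retrieval by cosine similarity, the idea can be sketched in plain Python. This illustrates the general technique only and is not Kronvex's implementation (the table above says the real work happens in pgvector). The function names and the tiny two-dimensional vectors are made up for the example; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float],
    memories: list[tuple[str, list[float]]],
    k: int = 2,
) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Only the `k` best matches are returned, so the amount of text added to the prompt stays bounded no matter how many memories are stored.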

CLAUDE.md vs Kronvex: a practical comparison

To be concrete: if you're building a personal assistant that will serve thousands of users, CLAUDE.md is the wrong tool. It's a static file that lives in a directory — it can't store memories per user, can't be updated at runtime, and can't retrieve relevant context semantically. Kronvex fills this gap.

The right mental model: CLAUDE.md is for static instructions ("always respond in French"), Kronvex is for dynamic, accumulated knowledge ("this user is a senior Python engineer building a logistics API who prefers async patterns").

Installing Kronvex alongside the Anthropic SDK

Install dependencies
pip install anthropic kronvex
Initialize both clients
import anthropic
from kronvex import Kronvex
import os

claude = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"]
)

kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)
Which Claude model? Kronvex works with all Anthropic models. For agents where memory context matters most, Claude Sonnet and Opus give the best instruction following. For high-throughput production agents, Claude Haiku with Kronvex memory recall is extremely cost-efficient.

The basic pattern: recall before, store after

The Kronvex integration wraps every Claude API call with two steps: recall relevant memories before the call, store the new exchange after. This two-step pattern is identical regardless of which features of the Anthropic SDK you use — basic completions, streaming, or tool use.

Basic pattern
def chat_with_memory(user_message: str, user_id: str) -> str:
    # Step 1: Recall relevant memories
    context = kv.inject_context(
        message=user_message,
        session_id=user_id,
    )

    # Step 2: Build system prompt with memory context
    system = "You are a helpful assistant."
    if context:
        system += f"\n\n{context}\n\nUse the above memories to personalise your response."

    # Step 3: Call Claude
    message = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": user_message}],
    )
    reply = message.content[0].text

    # Step 4: Store this exchange
    kv.remember(
        content=f"User: {user_message}",
        memory_type="episodic",
        session_id=user_id,
    )
    kv.remember(
        content=f"Claude: {reply[:500]}",
        memory_type="episodic",
        session_id=user_id,
        ttl_days=90,
    )
    return reply

Injecting memories into the Claude system prompt

Claude's API takes the system prompt as a parameter separate from the messages array. The system prompt is the cleanest place to inject Kronvex memory context: recalled memories stay out of the conversation history, and the user's turn contains only what the user actually wrote.

The inject_context() method returns a pre-formatted string that you can append directly to your system prompt. It looks like this when populated:

Example inject_context output
## Relevant memories (most recent context):
- User: I'm building a FastAPI service for a logistics company (score: 0.89)
- User prefers PostgreSQL over NoSQL for this project (score: 0.77)
- Claude: Recommended using asyncpg for async DB connections (score: 0.71)
- User's team follows the Google Python Style Guide (score: 0.65)

When no memories are relevant (new user, or threshold not met), inject_context() returns an empty string, so your system prompt stays clean without any awkward "no memories found" text.
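The behaviour described above (a formatted block when memories clear a threshold, an empty string otherwise) can be approximated as follows. This is a rough illustration, not Kronvex's code: the `format_memories` and `build_system_prompt` helpers, the `content`/`score` dictionary keys, the header text, and the 0.6 threshold are all assumptions made for the example.

```python
def format_memories(memories: list[dict], threshold: float = 0.6) -> str:
    """Format memories scoring at or above the threshold; '' if none qualify."""
    relevant = [m for m in memories if m["score"] >= threshold]
    if not relevant:
        return ""
    relevant.sort(key=lambda m: m["score"], reverse=True)
    lines = [f"- {m['content']} (score: {m['score']:.2f})" for m in relevant]
    return "## Relevant memories:\n" + "\n".join(lines)


def build_system_prompt(base: str, memory_block: str) -> str:
    """Append the memory block only when there is one."""
    return f"{base}\n\n{memory_block}" if memory_block else base
```

Because an empty result is falsy, a plain `if` check is enough to leave the base system prompt unchanged for new users.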

Full working agent with persistent memory

Here is a complete Claude agent that remembers each user's context across sessions. It uses the Anthropic SDK's native system parameter and degrades gracefully when the Kronvex API is temporarily unavailable: recall and storage errors are printed, and the agent still replies.

claude_agent_with_memory.py
import os
import anthropic
from kronvex import Kronvex

claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)

BASE_SYSTEM = """You are a highly capable personal assistant with persistent memory.
You remember past conversations and use that context to give more relevant,
personalised answers. When you learn something important about the user —
their role, preferences, ongoing projects, or technical constraints — you
should acknowledge it naturally in conversation."""

def chat(user_message: str, user_id: str) -> str:
    # 1. Recall (on failure, print the error and continue without memories)
    mem_context = ""
    try:
        mem_context = kv.inject_context(
            message=user_message,
            session_id=user_id,
        )
    except Exception as e:
        print(f"Memory recall skipped: {e}")

    # 2. Build system prompt
    system = BASE_SYSTEM
    if mem_context:
        system += f"\n\n{mem_context}"

    # 3. Call Claude
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        system=system,
        messages=[{"role": "user", "content": user_message}],
    )
    reply = response.content[0].text

    # 4. Store (errors are printed, never raised to the caller)
    try:
        kv.remember(
            content=f"User: {user_message}",
            memory_type="episodic",
            session_id=user_id,
        )
        kv.remember(
            content=f"Claude: {reply[:500]}",
            memory_type="episodic",
            session_id=user_id,
            ttl_days=90,
        )
    except Exception as e:
        print(f"Memory storage failed: {e}")

    return reply


if __name__ == "__main__":
    uid = "user-baptiste"
    # First session — tell Claude something important
    print(chat("I'm building a SaaS for freight logistics in Python with FastAPI", uid))
    # Days later — Claude will remember your project
    print(chat("What was my project again? And what DB should I use?", uid))

Tool-augmented agents: memory as a Claude tool

Claude's tool use feature lets you define functions that Claude can call autonomously. Exposing Kronvex as tools gives Claude the agency to decide when and what to remember — particularly useful for research agents or complex multi-turn workflows.

Define Kronvex as Claude tools
MEMORY_TOOLS = [
    {
        "name": "remember",
        "description": "Store an important fact or event in persistent memory for future recall.",
        "input_schema": {
            "type": "object",
            "properties": {
                "content": {"type": "string", "description": "The fact or event to store (1-2 sentences)"},
                "memory_type": {
                    "type": "string",
                    "enum": ["episodic", "semantic", "procedural"],
                    "description": "episodic=events, semantic=facts, procedural=patterns",
                },
            },
            "required": ["content"],
        },
    },
    {
        "name": "recall",
        "description": "Recall relevant memories from past conversations based on a query.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to search for in memory"},
            },
            "required": ["query"],
        },
    },
]

def handle_tool_call(tool_name: str, inputs: dict, session_id: str) -> str:
    if tool_name == "remember":
        kv.remember(
            content=inputs["content"],
            memory_type=inputs.get("memory_type", "episodic"),
            session_id=session_id,
        )
        return "Memory stored successfully."
    elif tool_name == "recall":
        memories = kv.recall(query=inputs["query"], session_id=session_id)
        return "\n".join([m["content"] for m in memories]) or "No relevant memories found."
    return "Unknown tool."
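The definitions above still need a loop that sends the tools to Claude, runs each requested tool, and returns the results. Here is a minimal sketch of that loop. The name `run_tool_loop` and the `max_rounds` cap are assumptions for the example; `client` is assumed to be an `anthropic.Anthropic()` instance, and the `stop_reason`, `tool_use`, and `tool_result` fields follow the Anthropic Messages API.

```python
from typing import Any, Callable


def run_tool_loop(
    client: Any,
    model: str,
    tools: list[dict],
    user_message: str,
    session_id: str,
    handle_tool_call: Callable[[str, dict, str], str],
    max_rounds: int = 5,
) -> str:
    """Call Claude repeatedly until it stops requesting tools, then return its text."""
    messages: list[dict] = [{"role": "user", "content": user_message}]
    for _ in range(max_rounds):
        response = client.messages.create(
            model=model, max_tokens=2048, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        # Echo Claude's turn back, then answer every tool_use block it contains.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": handle_tool_call(block.name, block.input, session_id),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Tool loop did not finish within max_rounds")
```

A call would look like `run_tool_loop(claude, "claude-sonnet-4-6", MEMORY_TOOLS, "What DB should I use?", "user-baptiste", handle_tool_call)`. The round cap guards against a model that keeps calling tools indefinitely.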

Async usage with AsyncAnthropic

For production agents running inside async frameworks (FastAPI, asyncio-based services), use the async Anthropic client alongside the async Kronvex client. Both support async/await natively.

Async Claude + Kronvex
import asyncio
import os
import anthropic
from kronvex.async_client import AsyncKronvex

claude = anthropic.AsyncAnthropic()
kv = AsyncKronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)

async def async_chat(user_message: str, user_id: str) -> str:
    # Recall and other async setup in parallel
    context, _ = await asyncio.gather(
        kv.inject_context(message=user_message, session_id=user_id),
        asyncio.sleep(0),  # replace with your own async setup
        return_exceptions=True,
    )
    if isinstance(context, Exception): context = ""

    system = f"You are a helpful assistant.\n\n{context}".strip()

    response = await claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": user_message}],
    )
    reply = response.content[0].text

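    # Store the exchange as one combined memory in a background task (fire-and-forget).
    # asyncio keeps only a weak reference to tasks; hold your own in long-running services.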
    asyncio.create_task(kv.remember(
        content=f"User: {user_message}\nClaude: {reply[:500]}",
        memory_type="episodic",
        session_id=user_id,
    ))
    return reply
Prompt caching + Kronvex. Anthropic supports prompt caching via the cache_control parameter. If your base system prompt is large and stable, mark it with {"type": "ephemeral"} to cache it and reduce costs. The dynamic Kronvex memory context goes after the cached portion — it changes per user, so it can't be cached, but your base instructions can be.
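One way to lay this out is to pass `system` as a list of text blocks rather than a single string, with `cache_control` on the stable block only. The helper name `build_system_blocks` is hypothetical; the block structure follows Anthropic's prompt caching documentation.

```python
def build_system_blocks(base_system: str, memory_context: str) -> list[dict]:
    """Cacheable base instructions first, per-user memory context after."""
    blocks = [
        {
            "type": "text",
            "text": base_system,
            "cache_control": {"type": "ephemeral"},  # cache everything up to here
        }
    ]
    if memory_context:
        # Changes per user and per message, so it stays outside the cached prefix.
        blocks.append({"type": "text", "text": memory_context})
    return blocks
```

The result is passed as `system=build_system_blocks(BASE_SYSTEM, mem_context)` in `messages.create()`. Anthropic documents a minimum cacheable prompt length that varies by model, so a short base prompt will not be cached even when marked.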

Best practices

Use semantic memories for user profiles

The most valuable long-term memories are facts about the user — their role, technical stack, project goals, preferences. Store these as memory_type="semantic" without a TTL. They will persist indefinitely and consistently surface when relevant.
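A small helper along these lines keeps profile facts separate from episodic chat logs. The function name `store_profile_facts` is hypothetical, and `kv` is assumed to be the Kronvex client from earlier in this guide, called with the same `remember()` keyword arguments used above.

```python
from typing import Any, Iterable


def store_profile_facts(kv: Any, user_id: str, facts: Iterable[str]) -> int:
    """Store each non-empty fact as a semantic memory with no TTL; return the count."""
    stored = 0
    for fact in facts:
        fact = fact.strip()
        if not fact:
            continue
        # No ttl_days argument: profile facts are meant to persist.
        kv.remember(content=fact, memory_type="semantic", session_id=user_id)
        stored += 1
    return stored
```

For example: `store_profile_facts(kv, "user-baptiste", ["Senior Python engineer", "Prefers async patterns"])`.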

Summarise before storing

Claude often produces detailed, multi-paragraph responses. Before storing them as memories, ask Claude to extract a 1–2 sentence summary. You can do this with a lightweight, cheap model call (Claude Haiku) or with a simple regex extraction of the first paragraph. Shorter memories embed more precisely and return higher confidence scores.
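The regex option can be as small as the sketch below. The function name `first_paragraph_summary` and the 300-character cap are assumptions chosen for illustration. It is a cheap heuristic, and a model-generated summary will usually capture the substance better.

```python
import re


def first_paragraph_summary(reply: str, max_chars: int = 300) -> str:
    """Return the first paragraph of a reply, whitespace-normalised and length-capped."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", reply.strip()) if p.strip()]
    if not paragraphs:
        return ""
    summary = re.sub(r"\s+", " ", paragraphs[0])
    if len(summary) <= max_chars:
        return summary
    # Cut at the last space inside the limit so words are not split.
    return summary[:max_chars].rsplit(" ", 1)[0] + "..."
```

The returned string can be stored in place of the `reply[:500]` slice used in the earlier examples.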

Give your Claude agents a memory

Free plan — 1 agent, 100 memories. No credit card. Works with Claude Opus, Sonnet, Haiku — any Anthropic model.
