Tutorial Claude Anthropic May 4, 2026 · 9 min read

Persistent Memory for Claude Agent SDK

Anthropic's Claude is outstanding at reasoning, tool use, and following complex instructions — but like every LLM API, it has no memory across sessions. This guide shows how to pair the Anthropic Python SDK with Kronvex to give your Claude agents a persistent, semantically searchable memory that survives across every conversation.

In this article

Why Claude agents forget — and why CLAUDE.md isn't enough
CLAUDE.md vs Kronvex: a practical comparison
Installing Kronvex alongside the Anthropic SDK
The basic pattern: recall before, store after
Injecting memories into the Claude system prompt
Full working agent with persistent memory
Tool-augmented agents: memory as a Claude tool
Async usage with AsyncAnthropic
Best practices

Why Claude agents forget — and why CLAUDE.md isn't enough

The Anthropic API is stateless. Each client.messages.create() call is independent: there is no server-side session, no persistent thread, and no memory of what Claude said yesterday. To maintain conversation context, you pass the full message history in every request. This works within a single session, but the history lives in your process, so nothing carries over to the next session unless you persist it yourself.
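To make the in-session half of that concrete, here is a minimal sketch of resending the accumulated history on every turn. `send_turn` is a hypothetical helper, not part of the Anthropic SDK; `client` is assumed to be an `anthropic.Anthropic()` instance, and the model name is the one used elsewhere in this guide.

```python
from typing import Any


def send_turn(client: Any, history: list, user_message: str,
              model: str = "claude-sonnet-4-6") -> str:
    """Append the user turn, call the API with the full history, record the reply."""
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=history,  # the API keeps no state, so every prior turn is resent
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
```

The `history` list exists only in the running process. Once that process exits, the next session starts from an empty list, which is the gap the rest of this guide addresses.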

Many developers discover CLAUDE.md — a markdown file that Claude Code reads at startup to understand project context. While useful for developer tooling, CLAUDE.md has critical limitations as a memory system for production AI agents:

Capability	CLAUDE.md	Kronvex
Per-user memories	No — shared file	Yes — session_id scoped
Dynamic updates	Manual file edits	API calls at runtime
Semantic search	None; the whole file is loaded into context	pgvector cosine similarity
Scale	Grows context unboundedly	Retrieves top-K relevant
Multi-user	Not possible	Unlimited users
GDPR compliance	Depends on hosting	EU-hosted, deletable

Kronvex is designed for production multi-user agents. It stores memories per user (via session_id), retrieves only the semantically relevant ones for each conversation, and scales to millions of memories without bloating Claude's context window.

CLAUDE.md vs Kronvex: a practical comparison

To be concrete: if you're building a personal assistant that will serve thousands of users, CLAUDE.md is the wrong tool. It's a static file that lives in a directory — it can't store memories per user, can't be updated at runtime, and can't retrieve relevant context semantically. Kronvex fills this gap.

The right mental model: CLAUDE.md is for static instructions ("always respond in French"), Kronvex is for dynamic, accumulated knowledge ("this user is a senior Python engineer building a logistics API who prefers async patterns").

Installing Kronvex alongside the Anthropic SDK

Install dependencies

pip install anthropic kronvex

Initialize both clients

import anthropic
from kronvex import Kronvex
import os

claude = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"]
)

kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)
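Both clients read their credentials from environment variables, so a missing variable surfaces as a bare `KeyError`. One option is to check all of them up front and report every missing name at once. `require_env` below is a hypothetical helper written for this guide, using only the standard library.

```python
import os


def require_env(*names: str) -> dict:
    """Return the requested environment variables, or raise listing all that are missing."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env("ANTHROPIC_API_KEY", "KV_API_KEY", "KV_AGENT_ID")` before constructing the clients turns a late failure into a single clear message at startup.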

Which Claude model? Kronvex works with all Anthropic models. For agents where memory context matters most, Claude Sonnet and Opus give the best instruction following. For high-throughput production agents, Claude Haiku with Kronvex memory recall is extremely cost-efficient.

The basic pattern: recall before, store after

The Kronvex integration wraps every Claude API call with two steps: recall relevant memories before the call, store the new exchange after. This two-step pattern is identical regardless of which features of the Anthropic SDK you use — basic completions, streaming, or tool use.

Basic pattern

def chat_with_memory(user_message: str, user_id: str) -> str:
    # Step 1: Recall relevant memories
    context = kv.inject_context(
        message=user_message,
        session_id=user_id,
    )

    # Step 2: Build system prompt with memory context
    system = "You are a helpful assistant."
    if context:
        system += f"\n\n{context}\n\nUse the above memories to personalise your response."

    # Step 3: Call Claude
    message = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": user_message}],
    )
    reply = message.content[0].text

    # Step 4: Store this exchange
    kv.remember(
        content=f"User: {user_message}",
        memory_type="episodic",
        session_id=user_id,
    )
    kv.remember(
        content=f"Claude: {reply[:500]}",
        memory_type="episodic",
        session_id=user_id,
        ttl_days=90,
    )
    return reply
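Because the recall and store steps do not depend on how Claude is called, the pattern can be factored into a wrapper. The sketch below is one way to do that, with plain callables standing in for the SDK calls. `with_memory` and its three parameters are hypothetical names, not part of Kronvex or the Anthropic SDK.

```python
from typing import Callable


def with_memory(
    call_model: Callable[[str, str], str],
    recall: Callable[[str, str], str],
    store: Callable[[str, str], None],
) -> Callable[[str, str], str]:
    """Wrap a model call so memory is recalled before it and stored after it."""

    def wrapped(user_message: str, user_id: str) -> str:
        context = recall(user_message, user_id)    # recall before
        reply = call_model(user_message, context)  # any SDK feature goes here
        store(f"User: {user_message}", user_id)    # store after
        store(f"Claude: {reply[:500]}", user_id)
        return reply

    return wrapped
```

In practice `recall` would wrap `kv.inject_context`, `store` would wrap `kv.remember`, and `call_model` would wrap whichever Claude call you use. The wrapper also makes the flow easy to unit test with fakes.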

Injecting memories into the Claude system prompt

Claude's API separates the system prompt from the messages array. This is the cleanest place to inject Kronvex memory context — it doesn't pollute the conversation history that Claude sees, and it doesn't consume the user's turn for memory retrieval.

The inject_context() method returns a pre-formatted string that you can append directly to your system prompt. It looks like this when populated:

Example inject_context output

## Relevant memories (most recent context):
- User: I'm building a FastAPI service for a logistics company (score: 0.89)
- User prefers PostgreSQL over NoSQL for this project (score: 0.77)
- Claude: Recommended using asyncpg for async DB connections (score: 0.71)
- User's team follows the Google Python Style Guide (score: 0.65)

When no memories are relevant (new user, or threshold not met), inject_context() returns an empty string, so your system prompt stays clean without any awkward "no memories found" text.
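The empty-string behaviour makes prompt assembly a one-branch function. The helper below is a small sketch; `build_system_prompt` is a hypothetical name, and treating `None` or whitespace-only input as empty is an extra defensive assumption rather than documented Kronvex behaviour.

```python
from typing import Optional


def build_system_prompt(base: str, memory_context: Optional[str]) -> str:
    """Append memory context to the base prompt only when there is something to add."""
    context = (memory_context or "").strip()
    if not context:
        return base
    return f"{base}\n\n{context}"
```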

Full working agent with persistent memory

Here is a complete, production-ready Claude agent that remembers every user's context across sessions. It uses the Anthropic SDK's native system parameter and handles memory gracefully even when the Kronvex API is temporarily unavailable.

claude_agent_with_memory.py

import os
import anthropic
from kronvex import Kronvex

claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)

BASE_SYSTEM = """You are a highly capable personal assistant with persistent memory.
You remember past conversations and use that context to give more relevant,
personalised answers. When you learn something important about the user —
their role, preferences, ongoing projects, or technical constraints — you
should acknowledge it naturally in conversation."""

def chat(user_message: str, user_id: str) -> str:
    # 1. Recall (best effort: continue without memories on failure)
    mem_context = ""
    try:
        mem_context = kv.inject_context(
            message=user_message,
            session_id=user_id,
        )
    except Exception as e:
        print(f"Memory recall skipped: {e}")

    # 2. Build system prompt
    system = BASE_SYSTEM
    if mem_context:
        system += f"\n\n{mem_context}"

    # 3. Call Claude
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        system=system,
        messages=[{"role": "user", "content": user_message}],
    )
    reply = response.content[0].text

    # 4. Store (best effort; these calls are synchronous and block until done)
    try:
        kv.remember(
            content=f"User: {user_message}",
            memory_type="episodic",
            session_id=user_id,
        )
        kv.remember(
            content=f"Claude: {reply[:500]}",
            memory_type="episodic",
            session_id=user_id,
            ttl_days=90,
        )
    except Exception as e:
        print(f"Memory storage failed: {e}")

    return reply


if __name__ == "__main__":
    uid = "user-baptiste"
    # First session — tell Claude something important
    print(chat("I'm building a SaaS for freight logistics in Python with FastAPI", uid))
    # Days later — Claude will remember your project
    print(chat("What was my project again? And what DB should I use?", uid))
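The storage step in the agent above runs synchronously, so the caller waits for both writes before getting the reply. If that latency matters, one option is to hand the writes to a thread pool. This is a sketch under the assumption that the Kronvex client is safe to call from a worker thread, which is not confirmed here. `store_in_background` is a hypothetical helper.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

_executor = ThreadPoolExecutor(max_workers=2)


def store_in_background(store: Callable[..., object], **kwargs) -> Future:
    """Run store(**kwargs) on a worker thread and log failures instead of raising."""

    def _run() -> bool:
        try:
            store(**kwargs)
            return True
        except Exception as exc:  # memory is best effort; never break the reply path
            print(f"Memory storage failed: {exc}")
            return False

    return _executor.submit(_run)
```

A call such as `store_in_background(kv.remember, content=f"User: {user_message}", memory_type="episodic", session_id=user_id)` returns immediately. The returned `Future` resolves to `True` or `False` if you want to check the outcome later.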

Tool-augmented agents: memory as a Claude tool

Claude's tool use feature lets you define functions that Claude can call autonomously. Exposing Kronvex as tools gives Claude the agency to decide when and what to remember — particularly useful for research agents or complex multi-turn workflows.

Define Kronvex as Claude tools

MEMORY_TOOLS = [
    {
        "name": "remember",
        "description": "Store an important fact or event in persistent memory for future recall.",
        "input_schema": {
            "type": "object",
            "properties": {
                "content": {"type": "string", "description": "The fact or event to store (1-2 sentences)"},
                "memory_type": {
                    "type": "string",
                    "enum": ["episodic", "semantic", "procedural"],
                    "description": "episodic=events, semantic=facts, procedural=patterns",
                },
            },
            "required": ["content"],
        },
    },
    {
        "name": "recall",
        "description": "Recall relevant memories from past conversations based on a query.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to search for in memory"},
            },
            "required": ["query"],
        },
    },
]

def handle_tool_call(tool_name: str, inputs: dict, session_id: str) -> str:
    if tool_name == "remember":
        kv.remember(
            content=inputs["content"],
            memory_type=inputs.get("memory_type", "episodic"),
            session_id=session_id,
        )
        return "Memory stored successfully."
    elif tool_name == "recall":
        memories = kv.recall(query=inputs["query"], session_id=session_id)
        return "\n".join([m["content"] for m in memories]) or "No relevant memories found."
    return "Unknown tool."
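The tool definitions and handler still need a loop that sends tool results back to Claude until it produces a final text answer. The sketch below follows the Messages API tool-use flow: a response with `stop_reason == "tool_use"` contains `tool_use` blocks, and each is answered with a `tool_result` block carrying the matching `tool_use_id`. `run_tool_loop`, its `handle_tool` parameter, and the `max_rounds` cap are hypothetical choices made for this example.

```python
from typing import Any, Callable


def run_tool_loop(
    client: Any,
    messages: list,
    tools: list,
    handle_tool: Callable[[str, dict], str],
    model: str = "claude-sonnet-4-6",
    max_rounds: int = 5,
) -> str:
    """Call Claude, execute any requested tools, and repeat until it answers in text."""
    for _ in range(max_rounds):
        response = client.messages.create(
            model=model, max_tokens=2048, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            # Final answer: join the text blocks and stop.
            return "".join(b.text for b in response.content if b.type == "text")

        # Echo the assistant turn back, then answer each tool_use block by id.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": handle_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Tool loop did not finish within max_rounds")
```

With the definitions above, a call could look like `run_tool_loop(claude, [{"role": "user", "content": user_message}], MEMORY_TOOLS, lambda name, inputs: handle_tool_call(name, inputs, session_id))`. The lambda binds the session ID server-side, so Claude never chooses whose memory it reads or writes.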

Async usage with AsyncAnthropic

For production agents running inside async frameworks (FastAPI, asyncio-based services), use the async Anthropic client alongside the async Kronvex client. Both support async/await natively.

Async Claude + Kronvex

import asyncio
import os
import anthropic
from kronvex.async_client import AsyncKronvex

claude = anthropic.AsyncAnthropic()
kv = AsyncKronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)

# Keep references to background tasks so they are not garbage-collected mid-flight
_background_tasks: set = set()

async def async_chat(user_message: str, user_id: str) -> str:
    # Recall and other async setup in parallel
    context, _ = await asyncio.gather(
        kv.inject_context(message=user_message, session_id=user_id),
        asyncio.sleep(0),  # replace with your own async setup
        return_exceptions=True,
    )
    if isinstance(context, Exception):
        context = ""  # recall failed; continue without memories

    system = f"You are a helpful assistant.\n\n{context}".strip()

    response = await claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": user_message}],
    )
    reply = response.content[0].text

    # Store the exchange as one memory in a background task (does not delay the reply)
    task = asyncio.create_task(kv.remember(
        content=f"User: {user_message}\nClaude: {reply[:500]}",
        memory_type="episodic",
        session_id=user_id,
    ))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return reply

Prompt caching + Kronvex. Anthropic supports prompt caching via the cache_control parameter. If your base system prompt is large and stable, mark it with {"type": "ephemeral"} to cache it and reduce costs. The dynamic Kronvex memory context goes after the cached portion — it changes per user, so it can't be cached, but your base instructions can be.
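In the Messages API, the `system` parameter can be a list of text blocks instead of a single string, and `cache_control` is set per block. The sketch below builds that list with the stable instructions first and the per-user memory context second. `build_cached_system` is a hypothetical helper name.

```python
from typing import Optional


def build_cached_system(base_prompt: str, memory_context: Optional[str]) -> list:
    """Return system blocks: a cacheable stable prefix, then per-user memory context."""
    blocks = [
        {
            "type": "text",
            "text": base_prompt,
            "cache_control": {"type": "ephemeral"},  # cache everything up to here
        }
    ]
    if memory_context:
        # Placed after the cache breakpoint so per-user text does not alter the cached prefix.
        blocks.append({"type": "text", "text": memory_context})
    return blocks
```

Pass the result as `system=build_cached_system(BASE_SYSTEM, mem_context)` in `messages.create()`. Caching only applies once the prefix reaches the model's minimum cacheable length, so a short base prompt is simply processed as usual.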

Best practices

Use semantic memories for user profiles

The most valuable long-term memories are facts about the user — their role, technical stack, project goals, preferences. Store these as memory_type="semantic" without a TTL. They will persist indefinitely and consistently surface when relevant.
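As a sketch of that advice, the helper below stores a batch of profile facts as semantic memories and omits `ttl_days`. It takes the storage function as a parameter (in practice `kv.remember`, with the keyword arguments used earlier in this guide). `store_profile_facts` is a hypothetical name.

```python
from typing import Callable, Iterable


def store_profile_facts(
    remember: Callable[..., object], facts: Iterable[str], session_id: str
) -> int:
    """Store each non-empty fact as a semantic memory with no TTL; return the count stored."""
    stored = 0
    for fact in facts:
        fact = fact.strip()
        if not fact:
            continue
        # No ttl_days argument: the guide describes such memories as persisting indefinitely.
        remember(content=fact, memory_type="semantic", session_id=session_id)
        stored += 1
    return stored
```

For example, `store_profile_facts(kv.remember, ["Senior Python engineer", "Prefers async patterns"], user_id)` would write two long-lived facts for that user.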

Summarise before storing

Claude often produces detailed, multi-paragraph responses. Before storing them as memories, ask Claude to extract a 1–2 sentence summary. You can do this with a lightweight, cheap model call (Claude Haiku) or with a simple regex extraction of the first paragraph. Shorter memories embed more precisely and return higher confidence scores.
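The regex route can be sketched with the standard library alone. The function below takes the first paragraph, keeps at most two sentences, and caps the length. The sentence split is deliberately naive (it breaks after `.`, `!` or `?` followed by whitespace), and the name and default limits are illustrative assumptions.

```python
import re


def first_paragraph_summary(text: str, max_sentences: int = 2, max_chars: int = 300) -> str:
    """Take the first paragraph, keep at most max_sentences, and cap the length."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    if not paragraphs:
        return ""
    first = " ".join(paragraphs[0].split())  # collapse internal line breaks
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", first)
    summary = " ".join(sentences[:max_sentences])
    return summary[:max_chars].rstrip()
```

Abbreviations such as "e.g." will trip the sentence split, so replies that matter are better served by the model-based summary.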

Confidence threshold — use threshold=0.40 for tighter retrieval if irrelevant memories appear, or 0.25 if users complain about context loss
Session ID from auth — never trust client-supplied user IDs; derive from your JWT or session token server-side
TTL for episodic memories — set ttl_days=30 to 90 on conversation memories; semantic user-profile memories can be permanent
EU data residency — all Kronvex memories are stored in Frankfurt; if your app serves EU users, this is the GDPR-compliant default
Separate agents per environment — create distinct Kronvex agents for dev and production so test conversations never pollute real users' memory stores
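For the session ID point above, the idea is that the ID passed to Kronvex comes from token claims your server has already verified, never from a field in the request body. The sketch below assumes verification happened upstream and that the token carries the standard JWT `sub` claim. `session_id_from_claims` and the `user-` prefix are hypothetical.

```python
def session_id_from_claims(claims: dict) -> str:
    """Build the memory session ID from already-verified token claims."""
    subject = str(claims.get("sub") or "").strip()
    if not subject:
        raise PermissionError("Verified token has no 'sub' claim; refusing to pick a session ID")
    return f"user-{subject}"
```

Refusing to fall back to a default ID is deliberate: a shared fallback would mix memories across users.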

Give your Claude agents a memory

Free plan — 1 agent, 100 memories. No credit card. Works with Claude Opus, Sonnet, Haiku — any Anthropic model.

