Persistent Memory for Claude Agent SDK
Anthropic's Claude is outstanding at reasoning, tool use, and following complex instructions — but like every LLM API, it has no memory across sessions. This guide shows how to pair the Anthropic Python SDK with Kronvex to give your Claude agents a persistent, semantically searchable memory that survives across every conversation.
- Why Claude agents forget — and why CLAUDE.md isn't enough
- CLAUDE.md vs Kronvex: a practical comparison
- Installing Kronvex alongside the Anthropic SDK
- The basic pattern: recall before, store after
- Injecting memories into the Claude system prompt
- Full working agent with persistent memory
- Tool-augmented agents: memory as a Claude tool
- Async usage with AsyncAnthropic
- Best practices
Why Claude agents forget — and why CLAUDE.md isn't enough
The Anthropic API is stateless. Each client.messages.create() call is independent — there's no server-side session, no persistent thread, no memory of what Claude said yesterday. To maintain conversation context, you pass the full message history in every request. This works within a session but fails completely across sessions.
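To make the statelessness concrete, here is a minimal sketch of the client-side bookkeeping every Claude app does within a session. Conversation is a hypothetical helper, not part of the Anthropic SDK. It accepts any client exposing messages.create() with the Anthropic SDK's signature, such as anthropic.Anthropic(). The key line is the one that passes the whole self.messages list on every call.

```python
class Conversation:
    """Hypothetical helper: keeps one session's message history client-side."""

    def __init__(self, model: str = "claude-sonnet-4-6", max_tokens: int = 1024):
        self.model = model
        self.max_tokens = max_tokens
        self.messages: list[dict] = []  # lives only in this process

    def send(self, client, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        # The API keeps no server-side state, so the entire history is resent each call.
        response = client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=self.messages,
        )
        # Join the text blocks of Claude's reply.
        reply = "".join(block.text for block in response.content if block.type == "text")
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

A fresh Conversation() starts with an empty list. That is exactly the cross-session gap: once the process restarts or the object is discarded, everything Claude was told is gone.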
Many developers discover CLAUDE.md — a markdown file that Claude Code reads at startup to understand project context. While useful for developer tooling, CLAUDE.md has critical limitations as a memory system for production AI agents:
| Capability | CLAUDE.md | Kronvex |
|---|---|---|
| Per-user memories | No — shared file | Yes — session_id scoped |
| Dynamic updates | Manual file edits | API calls at runtime |
| Semantic search | None (whole file loaded into context) | pgvector cosine similarity |
| Scale | Grows context unboundedly | Retrieves top-K relevant |
| Multi-user | Not possible | Unlimited users |
| GDPR compliance | Depends on hosting | EU-hosted, deletable |
Kronvex is designed for production multi-user agents. It stores memories per user (via session_id), retrieves only the semantically relevant ones for each conversation, and scales to millions of memories without bloating Claude's context window.
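"Retrieves only the semantically relevant ones" comes down to ranking stored embeddings by cosine similarity to the query embedding and keeping the top K. The sketch below shows that ranking in plain Python. It illustrates the idea only, not Kronvex's internals: the 2-D vectors are toy stand-ins for real embeddings, and in Postgres, pgvector does this with its <=> operator, which returns cosine distance (1 minus cosine similarity).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], memories: list[tuple[str, list[float]]], k: int = 3):
    """Return the k memories most similar to the query, with rounded scores."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(text, round(score, 2)) for score, text in scored[:k]]
```

Only the top K results enter the prompt, so context size stays roughly constant no matter how many memories a user accumulates. A CLAUDE.md file, by contrast, is loaded whole.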
CLAUDE.md vs Kronvex: a practical comparison
To be concrete: if you're building a personal assistant that will serve thousands of users, CLAUDE.md is the wrong tool. It's a static file that lives in a directory — it can't store memories per user, can't be updated at runtime, and can't retrieve relevant context semantically. Kronvex fills this gap.
The right mental model: CLAUDE.md is for static instructions ("always respond in French"), Kronvex is for dynamic, accumulated knowledge ("this user is a senior Python engineer building a logistics API who prefers async patterns").
Installing Kronvex alongside the Anthropic SDK
```bash
pip install anthropic kronvex
```

```python
import os

import anthropic
from kronvex import Kronvex

claude = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"]
)
kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)
```
The basic pattern: recall before, store after
The Kronvex integration wraps every Claude API call with two steps: recall relevant memories before the call, store the new exchange after. This two-step pattern is identical regardless of which features of the Anthropic SDK you use — basic completions, streaming, or tool use.
```python
def chat_with_memory(user_message: str, user_id: str) -> str:
    # Step 1: Recall relevant memories
    context = kv.inject_context(
        message=user_message,
        session_id=user_id,
    )

    # Step 2: Build system prompt with memory context
    system = "You are a helpful assistant."
    if context:
        system += f"\n\n{context}\n\nUse the above memories to personalise your response."

    # Step 3: Call Claude
    message = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": user_message}],
    )
    reply = message.content[0].text

    # Step 4: Store this exchange
    kv.remember(
        content=f"User: {user_message}",
        memory_type="episodic",
        session_id=user_id,
    )
    kv.remember(
        content=f"Claude: {reply[:500]}",
        memory_type="episodic",
        session_id=user_id,
        ttl_days=90,
    )
    return reply
```
Injecting memories into the Claude system prompt
Claude's API separates the system prompt from the messages array. The system prompt is the cleanest place to inject Kronvex memory context: it keeps retrieved memories out of the messages array, so the conversation history stays clean and the user's turn contains only what the user actually wrote.
The inject_context() method returns a pre-formatted string that you can append directly to your system prompt. It looks like this when populated:
```
## Relevant memories (most recent context):
- User: I'm building a FastAPI service for a logistics company (score: 0.89)
- User prefers PostgreSQL over NoSQL for this project (score: 0.77)
- Claude: Recommended using asyncpg for async DB connections (score: 0.71)
- User's team follows the Google Python Style Guide (score: 0.65)
```
When no memories are relevant (new user, or threshold not met), inject_context() returns an empty string, so your system prompt stays clean without any awkward "no memories found" text.
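inject_context() does this formatting for you. If you call kv.recall() directly instead, as the tool-use example later in this guide does, you may want equivalent formatting of your own. Here is a minimal sketch. format_memories is a hypothetical helper, and it assumes each recalled memory is a dict with "content" and "score" keys. The "score" key is a guess rather than documented Kronvex behaviour. The threshold filter runs client-side and mirrors the empty-string behaviour described above.

```python
def format_memories(memories: list[dict], *, threshold: float) -> str:
    """Hypothetical helper: render recalled memories as a system-prompt section.

    Assumes each memory looks like {"content": str, "score": float}.
    """
    relevant = sorted(
        (m for m in memories if m["score"] >= threshold),
        key=lambda m: m["score"],
        reverse=True,  # highest similarity first
    )
    if not relevant:
        return ""  # nothing relevant: leave the system prompt untouched
    lines = ["## Relevant memories (most recent context):"]
    lines += [f"- {m['content']} (score: {m['score']:.2f})" for m in relevant]
    return "\n".join(lines)
```

Raising the threshold trades recall for precision: fewer, more on-topic memories reach Claude.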
Full working agent with persistent memory
Here is a complete Claude agent that remembers every user's context across sessions. It uses the Anthropic SDK's native system parameter and keeps answering, without memory, even when the Kronvex API is temporarily unavailable.
```python
import os

import anthropic
from kronvex import Kronvex

claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)

BASE_SYSTEM = """You are a highly capable personal assistant with persistent memory.
You remember past conversations and use that context to give more relevant,
personalised answers. When you learn something important about the user — their
role, preferences, ongoing projects, or technical constraints — you should
acknowledge it naturally in conversation."""


def chat(user_message: str, user_id: str) -> str:
    # 1. Recall (fail silently)
    mem_context = ""
    try:
        mem_context = kv.inject_context(
            message=user_message,
            session_id=user_id,
        )
    except Exception as e:
        print(f"Memory recall skipped: {e}")

    # 2. Build system prompt
    system = BASE_SYSTEM
    if mem_context:
        system += f"\n\n{mem_context}"

    # 3. Call Claude
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        system=system,
        messages=[{"role": "user", "content": user_message}],
    )
    reply = response.content[0].text

    # 4. Store (fail silently). These calls are synchronous, so they add latency
    #    before the reply is returned; the async version below avoids that.
    try:
        kv.remember(
            content=f"User: {user_message}",
            memory_type="episodic",
            session_id=user_id,
        )
        kv.remember(
            content=f"Claude: {reply[:500]}",
            memory_type="episodic",
            session_id=user_id,
            ttl_days=90,
        )
    except Exception as e:
        print(f"Memory storage failed: {e}")

    return reply


if __name__ == "__main__":
    uid = "user-baptiste"
    # First session: tell Claude something important
    print(chat("I'm building a SaaS for freight logistics in Python with FastAPI", uid))
    # Days later: Claude will remember your project
    print(chat("What was my project again? And what DB should I use?", uid))
```
Tool-augmented agents: memory as a Claude tool
Claude's tool use feature lets you define functions that Claude can call autonomously. Exposing Kronvex as tools gives Claude the agency to decide when and what to remember — particularly useful for research agents or complex multi-turn workflows.
```python
MEMORY_TOOLS = [
    {
        "name": "remember",
        "description": "Store an important fact or event in persistent memory for future recall.",
        "input_schema": {
            "type": "object",
            "properties": {
                "content": {"type": "string", "description": "The fact or event to store (1-2 sentences)"},
                "memory_type": {
                    "type": "string",
                    "enum": ["episodic", "semantic", "procedural"],
                    "description": "episodic=events, semantic=facts, procedural=patterns",
                },
            },
            "required": ["content"],
        },
    },
    {
        "name": "recall",
        "description": "Recall relevant memories from past conversations based on a query.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to search for in memory"},
            },
            "required": ["query"],
        },
    },
]


def handle_tool_call(tool_name: str, inputs: dict, session_id: str) -> str:
    if tool_name == "remember":
        kv.remember(
            content=inputs["content"],
            memory_type=inputs.get("memory_type", "episodic"),
            session_id=session_id,
        )
        return "Memory stored successfully."
    elif tool_name == "recall":
        memories = kv.recall(query=inputs["query"], session_id=session_id)
        return "\n".join([m["content"] for m in memories]) or "No relevant memories found."
    return "Unknown tool."
```
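The tool definitions and handler still need a loop that connects them to Claude. Below is a minimal sketch of the usual Anthropic tool-use loop. It sends the tools, and while the response's stop_reason is "tool_use" it runs each tool_use block and returns the results as tool_result blocks in a user turn. run_tool_loop and max_turns are hypothetical names. The client is duck-typed, so anthropic.Anthropic() works, and so does a test fake.

```python
from typing import Callable


def run_tool_loop(
    client,
    *,
    model: str,
    system: str,
    user_message: str,
    tools: list[dict],
    handle_tool: Callable[[str, dict], str],
    max_tokens: int = 1024,
    max_turns: int = 5,
) -> str:
    """Hypothetical helper: let Claude call tools until it produces a final answer."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        response = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            system=system,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # Final answer: join the text blocks.
            return "".join(b.text for b in response.content if b.type == "text")
        # Echo Claude's turn (including its tool_use blocks) back into the history.
        messages.append({"role": "assistant", "content": response.content})
        # Run every requested tool and answer each one with a matching tool_result.
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": handle_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")
```

With this guide's objects, you would pass claude as the client and MEMORY_TOOLS as tools. For handle_tool, use a lambda that forwards to handle_tool_call with the current user's session_id, for example lambda name, inputs: handle_tool_call(name, inputs, session_id=user_id).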
Async usage with AsyncAnthropic
For production agents running inside async frameworks (FastAPI, asyncio-based services), use the async Anthropic client alongside the async Kronvex client. Both support async/await natively.
```python
import asyncio
import os

import anthropic
from kronvex.async_client import AsyncKronvex

claude = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
kv = AsyncKronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)

# Hold references to fire-and-forget tasks so they are not garbage-collected mid-flight
_background_tasks: set[asyncio.Task] = set()


async def async_chat(user_message: str, user_id: str) -> str:
    # Recall and other async setup in parallel
    context, _ = await asyncio.gather(
        kv.inject_context(message=user_message, session_id=user_id),
        asyncio.sleep(0),  # replace with your own async setup
        return_exceptions=True,
    )
    if isinstance(context, Exception):
        context = ""  # fail silently: answer without memory

    system = f"You are a helpful assistant.\n\n{context}".strip()
    response = await claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": user_message}],
    )
    reply = response.content[0].text

    # Store the exchange (both sides in one memory) in the background,
    # without delaying the reply
    task = asyncio.create_task(kv.remember(
        content=f"User: {user_message}\nClaude: {reply[:500]}",
        memory_type="episodic",
        session_id=user_id,
    ))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return reply
```
Prompt caching: the Anthropic API lets you cache stable parts of a prompt with the cache_control parameter. If your base system prompt is large and stable, mark it with {"type": "ephemeral"} to cache it and reduce cost and latency on repeated calls. Put the dynamic Kronvex memory context after the cached portion. It changes per user and per message, so it can't be cached, but your base instructions can be.
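In practice, this means passing system as a list of text blocks instead of a single string, with cache_control only on the stable block. Here is a minimal sketch. build_system_blocks is a hypothetical helper, and the block shape follows the Anthropic Messages API. Caching works on prompt prefixes, so the stable block must come first. Prompts shorter than a model-specific minimum length are not cached at all.

```python
def build_system_blocks(base_system: str, memory_context: str) -> list[dict]:
    """Hypothetical helper: cacheable base instructions first, per-user memories after."""
    blocks = [
        {
            "type": "text",
            "text": base_system,
            "cache_control": {"type": "ephemeral"},  # stable prefix: cacheable
        }
    ]
    if memory_context:
        # Changes per user and per message, so it is left uncached.
        blocks.append({"type": "text", "text": memory_context})
    return blocks
```

You would then call claude.messages.create(..., system=build_system_blocks(BASE_SYSTEM, mem_context), ...) in place of the string concatenation used earlier.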
Best practices
Use semantic memories for user profiles
The most valuable long-term memories are facts about the user — their role, technical stack, project goals, preferences. Store these as memory_type="semantic" without a TTL. They will persist indefinitely and consistently surface when relevant.
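One way to apply this consistently is to centralise the TTL decision. The sketch below uses a hypothetical helper that builds keyword arguments for kv.remember(), using only the parameters shown in this guide (content, memory_type, session_id, ttl_days). Episodic memories get a 90-day TTL, within the 30 to 90 day range this guide recommends. Semantic memories get none. Leaving procedural memories permanent too is an assumption; adjust it to your needs.

```python
EPISODIC_TTL_DAYS = 90  # conversation memories expire; profile facts do not
VALID_TYPES = {"episodic", "semantic", "procedural"}


def memory_kwargs(content: str, memory_type: str, session_id: str) -> dict:
    """Hypothetical helper: keyword arguments for kv.remember() with a TTL policy."""
    if memory_type not in VALID_TYPES:
        raise ValueError(f"unknown memory_type: {memory_type!r}")
    kwargs = {"content": content, "memory_type": memory_type, "session_id": session_id}
    if memory_type == "episodic":
        kwargs["ttl_days"] = EPISODIC_TTL_DAYS
    # semantic/procedural: no ttl_days, so the memory persists indefinitely
    return kwargs
```

For example, kv.remember(**memory_kwargs("User is a senior Python engineer building a logistics API", "semantic", user_id)) stores a permanent profile fact.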
Summarise before storing
Claude often produces detailed, multi-paragraph responses. Before storing them as memories, condense them into a 1–2 sentence summary. You can do this with a lightweight, cheap model call (Claude Haiku) or with a simple regex extraction of the first paragraph. Shorter memories tend to embed more precisely and usually return higher similarity scores.
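The regex route can be as small as the sketch below. first_paragraph and its max_chars cap are hypothetical. It splits on blank lines, keeps the first paragraph, collapses whitespace, and trims overlong text at the last sentence boundary before the cap. It will not match the quality of a Haiku summary, but it costs nothing per call.

```python
import re


def first_paragraph(text: str, max_chars: int = 300) -> str:
    """Hypothetical helper: cheap summary = first paragraph, capped in length."""
    # A blank line (optionally containing whitespace) marks a paragraph break.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    if not paragraphs:
        return ""
    first = " ".join(paragraphs[0].split())  # collapse newlines and runs of spaces
    if len(first) <= max_chars:
        return first
    cut = first[:max_chars]
    # Prefer ending on a full sentence; otherwise hard-truncate with an ellipsis.
    end = max(cut.rfind(". "), cut.rfind("! "), cut.rfind("? "))
    return cut[: end + 1] if end > 0 else cut.rstrip() + "…"
```

You would then store, for example, content=f"Claude: {first_paragraph(reply)}" instead of the raw reply[:500] slice.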
- Confidence threshold: use threshold=0.40 for tighter retrieval if irrelevant memories appear, or 0.25 if users complain about context loss
- Session ID from auth: never trust client-supplied user IDs; derive them server-side from your JWT or session token
- TTL for episodic memories: set ttl_days=30 to 90 on conversation memories; semantic user-profile memories can be permanent
- EU data residency: all Kronvex memories are stored in Frankfurt; if your app serves EU users, this keeps their data in the EU by default, which simplifies GDPR compliance
- Separate agents per environment: create distinct Kronvex agents for dev and production so test conversations never pollute real users' memory stores
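To illustrate deriving the session ID server-side, here is a minimal sketch that verifies an HS256 JWT with the standard library and returns its sub claim. session_id_from_jwt is a hypothetical helper. It assumes your auth layer issues HS256 tokens signed with a shared secret. In production, prefer a maintained library such as PyJWT, or your framework's auth middleware, over hand-rolled verification.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWTs use unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def session_id_from_jwt(token: str, secret: bytes) -> str:
    """Hypothetical helper: verify an HS256 JWT and return its 'sub' claim."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # never trust alg blindly
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if "exp" in payload and payload["exp"] < time.time():
        raise ValueError("token expired")
    sub = payload.get("sub")
    if not sub:
        raise ValueError("token has no subject")
    return str(sub)
```

The returned value is what you would pass as session_id to inject_context() and remember(). Any user ID sent in the request body is ignored.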
Give your Claude agents a memory
Free plan — 1 agent, 100 memories. No credit card. Works with Claude Opus, Sonnet, Haiku — any Anthropic model.