LlamaIndex Persistent Memory:
Agents that remember across sessions
LlamaIndex is the go-to framework for RAG pipelines — but when it comes to agents, its built-in chat memory is in-memory only. Every time your process restarts, context is lost. Here's how to give LlamaIndex agents persistent, cross-session memory using Kronvex.
How LlamaIndex handles memory by default
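LlamaIndex provides a SimpleChatStore and chat buffer classes like ChatMemoryBuffer to maintain conversation context within a single agent run. They work well for single-session use cases, but by default they hold everything in process memory. SimpleChatStore can dump the raw chat history to a JSON file, but you have to persist and reload it yourself, and what you get back is a transcript rather than searchable long-term memory.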
The moment your Python process ends, the agent's entire memory is gone. The next time a user opens the app, they start from zero. For any production use case, this is a fundamental problem:
- A customer service agent that can't remember a user's past issues
- A coding assistant that forgets your project's architecture between sessions
- A research agent that re-indexes documents it already processed last week
LlamaIndex's RAG pipeline is excellent at retrieving document content — but that's retrieval, not memory. Agent memory needs to persist episodic events, user preferences, and learned context across sessions. That's exactly what Kronvex is built for.
Setup — install Kronvex in 60 seconds
pip install llama-index kronvex
Get your API key from the Kronvex dashboard. Free plan available, no credit card needed. Create one agent — it will hold all memories for your LlamaIndex agent by default.
    from kronvex import KronvexClient

    client = KronvexClient(api_key="kv-your-api-key")

    # Store a memory
    client.remember("my-agent", "User is building a Python FastAPI app")

    # Recall: semantic search
    result = client.recall("my-agent", "user project details")
    print(result.memories[0].content)
    # → "User is building a Python FastAPI app"
Storing memories from agent interactions
After each LlamaIndex agent run, extract meaningful information from the conversation and store it in Kronvex. Use memory_type="episodic" for events that happened, and memory_type="semantic" for facts and preferences that should persist long-term.
    from kronvex import KronvexClient

    memory = KronvexClient(api_key="kv-your-api-key")
    AGENT_ID = "llamaindex-agent"

    def save_interaction(user_id: str, summary: str, memory_type: str = "episodic"):
        """Store a summary of an agent interaction."""
        memory.remember(
            AGENT_ID,
            summary,
            user_id=user_id,
            memory_type=memory_type,
        )

    # After a support interaction
    save_interaction(
        user_id="user_42",
        summary="User reported billing issue with Starter plan, resolved by upgrading to Pro",
    )

    # Store a user preference (long-term semantic memory)
    save_interaction(
        user_id="user_42",
        summary="User prefers step-by-step explanations, dislikes jargon",
        memory_type="semantic",
    )
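How you "extract meaningful information" is left open above. One minimal way to approach it, assuming the conversation is available as a list of (role, text) pairs, is to condense it into a short string before handing it to save_interaction. The helper name summarize_turns and the 300-character limit are illustrative assumptions, not part of LlamaIndex or Kronvex. In practice you would probably ask an LLM to write the summary instead.

```python
from typing import List, Tuple


def summarize_turns(turns: List[Tuple[str, str]], max_chars: int = 300) -> str:
    """Condense (role, text) pairs into one short line suitable for storage.

    Hypothetical helper: a naive, non-LLM summary that joins the turns
    and truncates on a word boundary.
    """
    # Collapse internal whitespace and drop empty turns.
    parts = [f"{role}: {' '.join(text.split())}" for role, text in turns if text.strip()]
    summary = " | ".join(parts)
    if len(summary) <= max_chars:
        return summary
    # Cut at the last space before the limit so words are not split.
    cut = summary.rfind(" ", 0, max_chars)
    if cut <= 0:
        cut = max_chars
    return summary[:cut].rstrip(" |") + " ..."


turns = [
    ("user", "My invoice on the Starter plan looks wrong."),
    ("assistant", "I upgraded the account to Pro and reissued the invoice."),
]
summary = summarize_turns(turns)
print(summary)
```

The resulting string can be passed as the summary argument of save_interaction. Keeping stored summaries short also keeps later recalls cheap to inject into a prompt.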
Recalling memories before each agent run
Before building the LlamaIndex agent, recall relevant memories and use them to prime the agent's context. This is the core pattern — retrieve what's known about the user and their history, then feed it to the LLM.
    from kronvex import KronvexClient
    from llama_index.core.agent import ReActAgent
    from llama_index.llms.openai import OpenAI

    memory = KronvexClient(api_key="kv-your-api-key")
    AGENT_ID = "llamaindex-agent"

    def build_agent_with_memory(user_id: str, query: str):
        # Step 1: recall relevant memories for this user
        result = memory.recall(AGENT_ID, query, user_id=user_id, top_k=5)

        memory_context = ""
        if result.memories:
            memory_context = "Relevant past context:\n" + "\n".join(
                f"- {m.content}" for m in result.memories
            )

        # Step 2: build agent with memory-enriched system prompt
        # Note: this assumes the agent honors the LLM-level system_prompt. If the
        # memory context does not reach the model in your llama-index version,
        # passing it via ReActAgent.from_tools(..., context=...) is an alternative.
        system_prompt = f"You are a helpful assistant.\n\n{memory_context}"
        llm = OpenAI(model="gpt-4o", system_prompt=system_prompt)
        return ReActAgent.from_tools([], llm=llm)
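Recalled memories can repeat each other or grow long, so it may help to deduplicate them and cap the size of the context block before it goes into the prompt. The sketch below is one way to do that with plain Python. RecalledMemory is a stand-in that assumes only the content attribute used above, and format_memory_context and its max_chars budget are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RecalledMemory:
    """Stand-in for a recalled memory; only `content` is assumed."""
    content: str


def format_memory_context(memories: Iterable[RecalledMemory], max_chars: int = 500) -> str:
    """Render memories as a bullet list, skipping duplicates and stopping at max_chars."""
    header = "Relevant past context:"
    lines = []
    seen = set()
    used = len(header)
    for m in memories:
        text = " ".join(m.content.split())
        key = text.lower()
        if not text or key in seen:
            continue
        line = f"- {text}"
        # +1 accounts for the newline that joins this line to the block.
        if used + 1 + len(line) > max_chars:
            break
        seen.add(key)
        lines.append(line)
        used += 1 + len(line)
    if not lines:
        return ""
    return "\n".join([header, *lines])


memories = [
    RecalledMemory("User is building a Python FastAPI app"),
    RecalledMemory("user is building a Python FastAPI app"),  # duplicate, different case
    RecalledMemory("User prefers step-by-step explanations, dislikes jargon"),
]
print(format_memory_context(memories))
```

Because the function returns an empty string when nothing fits or nothing was recalled, the caller can append it to the base prompt without a special case.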
Using inject-context for automatic prompt injection
The inject_context endpoint does the recall and formatting for you — it returns a ready-to-use context_block string, formatted as a system prompt injection. This is the simplest way to add memory to any LLM call.
    # Reuses memory, AGENT_ID, OpenAI and ReActAgent from the previous snippet.
    def build_agent_with_memory(user_id: str, query: str):
        # inject_context returns a formatted context block ready for the LLM
        ctx = memory.inject_context(
            AGENT_ID,
            query,
            user_id=user_id,
            top_k=6,
        )

        llm = OpenAI(model="gpt-4o", system_prompt=ctx.context_block)
        agent = ReActAgent.from_tools([], llm=llm)
        return agent
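Since the memory lookup is a network call, you may want the agent to keep working when it fails. Here is a minimal sketch of a fallback wrapper, assuming only the inject_context call and the context_block attribute shown above. The function name system_prompt_with_memory, the BASE_PROMPT constant, and the FakeClient test double are hypothetical, and the broad except is deliberate because the SDK's exception types are not assumed here.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

BASE_PROMPT = "You are a helpful assistant."


def system_prompt_with_memory(client, agent_id: str, query: str, user_id: str) -> str:
    """Return BASE_PROMPT plus the injected context, or BASE_PROMPT alone on failure."""
    try:
        ctx = client.inject_context(agent_id, query, user_id=user_id)
        block = (ctx.context_block or "").strip()
    except Exception:  # broad on purpose: the SDK's exception types are not assumed
        logger.warning("Memory lookup failed; continuing without context", exc_info=True)
        block = ""
    return f"{BASE_PROMPT}\n\n{block}" if block else BASE_PROMPT


class FakeClient:
    """Test double exposing only the inject_context call used in this article."""

    def __init__(self, block=None, fail=False):
        self.block = block
        self.fail = fail

    def inject_context(self, agent_id, query, user_id=None):
        if self.fail:
            raise ConnectionError("memory service unreachable")
        return SimpleNamespace(context_block=self.block)


ok_prompt = system_prompt_with_memory(
    FakeClient("- User is on the Pro plan"), "llamaindex-agent", "billing", "user_42"
)
fallback_prompt = system_prompt_with_memory(
    FakeClient(fail=True), "llamaindex-agent", "billing", "user_42"
)
print(ok_prompt)
print(fallback_prompt)
```

Unlike the snippet above, this version keeps a base instruction in front of the context block, so the model still has a role description when no memories come back. That is a design choice, not a requirement.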
Full working example
Here is a complete, production-ready pattern combining LlamaIndex agents with Kronvex persistent memory:
    from llama_index.core.agent import ReActAgent
    from llama_index.llms.openai import OpenAI
    from kronvex import KronvexClient
    import os

    memory = KronvexClient(api_key=os.environ["KRONVEX_API_KEY"])
    AGENT_ID = "llamaindex-agent"

    def build_agent_with_memory(user_id: str, query: str) -> ReActAgent:
        """Build a LlamaIndex agent pre-loaded with relevant memories."""
        # Recall relevant memories and inject as system prompt
        ctx = memory.inject_context(
            AGENT_ID,
            query,
            user_id=user_id,
        )

        llm = OpenAI(model="gpt-4o", system_prompt=ctx.context_block)
        agent = ReActAgent.from_tools([], llm=llm)
        return agent

    def save_interaction(user_id: str, summary: str):
        """Store a summary of the interaction to persistent memory."""
        memory.remember(AGENT_ID, summary, user_id=user_id)

    # ─── Usage ─────────────────────────────────────────────────────────
    user_id = "user_42"
    question = "How do I add rate limiting to my FastAPI app?"

    # Build agent with memory from previous sessions
    agent = build_agent_with_memory(user_id, question)
    response = agent.chat(question)

    # Store this interaction for future sessions
    save_interaction(
        user_id,
        f"User asked: {question}. Provided SlowAPI-based solution.",
    )

    print(response)

    # ─── Next session: agent remembers the user's FastAPI project ───────
    agent2 = build_agent_with_memory(user_id, "database connection pooling")
    # LLM receives context: "User is working on a Python FastAPI app with SlowAPI rate limiting..."
    # → Response is automatically relevant to their actual project ✓
Always pass user_id to scope memories per user. Without it, memories are shared across all users of the same agent, which can cause context leakage in multi-tenant apps.
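To see why scoping matters, here is a toy in-memory store that mimics the behavior described above: recall without a user_id searches every user's memories for the agent. InMemoryStore is a hypothetical stand-in written for this illustration. It uses naive keyword matching rather than semantic search and is not the Kronvex client.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class InMemoryStore:
    """Toy stand-in for a memory service; keyword match instead of semantic search."""

    def __init__(self) -> None:
        self._items: Dict[str, List[dict]] = defaultdict(list)

    def remember(self, agent_id: str, content: str, user_id: Optional[str] = None) -> None:
        self._items[agent_id].append({"content": content, "user_id": user_id})

    def recall(self, agent_id: str, query: str, user_id: Optional[str] = None) -> List[str]:
        words = set(query.lower().split())
        hits = []
        for item in self._items[agent_id]:
            # With no user_id, every user's memories are candidates.
            if user_id is not None and item["user_id"] != user_id:
                continue
            if words & set(item["content"].lower().split()):
                hits.append(item["content"])
        return hits


store = InMemoryStore()
store.remember("support-agent", "Alice has a billing dispute", user_id="alice")
store.remember("support-agent", "Bob asked about billing cycles", user_id="bob")

unscoped = store.recall("support-agent", "billing")
scoped = store.recall("support-agent", "billing", user_id="bob")
print(unscoped)  # both users' memories
print(scoped)    # only Bob's memory
```

In the unscoped call, Bob's session would see Alice's billing dispute. Passing user_id on both the write and the read path closes that gap.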
Production checklist
- Store your API key in an environment variable — never hardcode it
- Always pass user_id in multi-tenant apps to scope memories per user
- Use memory_type="semantic" for stable preferences and memory_type="episodic" for conversation events
- Set top_k=5 to keep context size manageable and avoid prompt bloat
- Add ttl_days=30 for ephemeral memories that should expire automatically
- Monitor memory growth in the Kronvex dashboard
- All data is EU-hosted (Frankfurt) — GDPR compliant by default, no extra configuration needed
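For the first item on the checklist, a small loader that fails fast when the key is missing can save debugging time at startup. This is a minimal sketch using only the standard library. KRONVEX_API_KEY is the variable name used in the full example above, while load_api_key, mask, and MissingApiKeyError are hypothetical helpers introduced here.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised when the API key environment variable is absent or empty."""


def load_api_key(env_var: str = "KRONVEX_API_KEY") -> str:
    """Read the API key from the environment and fail fast with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise MissingApiKeyError(
            f"Set the {env_var} environment variable before starting the app."
        )
    return key


def mask(key: str, visible: int = 4) -> str:
    """Return a log-safe version of the key showing only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


if __name__ == "__main__":
    # Demo value only; in a real deployment the variable is set outside the code.
    os.environ.setdefault("KRONVEX_API_KEY", "kv-example-key-1234")
    print(mask(load_api_key()))
```

The returned key can be passed straight to KronvexClient(api_key=...), and mask gives you something safe to write to logs when confirming which key is in use.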
These examples assume llama-index-core ≥ 0.10 with the new modular package structure. If you're on an older version (llama-index < 0.10), import paths may differ.
Start building in 5 minutes
Free plan, no credit card required. Your API key is available in the dashboard immediately after signing up.