Tutorial Python LlamaIndex March 29, 2026 · 9 min read

LlamaIndex Persistent Memory:
Agents that remember across sessions

LlamaIndex is the go-to framework for RAG pipelines — but when it comes to agents, its built-in chat memory is in-memory only. Every time your process restarts, context is lost. Here's how to give LlamaIndex agents persistent, cross-session memory using Kronvex.

In this article
  1. How LlamaIndex handles memory by default
  2. Setup — install Kronvex in 60 seconds
  3. Storing memories from agent interactions
  4. Recalling memories before each agent run
  5. Using inject-context for automatic prompt injection
  6. Full working example
  7. Production checklist

How LlamaIndex handles memory by default

LlamaIndex provides chat stores such as SimpleChatStore and chat buffer classes such as ChatMemoryBuffer to maintain conversation context within a single agent run. They work well for single-session use cases, but by default they keep that context in process memory.

Unless you persist that state yourself, the agent's entire memory is gone the moment your Python process ends. The next time a user opens the app, they start from zero. For any production use case that spans more than one session, this is a fundamental problem.

LlamaIndex's RAG pipeline is excellent at retrieving document content — but that's retrieval, not memory. Agent memory needs to persist episodic events, user preferences, and learned context across sessions. That's exactly what Kronvex is built for.

RAG vs Agent Memory: RAG retrieves from a document corpus. Kronvex stores agent-specific memories — typed (episodic/semantic/procedural), scored by recency and access frequency, with per-user scoping. These two layers are complementary, not competing.
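
To make "scored by recency and access frequency" concrete, here is a minimal sketch of one way such a score could be combined with semantic similarity. The formula, the 30-day half-life, the 0.1 weight, and the `memory_score` name are all assumptions made for illustration; this article does not describe how Kronvex actually computes its scores.

```python
import math


def memory_score(similarity: float, age_days: float, access_count: int,
                 half_life_days: float = 30.0) -> float:
    """Hypothetical ranking score; NOT Kronvex's actual formula."""
    # Recency decays exponentially: the bonus halves every `half_life_days`.
    recency = 0.5 ** (age_days / half_life_days)
    # Frequency grows logarithmically so heavily used memories do not dominate.
    frequency = math.log1p(access_count)
    return similarity * (1.0 + recency + 0.1 * frequency)


# Two memories with the same similarity: the recent one should rank higher.
fresh = memory_score(similarity=0.8, age_days=1, access_count=0)
stale = memory_score(similarity=0.8, age_days=365, access_count=0)
print(fresh > stale)
```

The point of the sketch is only that a document retriever ranks by similarity alone, while a memory layer can also weigh when a memory was stored and how often it has been used.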

Setup — install Kronvex in 60 seconds

Install
pip install llama-index kronvex

Get your API key from the Kronvex dashboard. Free plan available, no credit card needed. Create one agent — it will hold all memories for your LlamaIndex agent by default.

Quick test
from kronvex import KronvexClient

client = KronvexClient(api_key="kv-your-api-key")

# Store a memory
client.remember("my-agent", "User is building a Python FastAPI app")

# Recall — semantic search
result = client.recall("my-agent", "user project details")
print(result.memories[0].content)
# → "User is building a Python FastAPI app"

Storing memories from agent interactions

After each LlamaIndex agent run, extract meaningful information from the conversation and store it in Kronvex. Use memory_type="episodic" for events that happened, and memory_type="semantic" for facts and preferences that should persist long-term.

Storing after a run
from kronvex import KronvexClient

memory = KronvexClient(api_key="kv-your-api-key")
AGENT_ID = "llamaindex-agent"

def save_interaction(user_id: str, summary: str, memory_type: str = "episodic"):
    """Store a summary of an agent interaction."""
    memory.remember(
        AGENT_ID,
        summary,
        user_id=user_id,
        memory_type=memory_type,
    )

# After a support interaction
save_interaction(
    user_id="user_42",
    summary="User reported billing issue with Starter plan, resolved by upgrading to Pro",
)

# Store a user preference (long-term semantic memory)
save_interaction(
    user_id="user_42",
    summary="User prefers step-by-step explanations, dislikes jargon",
    memory_type="semantic",
)
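
The summaries above are written by hand. As a rough sketch of how a summary string might be derived from the conversation itself, the helper below joins the turns and caps the length. `summarize_turns` and its `(role, text)` input format are hypothetical, and plain truncation is used here as a stand-in; in practice an LLM-written summary would usually be more useful.

```python
def summarize_turns(turns: list[tuple[str, str]], max_chars: int = 300) -> str:
    """Join (role, text) turns into one line and cap its length (hypothetical helper)."""
    joined = " | ".join(
        f"{role}: {text.strip()}" for role, text in turns if text.strip()
    )
    if len(joined) <= max_chars:
        return joined
    # Cut to the character budget and mark the truncation explicitly.
    return joined[: max_chars - 3].rstrip() + "..."


turns = [
    ("user", "My invoice for the Starter plan looks wrong."),
    ("assistant", "I upgraded you to Pro and corrected the invoice."),
]
summary = summarize_turns(turns)
print(summary)
```

The resulting string can be passed as the `summary` argument of `save_interaction`.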

Recalling memories before each agent run

Before building the LlamaIndex agent, recall relevant memories and use them to prime the agent's context. This is the core pattern — retrieve what's known about the user and their history, then feed it to the LLM.

Recall pattern
from kronvex import KronvexClient
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

memory = KronvexClient(api_key="kv-your-api-key")
AGENT_ID = "llamaindex-agent"

def build_agent_with_memory(user_id: str, query: str):
    # Step 1: recall relevant memories for this user
    result = memory.recall(AGENT_ID, query, user_id=user_id, top_k=5)

    memory_context = ""
    if result.memories:
        memory_context = "Relevant past context:\n" + "\n".join([
            f"- {m.content}" for m in result.memories
        ])

    # Step 2: build agent with memory-enriched system prompt
    system_prompt = f"You are a helpful assistant.\n\n{memory_context}"
    llm = OpenAI(model="gpt-4o", system_prompt=system_prompt)
    return ReActAgent.from_tools([], llm=llm)
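
The formatting step in the middle of that function can be pulled out and hardened a little. The sketch below skips duplicate memories and stops once a character budget is reached, so the system prompt stays bounded. `format_memory_context`, `FakeMemory`, and the `max_chars` budget are hypothetical names introduced here; the only thing taken from the recall result is the `content` attribute used above.

```python
from dataclasses import dataclass


@dataclass
class FakeMemory:
    """Stand-in for a recalled memory; only `content` is assumed."""
    content: str


def format_memory_context(memories, max_chars: int = 1000) -> str:
    """Build the 'Relevant past context' block, skipping duplicates and capping size."""
    lines, seen, used = [], set(), 0
    for m in memories:
        text = m.content.strip()
        key = text.lower()
        if not text or key in seen:
            continue  # ignore empty and case-insensitive duplicate entries
        line = f"- {text}"
        if used + len(line) > max_chars:
            break  # budget reached; keep the higher-ranked memories only
        seen.add(key)
        lines.append(line)
        used += len(line)
    if not lines:
        return ""
    return "Relevant past context:\n" + "\n".join(lines)


memories = [
    FakeMemory("User is building a Python FastAPI app"),
    FakeMemory("user is building a python fastapi app"),
    FakeMemory("User prefers step-by-step explanations"),
]
print(format_memory_context(memories))
```

Because the budget check breaks out of the loop, the sketch assumes the recall result is already ordered from most to least relevant.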

Using inject-context for automatic prompt injection

The inject_context endpoint does the recall and formatting for you — it returns a ready-to-use context_block string, formatted as a system prompt injection. This is the simplest way to add memory to any LLM call.

inject_context — simpler pattern
# Reuses `memory`, `AGENT_ID`, and the imports from the recall pattern above.
def build_agent_with_memory(user_id: str, query: str):
    # inject_context returns a formatted context block ready for the LLM
    ctx = memory.inject_context(
        AGENT_ID,
        query,
        user_id=user_id,
        top_k=6,
    )

    llm = OpenAI(model="gpt-4o", system_prompt=ctx.context_block)
    agent = ReActAgent.from_tools([], llm=llm)
    return agent
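
One design question this pattern leaves open is what should happen when the memory call fails or returns nothing. A possible approach, sketched below, is to fall back to a plain base prompt so the agent still answers. `system_prompt_with_memory`, `BASE_PROMPT`, and both stand-in clients are hypothetical; the only assumptions about the real client are the `inject_context(agent_id, query, user_id=...)` call shape and the `context_block` attribute shown above. Unlike the snippet above, this sketch prepends a base prompt to the context block, which is a choice rather than a requirement.

```python
from types import SimpleNamespace

BASE_PROMPT = "You are a helpful assistant."


def system_prompt_with_memory(client, agent_id: str, query: str, user_id: str) -> str:
    """Return base prompt plus injected context, or just the base prompt on failure."""
    try:
        ctx = client.inject_context(agent_id, query, user_id=user_id)
    except Exception:
        # Assumption: a failed memory lookup should degrade the answer, not block it.
        return BASE_PROMPT
    block = (ctx.context_block or "").strip()
    return f"{BASE_PROMPT}\n\n{block}" if block else BASE_PROMPT


class OfflineClient:
    """Stand-in that simulates an unreachable memory service."""

    def inject_context(self, agent_id, query, user_id=None):
        raise ConnectionError("memory service unreachable")


class StubClient:
    """Stand-in that returns a fixed context block."""

    def inject_context(self, agent_id, query, user_id=None):
        return SimpleNamespace(context_block="- User is building a Python FastAPI app")


print(system_prompt_with_memory(StubClient(), "llamaindex-agent", "rate limiting", "user_42"))
print(system_prompt_with_memory(OfflineClient(), "llamaindex-agent", "rate limiting", "user_42"))
```

The returned string would then be passed as `system_prompt` when constructing the LLM.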

Full working example

Here is a complete script that combines LlamaIndex agents with Kronvex persistent memory:

agent_with_memory.py
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from kronvex import KronvexClient
import os

memory = KronvexClient(api_key=os.environ["KRONVEX_API_KEY"])
AGENT_ID = "llamaindex-agent"

def build_agent_with_memory(user_id: str, query: str) -> ReActAgent:
    """Build a LlamaIndex agent pre-loaded with relevant memories."""
    # Recall relevant memories and inject as system prompt
    ctx = memory.inject_context(
        AGENT_ID,
        query,
        user_id=user_id,
    )

    llm = OpenAI(model="gpt-4o", system_prompt=ctx.context_block)
    agent = ReActAgent.from_tools([], llm=llm)
    return agent

def save_interaction(user_id: str, summary: str):
    """Store a summary of the interaction to persistent memory."""
    memory.remember(AGENT_ID, summary, user_id=user_id)

# ─── Usage ─────────────────────────────────────────────────────────
user_id = "user_42"
question = "How do I add rate limiting to my FastAPI app?"

# Build agent with memory from previous sessions
agent = build_agent_with_memory(user_id, question)
response = agent.chat(question)

# Store this interaction for future sessions
save_interaction(
    user_id,
    f"User asked: {question}. Provided SlowAPI-based solution.",
)

print(response)

# ─── Next session: agent remembers the user's FastAPI project ───────
agent2 = build_agent_with_memory(user_id, "database connection pooling")
# LLM receives context: "User is working on a Python FastAPI app with SlowAPI rate limiting..."
# → Response is automatically relevant to their actual project ✓

Multi-user tip: Always pass user_id to scope memories per user. Without it, memories are shared across all users of the same agent, which can cause context leakage in multi-tenant apps.
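
One way to make a missing user_id hard to overlook is to bind it once in a thin wrapper and route every memory call through that wrapper. The sketch below is an illustration only: `UserScopedMemory` and `RecordingClient` are hypothetical names, and the wrapper assumes nothing about the client beyond the `remember(agent_id, content, user_id=...)` and `recall(agent_id, query, user_id=...)` call shapes used earlier in this article.

```python
class UserScopedMemory:
    """Hypothetical wrapper that binds one user_id to every memory call."""

    def __init__(self, client, agent_id: str, user_id: str):
        if not user_id:
            raise ValueError("user_id is required to scope memories per user")
        self._client = client
        self._agent_id = agent_id
        self._user_id = user_id

    def remember(self, content: str, **kwargs):
        return self._client.remember(
            self._agent_id, content, user_id=self._user_id, **kwargs
        )

    def recall(self, query: str, **kwargs):
        return self._client.recall(
            self._agent_id, query, user_id=self._user_id, **kwargs
        )


class RecordingClient:
    """Stand-in client that records calls instead of contacting a service."""

    def __init__(self):
        self.calls = []

    def remember(self, agent_id, content, **kwargs):
        self.calls.append(("remember", agent_id, content, kwargs))

    def recall(self, agent_id, query, **kwargs):
        self.calls.append(("recall", agent_id, query, kwargs))
        return []


client = RecordingClient()
scoped = UserScopedMemory(client, "llamaindex-agent", "user_42")
scoped.remember("User prefers step-by-step explanations", memory_type="semantic")
scoped.recall("explanation style", top_k=5)
print(client.calls)
```

Creating one wrapper per request means an empty user_id fails at construction time instead of silently writing to the shared pool.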

Production checklist

LlamaIndex version note: This tutorial uses llama-index-core ≥ 0.10 with the new modular package structure. If you're on an older version (llama-index < 0.10), import paths may differ.
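
A quick way to check which side of that boundary an environment is on is to read the installed package version with the standard library. This is a small sketch: `uses_modular_layout` and `installed_core_version` are hypothetical helper names, and the check assumes the version string starts with a numeric major.minor pair.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def uses_modular_layout(version_string: str) -> bool:
    """True if the version is 0.10 or later, where `llama_index.core` imports apply."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return (int(match.group(1)), int(match.group(2))) >= (0, 10)


def installed_core_version():
    """Return the installed llama-index-core version, or None if it is not installed."""
    try:
        return version("llama-index-core")
    except PackageNotFoundError:
        return None


found = installed_core_version()
if found is None:
    print("llama-index-core is not installed")
else:
    print(found, "modular layout:", uses_modular_layout(found))
```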
