Persistent Memory for Grok / xAI SDK
Grok is one of the fastest-evolving LLMs on the market — but like every LLM API, it has no memory across calls. Every conversation starts from zero. This guide shows how to pair the xAI API with Kronvex to give your Grok agents persistent, semantically searchable memory that survives across sessions.
- Why Grok agents forget between sessions
- The xAI OpenAI-compatible API
- Installing Kronvex and configuring credentials
- Step 1 — Storing memories after each turn
- Step 2 — Recalling relevant memories before each turn
- Full working example: stateful Grok agent
- Using inject-context for system prompt enrichment
- Best practices and production tips
Why Grok agents forget between sessions
xAI's Grok API works on a per-request basis. You send a list of messages, you get a completion back. The next API call is completely independent — there is no server-side session, no thread, no persistent context. If you want your agent to remember that a user prefers Python over JavaScript, or that they're building a SaaS in the logistics sector, you have to manage that state yourself.
The common workaround is to include the entire conversation history in every request. This works for short conversations, but it has three critical problems: it gets expensive fast (tokens add up), it hits context length limits, and it cannot retrieve information from past sessions — conversations from yesterday or last week are simply gone.
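To put rough numbers on the cost problem, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every turn adds a fixed number of tokens and that a recall-based prompt carries a fixed budget of retrieved memories. Real token counts depend on the tokenizer and the content.

```python
def replay_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total prompt tokens sent when every request replays the full history.

    Request i carries i turns, so the total is
    tokens_per_turn * (1 + 2 + ... + turns) = tokens_per_turn * turns * (turns + 1) / 2.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def recall_tokens(turns: int, tokens_per_turn: int, memory_budget: int) -> int:
    """Total prompt tokens when each request sends only the new turn plus a
    fixed-size block of recalled memories (an illustrative assumption)."""
    return turns * (tokens_per_turn + memory_budget)


if __name__ == "__main__":
    # Hypothetical workload: 50 turns of ~200 tokens, ~500 tokens of recalled memories
    print(replay_tokens(50, 200))       # 255000
    print(recall_tokens(50, 200, 500))  # 35000
```

With full replay, the total grows quadratically with conversation length. The recall approach grows linearly, which is why the token bill diverges so quickly.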
Kronvex solves this by acting as an external memory store. Instead of replaying raw conversation history, you store structured memories and retrieve only the semantically relevant ones — using pgvector cosine similarity on OpenAI's text-embedding-3-small embeddings — before each Grok API call.
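The retrieval step comes down to ranking stored vectors by cosine similarity to the query vector. The sketch below does this in pure Python on tiny 3-dimensional toy vectors. Real text-embedding-3-small vectors have 1,536 dimensions by default, and the example values here are invented. In pgvector, the same ranking is typically expressed with the <=> operator, which returns cosine distance (1 minus cosine similarity).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: list[float], memories: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k memories most similar to the query, best first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in memories.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings (illustrative values only)
    memories = {
        "User prefers Python over JavaScript": [0.9, 0.1, 0.0],
        "User is building a logistics SaaS": [0.1, 0.9, 0.2],
        "User's favourite colour is green": [0.0, 0.1, 0.9],
    }
    query = [0.8, 0.3, 0.0]  # imagine the embedding of "Which language should I use?"
    for text, score in top_k(query, memories, k=2):
        print(f"{score:.3f}  {text}")
```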
The xAI OpenAI-compatible API
xAI's Grok API is compatible with the OpenAI API, so you can use the official openai Python package to call Grok. Point it at xAI's base URL and use your xAI API key.
```bash
pip install openai kronvex
```
The openai package handles the Grok API, and the kronvex package handles memory: storing, recalling, and injecting context. Both ship synchronous clients, which this guide uses, as well as async clients for production use.
```python
import os

from openai import OpenAI

# Point the OpenAI client at xAI's base URL
grok_client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],  # your xAI key ("xai-...")
    base_url="https://api.x.ai/v1",
)

# Grok models listed at the time of writing (May 2026):
# grok-3, grok-3-mini, grok-3-turbo, grok-2
```
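Because the API is OpenAI-compatible, the client above sends a standard chat-completions request. The sketch below shows what goes over the wire by building that request with Python's standard library, without sending it. It assumes the conventional OpenAI-style /chat/completions path under xAI's base URL and bearer-token authentication. The key is a placeholder.

```python
import json
import urllib.request

XAI_BASE_URL = "https://api.x.ai/v1"


def build_chat_request(api_key: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for xAI without sending it."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{XAI_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        api_key="xai-YOUR_XAI_API_KEY",  # placeholder, not a real key
        model="grok-3",
        messages=[{"role": "user", "content": "Hello, Grok"}],
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it
```

In practice the openai package does this for you. Nothing Grok-specific is needed beyond the base URL and the key.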
Installing Kronvex and configuring credentials
Get a free API key from the Kronvex dashboard. No credit card required. The demo plan includes 1 agent and 100 memories — enough to prototype the full integration.
Store your Kronvex API key and agent ID in the KV_API_KEY and KV_AGENT_ID environment variables, then create the client:

```python
import os

from kronvex import Kronvex

kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)
```
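If either variable is missing, os.environ[...] raises a bare KeyError. The helper below, load_kronvex_config, is hypothetical and not part of the kronvex package. It reports every missing variable at once, which makes misconfigured deployments easier to diagnose.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("KV_API_KEY", "KV_AGENT_ID")


def load_kronvex_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the Kronvex credentials, or raise listing every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"api_key": env["KV_API_KEY"], "agent_id": env["KV_AGENT_ID"]}


# Usage with the client shown above:
# kv = Kronvex(**load_kronvex_config())
```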
Step 1 — Storing memories after each turn
After every Grok response, extract the key information and store it as a memory. You have three memory types to choose from: episodic (what happened), semantic (facts about the user or domain), and procedural (recurring patterns and preferences).
```python
# After Grok responds, store relevant information
response_text = completion.choices[0].message.content

# Store the user's message as an episodic memory
kv.remember(
    content=f"User said: {user_message}",
    memory_type="episodic",
    session_id=user_id,  # scope to this user
)

# Store key facts extracted from the conversation
kv.remember(
    content="User is building a logistics SaaS in Python, prefers async code",
    memory_type="semantic",
    session_id=user_id,
)
```
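In these calls memory_type is a plain string, so a typo such as "semantics" may go unnoticed. This guide does not say whether Kronvex rejects unknown types. One option is a hypothetical helper, build_memory, that checks the type against the three types described above before you pass its result to kv.remember(**...).

```python
from typing import Literal, get_args

MemoryType = Literal["episodic", "semantic", "procedural"]
VALID_TYPES = set(get_args(MemoryType))


def build_memory(content: str, memory_type: str, user_id: str) -> dict[str, str]:
    """Return keyword arguments for kv.remember(), validating memory_type first."""
    if memory_type not in VALID_TYPES:
        raise ValueError(f"memory_type must be one of {sorted(VALID_TYPES)}, got {memory_type!r}")
    if not content.strip():
        raise ValueError("content must not be empty")
    return {"content": content.strip(), "memory_type": memory_type, "session_id": user_id}


# Usage (assumes kv and user_id from the snippet above):
# kv.remember(**build_memory("User prefers async code", "procedural", user_id))
```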
Step 2 — Recalling relevant memories before each turn
Before sending a message to Grok, recall the most semantically relevant memories and inject them into the system prompt. Kronvex uses pgvector cosine similarity to find the best matches for the current message, then applies a composite confidence score: similarity × 0.6 + recency × 0.2 + frequency × 0.2.
```python
# Recall the top 5 memories most relevant to the current message
memories = kv.recall(
    query=user_message,
    top_k=5,
    session_id=user_id,
)

# Format memories for the system prompt
memory_context = ""
if memories:
    memory_context = "## What I know about this user:\n"
    for m in memories:
        memory_context += f"- {m['content']}\n"
```
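The composite confidence score is a simple weighted sum. The sketch below assumes all three inputs are already normalised to [0, 1]. This guide does not document how Kronvex computes recency and frequency, so the following are illustrative assumptions:

- an exponential decay with a 30-day half-life for recency
- access-count scaling for frequency

```python
import math


def recency_score(age_days: float, half_life_days: float = 30.0) -> float:
    """Assumed recency: 1.0 for a brand-new memory, halving every half_life_days."""
    return math.pow(0.5, age_days / half_life_days)


def frequency_score(access_count: int, max_access_count: int) -> float:
    """Assumed frequency: access count scaled by the most-accessed memory."""
    return access_count / max_access_count if max_access_count > 0 else 0.0


def confidence(similarity: float, recency: float, frequency: float) -> float:
    """Composite score as described: similarity*0.6 + recency*0.2 + frequency*0.2."""
    return similarity * 0.6 + recency * 0.2 + frequency * 0.2


if __name__ == "__main__":
    # A highly similar but old, rarely used memory...
    old = confidence(0.90, recency_score(90), frequency_score(1, 10))
    # ...versus a slightly less similar, recent, frequently used one
    fresh = confidence(0.80, recency_score(2), frequency_score(8, 10))
    print(f"old={old:.3f} fresh={fresh:.3f}")
```

With these assumed inputs, the fresher memory scores about 0.83 and the older one 0.585. The fresher memory wins despite its lower similarity, which is the effect of blending in recency and frequency.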
Full working example: stateful Grok agent
Here is a complete, self-contained example of a Grok agent with persistent memory. Each conversation turn recalls relevant past memories, calls Grok, and stores the new exchange.
```python
import os

from openai import OpenAI
from kronvex import Kronvex

# ── Clients ─────────────────────────────────────────────
grok = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)
kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)


def chat(user_message: str, user_id: str) -> str:
    # 1. Recall relevant past memories
    memories = kv.recall(
        query=user_message,
        top_k=6,
        session_id=user_id,
        threshold=0.35,
    )

    # 2. Build memory context
    mem_block = ""
    if memories:
        lines = [f"- {m['content']}" for m in memories]
        mem_block = "## Memory context:\n" + "\n".join(lines) + "\n\n"

    # 3. Call Grok with memory-enriched system prompt
    completion = grok.chat.completions.create(
        model="grok-3",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant with access to conversation history.\n"
                    + mem_block
                    + "Use the memory context above to personalise your response. "
                    + "If no context is relevant, respond normally."
                ),
            },
            {"role": "user", "content": user_message},
        ],
    )
    assistant_reply = completion.choices[0].message.content

    # 4. Store both sides of the conversation
    kv.remember(
        content=f"User: {user_message}",
        memory_type="episodic",
        session_id=user_id,
    )
    kv.remember(
        content=f"Grok: {assistant_reply}",
        memory_type="episodic",
        session_id=user_id,
    )

    return assistant_reply


# Example usage
if __name__ == "__main__":
    uid = "user-alice"

    # First session
    print(chat("I'm building a Python SDK for logistics APIs", uid))

    # Days later, Grok will remember the context
    print(chat("What was the project I was working on?", uid))
```
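You can test the recall, call, and store loop without network access or API keys by swapping both clients for stand-ins. In this sketch:

- InMemoryStore is a hypothetical substitute for Kronvex. It mimics the remember/recall call shapes used above but ranks by naive word overlap (Jaccard) instead of embeddings.
- fake_grok is a stub in place of the Grok API.

Neither reflects how the real services score or respond.

```python
import re
from dataclasses import dataclass, field


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class InMemoryStore:
    """Hypothetical offline stand-in for the Kronvex client (word overlap, not embeddings)."""
    records: list[dict] = field(default_factory=list)

    def remember(self, content: str, memory_type: str, session_id: str) -> None:
        self.records.append({"content": content, "memory_type": memory_type, "session_id": session_id})

    def recall(self, query: str, top_k: int, session_id: str, threshold: float = 0.0) -> list[dict]:
        q = _words(query)
        scored = []
        for r in self.records:
            if r["session_id"] != session_id:
                continue  # never leak memories across users
            w = _words(r["content"])
            score = len(q & w) / len(q | w) if q | w else 0.0  # Jaccard overlap
            if score > threshold:
                scored.append((score, r))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored[:top_k]]


def fake_grok(system_prompt: str, user_message: str) -> str:
    """Stub model: reports how many memory lines it got and echoes the first one."""
    lines = [ln for ln in system_prompt.splitlines() if ln.startswith("- ")]
    return f"I recall {len(lines)} thing(s)." + (f" First: {lines[0][2:]}" if lines else "")


def chat(store: InMemoryStore, user_message: str, user_id: str) -> str:
    # 1. Recall, 2. build the prompt, 3. call the (fake) model, 4. store both sides
    memories = store.recall(query=user_message, top_k=6, session_id=user_id)
    mem_block = "".join(f"- {m['content']}\n" for m in memories)
    reply = fake_grok("You are a helpful assistant.\n" + mem_block, user_message)
    store.remember(content=f"User: {user_message}", memory_type="episodic", session_id=user_id)
    store.remember(content=f"Grok: {reply}", memory_type="episodic", session_id=user_id)
    return reply
```

Because the stand-in honours session_id, it also serves as a quick check that one user's memories never surface for another.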
Using inject-context for system prompt enrichment
The inject_context endpoint is a convenience wrapper that handles recall and formatting in a single API call. It returns a pre-formatted string ready to drop directly into a system prompt — ideal for production code where you want minimal boilerplate.
```python
# Single call: recall + format → ready for system prompt
context = kv.inject_context(
    message=user_message,
    session_id=user_id,
)

completion = grok.chat.completions.create(
    model="grok-3",
    messages=[
        {
            "role": "system",
            "content": f"You are a helpful assistant.\n\n{context}",
        },
        {"role": "user", "content": user_message},
    ],
)
```
The REST API at https://api.kronvex.io exposes the same operations. Send a POST to /api/v1/agents/{agent_id}/inject-context with your API key in the X-API-Key header and a JSON body of the form {"message": "...", "session_id": "..."}.
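The sketch below builds that REST request with Python's standard library but does not send it. This is useful if you work in a language without a Kronvex SDK or just want to see the raw call. The path, X-API-Key header, and body fields come from the description above. The Content-Type header and the placeholder credentials are assumptions, and the response format is not covered here.

```python
import json
import urllib.parse
import urllib.request

KRONVEX_BASE_URL = "https://api.kronvex.io"


def build_inject_context_request(api_key: str, agent_id: str, message: str, session_id: str) -> urllib.request.Request:
    """Build (but do not send) POST /api/v1/agents/{agent_id}/inject-context."""
    path = f"/api/v1/agents/{urllib.parse.quote(agent_id, safe='')}/inject-context"
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        url=KRONVEX_BASE_URL + path,
        data=body,
        # urllib normalises header names (X-API-Key becomes X-api-key);
        # HTTP header names are case-insensitive, so this is harmless.
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_inject_context_request(
        api_key="YOUR_KV_API_KEY",  # placeholder
        agent_id="YOUR_AGENT_ID",   # placeholder
        message="What was the project I was working on?",
        session_id="user-alice",
    )
    print(req.get_method(), req.full_url)
    # with urllib.request.urlopen(req) as resp: print(resp.read().decode())
```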
Best practices and production tips
Choose the right model for cost vs. quality
Grok 3 is the most capable of the models listed above. For memory-heavy agents that inject large context blocks, grok-3-mini or grok-3-turbo can give comparable results at lower cost, so benchmark them on your own workload. Run Kronvex recall first, then decide which Grok model to use based on how much context you retrieved.
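One way to act on that advice is a small routing function that picks a model from the amount of recalled context. The following are illustrative assumptions, not xAI or Kronvex guidance, so tune them against your own quality and cost measurements:

- the 1,000-token cutoff
- the rough 4-characters-per-token estimate
- the choice of grok-3-mini as the cheaper model

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token); an assumption, not a tokenizer."""
    return (len(text) + 3) // 4


def choose_model(memory_context: str, large_context_tokens: int = 1_000) -> str:
    """Use the cheaper model when a large memory block is injected, grok-3 otherwise.

    The cutoff is a hypothetical default; tune it against your own evals.
    """
    if estimate_tokens(memory_context) >= large_context_tokens:
        return "grok-3-mini"
    return "grok-3"


# Usage inside chat(), after building mem_block:
# completion = grok.chat.completions.create(model=choose_model(mem_block), messages=...)
```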
Session ID design matters
Always use a stable, unique user identifier as the session_id. A UUID from your database, an email address, or a hashed phone number all work well. Without a session ID, all users share the same memory pool — memories from one user will surface in other users' recall results. This is both a correctness and a privacy issue.
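If you derive the session ID from something sensitive like an email address or phone number, avoid a plain unsalted hash: phone numbers in particular come from a small input space and can be recovered by brute force. A keyed hash avoids that. The sketch below uses HMAC-SHA256 from the standard library. The normalisation rules and the idea of a SESSION_ID_SECRET setting are assumptions.

```python
import hashlib
import hmac


def session_id_for(identifier: str, secret: bytes) -> str:
    """Derive a stable, non-reversible session ID from a user identifier.

    The identifier is trimmed and lower-cased, so "Alice@Example.com "
    and "alice@example.com" map to the same memory pool.
    """
    normalised = identifier.strip().lower()
    return hmac.new(secret, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # In production, load the secret from configuration (e.g. a hypothetical
    # SESSION_ID_SECRET environment variable) and never change it, or every
    # user's existing memories become unreachable.
    secret = b"replace-with-a-long-random-secret"
    print(session_id_for("Alice@Example.com", secret))
```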
Extract facts, don't store raw output
Before calling kv.remember(), consider asking Grok to extract the key facts from the conversation and storing those instead of the raw message text. A second Grok call for extraction is usually cheap, especially with a smaller model such as grok-3-mini. The resulting memories are far more precise and retrieval-friendly than multi-paragraph responses.
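Here is a sketch of that extraction step with the model call left out. One hypothetical function builds the extraction prompt and another parses the reply. It assumes you ask the model for a JSON array of short strings, and it falls back to an empty list if the reply is not valid JSON. The prompt wording is illustrative, not a Kronvex or xAI convention.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, which models sometimes wrap JSON in

EXTRACTION_PROMPT = (
    "Extract durable facts about the user from the exchange below. "
    "Reply with a JSON array of short, self-contained strings and nothing else. "
    "Reply with [] if there is nothing worth remembering.\n\n"
    "User: {user}\nAssistant: {assistant}"
)


def build_extraction_prompt(user_message: str, assistant_reply: str) -> str:
    return EXTRACTION_PROMPT.format(user=user_message, assistant=assistant_reply)


def parse_facts(model_output: str, max_facts: int = 5) -> list[str]:
    """Parse the model's JSON array, tolerating a code fence and bad output."""
    text = model_output.strip()
    if text.startswith(FENCE):
        # Drop the surrounding fence and an optional "json" language tag
        text = text.strip("`").removeprefix("json").strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    facts = [item.strip() for item in data if isinstance(item, str) and item.strip()]
    return facts[:max_facts]
```

Each parsed fact could then be stored as a semantic memory with kv.remember().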
- TTL for ephemeral context: set ttl_days=30 on episodic memories that don't need to persist forever.
- Use semantic memories for user profiles: store preferences, job role, and project context with the semantic type for the most reliable retrieval.
- Threshold tuning: a threshold between 0.35 and 0.5 works well for most cases. Go lower if users complain about missing context, higher if irrelevant memories appear.
- Async for production: the kronvex Python package exposes an async client via from kronvex.async_client import AsyncKronvex. Use it with asyncio so memory calls don't block the event loop while Grok requests are in flight.
- Separate agents per environment: create distinct Kronvex agents for dev, staging, and production to prevent test data from polluting real user memories.
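To see the effect of the threshold, the sketch below filters recalled results by score. For illustration only, it assumes that each result carries a score between 0 and 1 and that the threshold keeps results at or above it. This guide does not specify the exact comparison Kronvex applies, and the scores are made-up values.

```python
def apply_threshold(results: list[tuple[str, float]], threshold: float) -> list[str]:
    """Keep results scoring at least `threshold` (inclusive comparison is an assumption)."""
    return [content for content, score in results if score >= threshold]


if __name__ == "__main__":
    # Made-up scores for one query, best first
    results = [
        ("User is building a logistics SaaS", 0.72),
        ("User prefers async Python", 0.48),
        ("User asked about pricing last month", 0.38),
        ("User mentioned the weather", 0.21),
    ]
    for t in (0.35, 0.5):
        print(t, apply_threshold(results, t))
```

At 0.35 the pricing memory still gets through. At 0.5 only the strongest match survives, which shows the trade-off between missing context and noise.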
Give your Grok agents a memory
Free plan — 1 agent, 100 memories. No credit card. Add persistent memory to Grok in under 10 minutes.