Tutorial xAI · Grok Python May 4, 2026 · 8 min read

Persistent Memory for Grok / xAI SDK

Grok is one of the fastest-evolving LLMs on the market, but like most chat-completion APIs, its API keeps no memory between calls: every conversation starts from zero. This guide shows how to pair the xAI API with Kronvex to give your Grok agents persistent, semantically searchable memory that survives across sessions.

In this article
  1. Why Grok agents forget between sessions
  2. The xAI OpenAI-compatible API
  3. Installing Kronvex and configuring credentials
  4. Step 1 — Storing memories after each turn
  5. Step 2 — Recalling relevant memories before each turn
  6. Full working example: stateful Grok agent
  7. Using inject-context for system prompt enrichment
  8. Best practices and production tips

Why Grok agents forget between sessions

xAI's Grok API works on a per-request basis. You send a list of messages, you get a completion back. The next API call is completely independent — there is no server-side session, no thread, no persistent context. If you want your agent to remember that a user prefers Python over JavaScript, or that they're building a SaaS in the logistics sector, you have to manage that state yourself.

The common workaround is to include the entire conversation history in every request. This works for short conversations, but it has three critical problems: it gets expensive fast (tokens add up), it hits context length limits, and it cannot retrieve information from past sessions — conversations from yesterday or last week are simply gone.
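To make "tokens add up" concrete: if every turn adds roughly the same number of tokens and you resend the full history on each request, turn n sends about n times that amount, so the total grows quadratically with conversation length. The sketch below compares that with sending only the current turn plus a fixed-size block of recalled memories. The token counts are illustrative assumptions, not measurements.

```python
def tokens_full_history(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when every request replays all previous turns.

    Turn n sends n * tokens_per_turn tokens, so the sum over N turns is
    tokens_per_turn * N * (N + 1) / 2 (N * (N + 1) is always even).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def tokens_with_memory(turns: int, tokens_per_turn: int, memory_tokens: int) -> int:
    """Total input tokens when each request sends only the current turn
    plus a bounded block of recalled memories."""
    return turns * (tokens_per_turn + memory_tokens)


if __name__ == "__main__":
    # Illustrative numbers only: 200 tokens per turn, 300-token memory block
    for n in (10, 50, 100):
        print(n, tokens_full_history(n, 200), tokens_with_memory(n, 200, 300))
```

Under these assumptions, 100 turns cost 1,010,000 input tokens with full replay versus 50,000 with a bounded memory block, and the replayed history would hit the model's context limit long before that.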

Kronvex solves this by acting as an external memory store. Instead of replaying raw conversation history, you store structured memories and retrieve only the semantically relevant ones — using pgvector cosine similarity on OpenAI's text-embedding-3-small embeddings — before each Grok API call.
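Conceptually, the retrieval step ranks stored memories by the cosine similarity between the query's embedding and each memory's embedding, then keeps the best matches. The pure-Python sketch below shows that ranking on toy 3-dimensional vectors. It is an illustration of the technique, not Kronvex's code: real text-embedding-3-small vectors have 1,536 dimensions, and in Kronvex the ranking runs inside PostgreSQL via pgvector rather than in your application.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], memories: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k memories most similar to the query, best first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in memories.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones come from an embedding model
    memories = {
        "Prefers Python over JavaScript": [0.9, 0.1, 0.0],
        "Building a logistics SaaS": [0.1, 0.9, 0.2],
        "Lives in Berlin": [0.0, 0.1, 0.9],
    }
    query = [0.8, 0.3, 0.0]  # stands in for e.g. "Which language should I use?"
    for text, score in top_k(query, memories, k=2):
        print(f"{score:.2f}  {text}")
```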

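EU data residency. All Kronvex memories are stored in EU-hosted PostgreSQL (Supabase Frankfurt), which helps if your users are in Europe and their stored data must stay in the EU. Note that xAI's API is US-hosted: only the memory layer is EU-bound, and any memories you inject into a prompt are sent to xAI along with the request.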

The xAI OpenAI-compatible API

xAI designed the Grok API to be a drop-in replacement for the OpenAI API. This means you can use the official openai Python package to call Grok — just point it at xAI's base URL and use your xAI API key.

Install dependencies
pip install openai kronvex

The openai package handles calls to Grok; the kronvex package handles memory (storing, recalling, and injecting context). The examples in this guide use synchronous calls for both.

Configure the xAI client
import os

from openai import OpenAI

# Point the OpenAI client at xAI's base URL; read the key from the environment
grok_client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

# Grok models referenced in this guide (May 2026): grok-3, grok-3-mini,
# grok-3-turbo, grok-2. Model names change often; check xAI's model list.
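A minimal first call is a quick way to confirm the client is wired up. The sketch below uses the standard chat.completions.create method of the openai package; the helper name ask_grok and the grok-3-mini default are illustrative choices, and the function accepts any client object so it can be exercised with a stub.

```python
from typing import Any


def ask_grok(client: Any, prompt: str, model: str = "grok-3-mini") -> str:
    """Send a single-turn prompt to Grok and return the reply text.

    `client` is an openai.OpenAI instance configured with xAI's base URL.
    """
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # message.content can be None (e.g. for tool calls), so fall back to ""
    return completion.choices[0].message.content or ""


# Usage (requires `pip install openai` and XAI_API_KEY in the environment):
#   import os
#   from openai import OpenAI
#   client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url="https://api.x.ai/v1")
#   print(ask_grok(client, "Say hello in five words."))
```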

Installing Kronvex and configuring credentials

Get a free API key from the Kronvex dashboard. No credit card required. The demo plan includes 1 agent and 100 memories — enough to prototype the full integration.

1. Sign up at kronvex.io/dashboard. Your API key starts with kv-.
2. Create an Agent in the dashboard and copy its UUID; this is your AGENT_ID.
3. Store both in your environment as KV_API_KEY and KV_AGENT_ID.
Initialize Kronvex client
from kronvex import Kronvex
import os

kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)
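A missing or mistyped variable otherwise surfaces only as a less obvious error at the first API call, so it can help to validate the environment at startup. This is a minimal sketch that checks only what the steps above state (the key starts with kv-, the agent ID is a UUID); load_kronvex_config is a hypothetical helper, not part of the kronvex package, and it does not verify that the credentials are actually valid.

```python
import os
import uuid
from typing import Mapping, Optional


def load_kronvex_config(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Read and sanity-check KV_API_KEY and KV_AGENT_ID (format only)."""
    env = os.environ if env is None else env
    api_key = env.get("KV_API_KEY", "")
    agent_id = env.get("KV_AGENT_ID", "")
    if not api_key.startswith("kv-"):
        raise RuntimeError("KV_API_KEY is missing or does not start with 'kv-'")
    try:
        uuid.UUID(agent_id)  # raises ValueError for empty or malformed IDs
    except ValueError:
        raise RuntimeError("KV_AGENT_ID is missing or is not a UUID") from None
    return api_key, agent_id
```

You would then call `api_key, agent_id = load_kronvex_config()` and pass both values to `Kronvex(...)` as shown above.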

Step 1 — Storing memories after each turn

After every Grok response, extract the key information and store it as a memory. You have three memory types to choose from: episodic (what happened), semantic (facts about the user or domain), and procedural (recurring patterns and preferences).

Store a memory
# After Grok responds, store relevant information.
# Assumes `completion` (the Grok response), `user_message`, and `user_id`
# come from the current conversation turn.
response_text = completion.choices[0].message.content

# Store the user's message as an episodic memory
kv.remember(
    content=f"User said: {user_message}",
    memory_type="episodic",
    session_id=user_id,  # scope to this user
)

# Store key facts extracted from the conversation (hard-coded here for
# illustration; in practice derive them from user_message and response_text)
kv.remember(
    content="User is building a logistics SaaS in Python, prefers async code",
    memory_type="semantic",
    session_id=user_id,
)
Tip: summarise before storing. For long Grok responses, ask Grok for a 1–2 sentence summary and store that instead. Shorter, focused memories tend to embed more precisely and to match relevant queries with higher confidence scores.
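The three memory types and the summarise-first tip can be enforced in one small wrapper. This is a sketch, not part of the kronvex package: store_memory and MAX_MEMORY_CHARS are hypothetical names, the 500-character limit is an arbitrary illustration, and the function assumes only a client with the `remember(content=..., memory_type=..., session_id=...)` call used above.

```python
from typing import Any

MEMORY_TYPES = {"episodic", "semantic", "procedural"}
MAX_MEMORY_CHARS = 500  # arbitrary illustrative limit; tune for your data


def store_memory(kv: Any, content: str, memory_type: str, user_id: str) -> Any:
    """Validate a memory, then pass it to kv.remember()."""
    if memory_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory_type {memory_type!r}")
    if not user_id:
        # Without a session ID, memories land in a pool shared by all users
        raise ValueError("user_id is required to scope the memory")
    content = " ".join(content.split())  # collapse runs of whitespace/newlines
    if len(content) > MAX_MEMORY_CHARS:
        # Long text embeds less precisely; summarise it before storing
        raise ValueError(
            f"memory is {len(content)} chars; summarise it to <= {MAX_MEMORY_CHARS}"
        )
    return kv.remember(content=content, memory_type=memory_type, session_id=user_id)
```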

Step 2 — Recalling relevant memories before each turn

Before sending a message to Grok, recall the most semantically relevant memories and inject them into the system prompt. Kronvex uses pgvector cosine similarity to find the best matches for the current message, then applies a composite confidence score: similarity × 0.6 + recency × 0.2 + frequency × 0.2.

Recall memories before a Grok call
# Recall the top 5 memories most relevant to the current message
memories = kv.recall(
    query=user_message,
    top_k=5,
    session_id=user_id,
)

# Format memories for the system prompt
memory_context = ""
if memories:
    memory_context = "## What I know about this user:\n"
    for m in memories:
        memory_context += f"- {m['content']}\n"
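The composite confidence score described above is a weighted sum. The sketch below reproduces only the weights stated in this guide (0.6, 0.2, 0.2). How Kronvex normalises recency and frequency is not described here, so the function assumes all three components arrive already scaled to [0, 1]; because the weights sum to 1, the result stays in [0, 1] as well.

```python
WEIGHTS = {"similarity": 0.6, "recency": 0.2, "frequency": 0.2}


def composite_score(similarity: float, recency: float, frequency: float) -> float:
    """Weighted confidence score; all inputs assumed pre-normalised to [0, 1]."""
    components = {"similarity": similarity, "recency": recency, "frequency": frequency}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())
```

For example, a memory with similarity 0.8, recency 0.5, and frequency 0.1 scores 0.6 × 0.8 + 0.2 × 0.5 + 0.2 × 0.1 = 0.60, while a maximally recent and frequent memory with similarity 0.3 scores only 0.58: with these weights, semantic similarity dominates the ranking.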

Full working example: stateful Grok agent

Here is a complete, self-contained example of a Grok agent with persistent memory. Each conversation turn recalls relevant past memories, calls Grok, and stores the new exchange.

grok_agent_with_memory.py
import os
from openai import OpenAI
from kronvex import Kronvex

# ── Clients ─────────────────────────────────────────────
grok = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

kv = Kronvex(
    api_key=os.environ["KV_API_KEY"],
    agent_id=os.environ["KV_AGENT_ID"],
)

def chat(user_message: str, user_id: str) -> str:
    # 1. Recall relevant past memories
    memories = kv.recall(
        query=user_message,
        top_k=6,
        session_id=user_id,
        threshold=0.35,
    )

    # 2. Build memory context
    mem_block = ""
    if memories:
        lines = [f"- {m['content']}" for m in memories]
        mem_block = "## Memory context:\n" + "\n".join(lines) + "\n\n"

    # 3. Call Grok with memory-enriched system prompt
    completion = grok.chat.completions.create(
        model="grok-3",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant with access to conversation history.\n"
                    + mem_block
                    + "Use the memory context above to personalise your response. "
                    + "If no context is relevant, respond normally."
                ),
            },
            {"role": "user", "content": user_message},
        ],
    )

    assistant_reply = completion.choices[0].message.content or ""  # content may be None

    # 4. Store both sides of the conversation
    kv.remember(
        content=f"User: {user_message}",
        memory_type="episodic",
        session_id=user_id,
    )
    kv.remember(
        content=f"Grok: {assistant_reply}",
        memory_type="episodic",
        session_id=user_id,
    )

    return assistant_reply


# Example usage
if __name__ == "__main__":
    uid = "user-alice"
    # First session
    print(chat("I'm building a Python SDK for logistics APIs", uid))
    # Days later — Grok will remember the context
    print(chat("What was the project I was working on?", uid))
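For unit tests, or to try the loop without network access, you could swap `kv` for an in-memory stand-in. This is a hypothetical test double, not part of the kronvex package: it mimics only the `remember`/`recall` calls and the `m['content']` result shape used in this guide, ranks by shared words instead of embeddings (so it is far cruder than real semantic recall), and ignores extra arguments such as `threshold`.

```python
import re
from typing import Any


class InMemoryKronvex:
    """Hypothetical offline stand-in for the Kronvex client (tests only)."""

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}  # session_id -> memories

    @staticmethod
    def _words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def remember(self, content: str, memory_type: str, session_id: str, **_ignored: Any) -> dict:
        memory = {"content": content, "memory_type": memory_type}
        self._store.setdefault(session_id, []).append(memory)
        return memory

    def recall(self, query: str, top_k: int = 5, session_id: str = "", **_ignored: Any) -> list[dict]:
        query_words = self._words(query)
        scored = [
            (len(query_words & self._words(m["content"])), i, m)
            for i, m in enumerate(self._store.get(session_id, []))
        ]
        # Keep memories sharing at least one word; newest first on ties
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [m for _, _, m in scored[:top_k]]
```

In a test you would replace the module-level `kv` with `InMemoryKronvex()` before calling `chat()`; the Grok call still needs its own stub or a real key.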

Using inject-context for system prompt enrichment

The inject_context method, which wraps the inject-context endpoint, handles recall and formatting in a single API call. It returns a pre-formatted string ready to drop directly into a system prompt, which makes it a good fit for production code where you want minimal boilerplate.

inject_context — one-line memory injection
# Single call: recall + format → ready for system prompt
context = kv.inject_context(
    message=user_message,
    session_id=user_id,
)

completion = grok.chat.completions.create(
    model="grok-3",
    messages=[
        {
            "role": "system",
            "content": f"You are a helpful assistant.\n\n{context}",
        },
        {"role": "user", "content": user_message},
    ],
)
REST API alternative. If you prefer not to use the Python SDK, the Kronvex REST API at https://api.kronvex.io exposes the same operations. Send a POST /api/v1/agents/{agent_id}/inject-context with your X-API-Key header and a {"message": "...", "session_id": "..."} body.
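A minimal sketch of that REST call using only the Python standard library. The URL path, X-API-Key header, and body fields are the ones stated above. The response format is not described in this guide, so the function simply returns the decoded JSON for you to inspect; both function names are hypothetical.

```python
import json
import urllib.request
from typing import Any

KRONVEX_BASE_URL = "https://api.kronvex.io"


def build_inject_context_request(
    api_key: str, agent_id: str, message: str, session_id: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request for inject-context."""
    url = f"{KRONVEX_BASE_URL}/api/v1/agents/{agent_id}/inject-context"
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def inject_context(api_key: str, agent_id: str, message: str, session_id: str) -> Any:
    """Send the request and return the decoded JSON response.

    The response schema is an assumption to verify against the Kronvex docs.
    """
    request = build_inject_context_request(api_key, agent_id, message, session_id)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```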

Best practices and production tips

Choose the right model for cost vs. quality

grok-3 is the most capable model used in this guide. For memory-heavy agents that inject large context blocks, grok-3-mini or grok-3-turbo can give comparable results at lower cost; verify this on your own prompts before switching. Run Kronvex recall first, then decide which Grok model to use based on how much context you retrieved.
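One way to act on that advice is to measure the recalled context and choose the model accordingly. A sketch, assuming recall returns dicts with a `content` key as in the examples above; the 2,000-character budget is an arbitrary placeholder to tune against your own cost and quality measurements, and `pick_model` is a hypothetical helper.

```python
def pick_model(memories: list[dict], char_budget: int = 2000) -> str:
    """Choose a Grok model from the size of the recalled memory block.

    Following the advice above: heavy injected context raises the cost of
    each request and often lets a smaller model answer well, so route those
    turns to grok-3-mini and keep grok-3 for light-context turns.
    """
    context_chars = sum(len(m["content"]) for m in memories)
    return "grok-3-mini" if context_chars > char_budget else "grok-3"
```

In the full example, you would call `model = pick_model(memories)` right after `kv.recall(...)` and pass `model=model` to `grok.chat.completions.create`.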

Session ID design matters

Always use a stable, unique user identifier as the session_id. A UUID from your database, an email address, or a hashed phone number all work well. Without a session ID, all users share the same memory pool — memories from one user will surface in other users' recall results. This is both a correctness and a privacy issue.
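If you would rather not send raw emails or phone numbers to the memory layer, you can derive a stable, pseudonymous session_id with a keyed hash. A sketch using the standard library's hmac module; the SESSION_ID_SECRET variable, the `user-` prefix, and the lowercase normalisation (suited to emails, not phone numbers) are assumptions. A keyed HMAC is preferable to a plain SHA-256 here because unkeyed hashes of phone numbers or emails can be brute-forced. Keep the secret fixed: rotating it changes every ID and orphans existing memories.

```python
import hashlib
import hmac
import os
from typing import Optional


def session_id_for(identifier: str, secret: Optional[bytes] = None) -> str:
    """Derive a stable, pseudonymous session_id from a user identifier."""
    if secret is None:
        secret = os.environ["SESSION_ID_SECRET"].encode("utf-8")
    normalised = identifier.strip().lower()  # "Alice@Example.com " -> "alice@example.com"
    if not normalised:
        raise ValueError("identifier must not be empty")
    digest = hmac.new(secret, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user-{digest}"
```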

Extract facts, don't store raw output

Before calling kv.remember(), consider asking Grok to extract key facts from the conversation and storing those instead of the raw message text. A second, short Grok call for extraction is usually cheap relative to the main call, especially on a smaller model, and the resulting memories are far more precise and retrieval-friendly than multi-paragraph responses.
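A sketch of that extraction step. It asks Grok, through the same `chat.completions.create` call used throughout this guide, for a JSON array of short facts, then stores each one as a semantic memory. The prompt wording, the grok-3-mini default, and the helper names are assumptions. Models do not always return clean JSON, so the parser falls back to an empty list rather than storing malformed output.

```python
import json
from typing import Any

FENCE = "`" * 3  # models sometimes wrap JSON in a Markdown code fence
EXTRACTION_PROMPT = (
    "Extract durable facts about the user from this exchange. "
    "Return only a JSON array of short strings, or [] if there are none.\n\n"
    "User: {user}\nAssistant: {assistant}"
)


def parse_facts(raw: str) -> list[str]:
    """Parse the model's reply into a list of non-empty fact strings."""
    text = raw.strip()
    if text.startswith(FENCE):
        text = text.strip("`").removeprefix("json").strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [item.strip() for item in data if isinstance(item, str) and item.strip()]


def extract_and_store(grok: Any, kv: Any, user_message: str, assistant_reply: str,
                      user_id: str, model: str = "grok-3-mini") -> list[str]:
    """Ask Grok for key facts, then store each as a semantic memory."""
    prompt = EXTRACTION_PROMPT.format(user=user_message, assistant=assistant_reply)
    completion = grok.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    facts = parse_facts(completion.choices[0].message.content or "")
    for fact in facts:
        kv.remember(content=fact, memory_type="semantic", session_id=user_id)
    return facts
```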
