OpenAI Agents SDK Memory: How to Add Persistent State
The OpenAI Agents SDK is designed to be stateless — clean, composable, and easy to reason about. But stateless means your agent forgets everything the moment a session ends. Here's how to wire in persistent memory without fighting the framework.
Why the OpenAI Agents SDK is stateless by design
The OpenAI Agents SDK (released in early 2025) is built around a straightforward execution model: each Runner.run() call starts from scratch. You pass in a list of messages, the agent processes them, and it returns a response. There is no built-in mechanism to persist state between calls.
This is a deliberate design choice. Stateless agents are easier to test, easier to scale horizontally, and easier to reason about in a distributed environment. When OpenAI introduced the Agents SDK alongside the Responses API, they separated the concern of what the agent knows right now (the input messages) from what it has learned over time (which they left to the developer).
The Responses API supports a previous_response_id chain for a single conversation thread, but this only works within a session and doesn't survive process restarts or cross-user scenarios. It is threading, not memory.
What you lose without persistence
Without a memory layer, every session is a blank slate. The practical consequences compound quickly:
- Context window bloat — to approximate memory, developers stuff previous conversations into the system prompt. Those tokens are billed as input again on every request, so the cost grows with each session you replay.
- Poor user experience — the agent asks the same onboarding questions on every session, forgets user preferences, and cannot reference past decisions.
- No learning curve — the agent cannot get better at serving a specific user over time, making it feel generic regardless of how many interactions have occurred.
- Broken workflows — multi-step processes that span days (e.g., a procurement agent managing an RFP) fall apart when state is lost between sessions.
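To make the context-bloat point concrete, here is a rough back-of-the-envelope sketch comparing the input cost of replaying full history against injecting a handful of recalled snippets. The tokens per session, tokens per snippet, and price per million tokens are illustrative assumptions, not measurements or current OpenAI pricing.

```python
def prompt_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Input cost of a single request whose prompt is `tokens` tokens long."""
    return tokens / 1_000_000 * usd_per_million_tokens


# All numbers below are assumptions chosen for illustration only.
TOKENS_PER_SESSION = 2_000   # assumed average transcript size of one past session
TOKENS_PER_SNIPPET = 80      # assumed size of one recalled memory snippet
PRICE_PER_MILLION = 2.50     # hypothetical input price in USD per 1M tokens

full_history_tokens = 100 * TOKENS_PER_SESSION   # replay 100 past sessions
top_k_tokens = 6 * TOKENS_PER_SNIPPET            # inject 6 recalled snippets

full_cost = prompt_cost_usd(full_history_tokens, PRICE_PER_MILLION)
top_k_cost = prompt_cost_usd(top_k_tokens, PRICE_PER_MILLION)

print(f"full history: {full_history_tokens} tokens, ${full_cost:.4f} per request")
print(f"top-6 recall: {top_k_tokens} tokens, ${top_k_cost:.4f} per request")
```

Swap in your own token counts and your model's actual input price; the ratio between the two prompts matters more than the absolute figures.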
Integration architecture
The pattern is straightforward: intercept the agent's inputs and outputs to read from and write to Kronvex, a hosted memory API backed by pgvector. The SDK's hook system makes this clean.
# ┌─────────────────────────────────────────────────────────────────┐
# │                          User request                           │
# └────────────────────────────┬────────────────────────────────────┘
#                              │
#                    ┌─────────▼──────────┐
#                    │  recall(): fetch   │ ← Kronvex (pgvector)
#                    │  top-k memories    │
#                    └─────────┬──────────┘
#                              │ inject as system context
#                    ┌─────────▼──────────┐
#                    │  OpenAI Agent SDK  │
#                    │  Runner.run()      │
#                    └─────────┬──────────┘
#                              │
#                    ┌─────────▼──────────┐
#                    │  remember(): save  │ ← Kronvex (pgvector)
#                    │  exchange + facts  │
#                    └─────────┬──────────┘
#                              │
#                    ┌─────────▼──────────┐
#                    │  Response to user  │
#                    └────────────────────┘
Full code example
The OpenAI Agents SDK exposes lifecycle hooks via RunHooks. We implement on_agent_start to inject context and on_agent_end to persist the exchange.
pip install openai-agents kronvex
# memory_hooks.py
import os

from agents import Agent, RunContextWrapper, RunHooks
from kronvex import Kronvex

kv = Kronvex(os.environ["KRONVEX_API_KEY"])


class PersistentMemoryHooks(RunHooks):
    def __init__(self, agent_id: str, session_id: str | None = None):
        self.kv_agent = kv.agent(agent_id)
        self.session_id = session_id
        self._last_input = None
        self._base_instructions = None

    async def on_agent_start(
        self,
        context: RunContextWrapper,
        agent: Agent,
    ) -> None:
        # Grab the last user message as the recall query. This relies on the
        # caller passing context={"messages": [...]} to Runner.run().
        messages = (context.context or {}).get("messages", [])
        query = messages[-1]["content"] if messages else ""
        self._last_input = query

        # Fetch relevant memories
        ctx = self.kv_agent.inject_context(
            query=query,
            top_k=6,
            session_id=self.session_id,
        )

        # Prepend the memory block to the agent's original instructions.
        # Keeping a copy of the original stops blocks from stacking up when
        # the same agent object is reused across runs.
        if self._base_instructions is None:
            self._base_instructions = agent.instructions or ""
        block = f"[MEMORY]\n{ctx.context}\n[/MEMORY]\n\n" if ctx.context else ""
        agent.instructions = block + self._base_instructions

    async def on_agent_end(
        self,
        context: RunContextWrapper,
        agent: Agent,
        output,
    ) -> None:
        kwargs = dict(session_id=self.session_id) if self.session_id else {}

        # Store user input
        if self._last_input:
            self.kv_agent.remember(
                content=self._last_input,
                memory_type="episodic",
                **kwargs,
            )

        # Store agent response. The SDK passes the agent's final output
        # itself to this hook, not a RunResult.
        if output:
            self.kv_agent.remember(
                content=str(output),
                memory_type="episodic",
                **kwargs,
            )
import asyncio

from agents import Agent, Runner
from memory_hooks import PersistentMemoryHooks  # the hooks class defined above

agent = Agent(
    name="SupportAgent",
    instructions="You are a helpful customer support agent.",
    model="gpt-4o",
)

hooks = PersistentMemoryHooks(
    agent_id="your-kronvex-agent-id",
    session_id="user-42",  # scope per user
)


async def ask(text: str):
    messages = [{"role": "user", "content": text}]
    # Runner.run() takes the conversation as `input`. The same list is passed
    # as `context` so the hooks can read the latest user message.
    return await Runner.run(
        agent,
        input=messages,
        context={"messages": messages},
        hooks=hooks,
    )


async def main() -> None:
    # Session 1
    result = await ask("I'm on the Pro plan, invoice #4821.")

    # Session 2: new process, same user
    result2 = await ask("What plan am I on?")
    print(result2.final_output)
    # → e.g. "You're on the Pro plan. I can also see your invoice #4821."


asyncio.run(main())
Before / after comparison
The difference is visible at the first return session:
| Scenario | Without memory | With Kronvex |
|---|---|---|
| User returns after 3 days | Treated as new user | Context instantly restored |
| User preference ("no bullet points") | Lost after session | Recalled on every session |
| Multi-session workflow | Breaks without full history in prompt | Seamless continuation |
| 100 sessions in context | ~200k tokens per request, more than GPT-4o's 128k-token context window | Top-6 relevant snippets only |
Production considerations
EU hosting and GDPR
Kronvex stores all memory vectors in an EU-hosted PostgreSQL instance (Supabase Frankfurt region). For B2B agents handling personal data, this matters: Article 46 transfer mechanisms, DPA availability, and data residency SLAs are all covered. No extra configuration needed.
Rate limits and quotas
The Kronvex API enforces per-plan limits on remember calls (writes) and recall calls (reads) separately. For high-throughput agents, batch your remember calls or use the async client to avoid blocking the response path.
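One generic way to keep writes off the response path is to buffer them and flush in batches. The sketch below is not tied to any client: `WriteBuffer` and its `sink` callable are hypothetical, and since this article does not show a bulk-write endpoint, the sink simply receives a list of items to send however your client allows.

```python
from typing import Callable


class WriteBuffer:
    """Collects memory writes and hands them to `sink` in batches."""

    def __init__(self, sink: Callable[[list[str]], None], batch_size: int = 10):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._sink = sink
        self._batch_size = batch_size
        self._pending: list[str] = []

    def add(self, content: str) -> None:
        """Queue one write; flush automatically once the batch is full."""
        self._pending.append(content)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is queued, if anything."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._sink(batch)


# Demo sink: record each batch instead of calling a real API.
sent_batches: list[list[str]] = []
buffer = WriteBuffer(sink=sent_batches.append, batch_size=3)

for i in range(7):
    buffer.add(f"turn {i}")
buffer.flush()  # send the remainder, e.g. at the end of a session

print([len(batch) for batch in sent_batches])
```

The explicit final `flush()` matters: without it, any writes left in a partially filled batch would be lost when the process exits.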
Memory hygiene
- Use memory_type="episodic" for conversation turns with ttl_days=30 to auto-expire stale exchanges
- Use memory_type="semantic" for durable user facts (plan, preferences, company) with no TTL
- Use memory_type="procedural" for agent-learned workflows and decision patterns
- Set top_k=6 as a starting point — more memories means larger prompts and higher latency
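The retention policy in this list can be modeled as a small lookup from memory type to TTL. This is only a local sketch of the expiry rule; how `ttl_days` is enforced server-side is not shown in this article. The 30-day figure for episodic memories comes from the list above, while the `Memory` class, the `is_expired` helper, and the choice of no TTL for procedural memories are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# TTL per memory type in days; None means the memory never expires.
TTL_DAYS: dict[str, Optional[int]] = {
    "episodic": 30,      # conversation turns
    "semantic": None,    # durable user facts
    "procedural": None,  # learned workflows (no TTL is an assumption here)
}


@dataclass
class Memory:
    content: str
    memory_type: str
    created_at: datetime


def is_expired(memory: Memory, now: datetime) -> bool:
    """True if the memory has outlived the TTL for its type."""
    ttl = TTL_DAYS[memory.memory_type]
    if ttl is None:
        return False
    return now - memory.created_at > timedelta(days=ttl)


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
memories = [
    Memory("asked about invoice #4821", "episodic", now - timedelta(days=45)),
    Memory("asked about upgrade pricing", "episodic", now - timedelta(days=3)),
    Memory("user is on the Pro plan", "semantic", now - timedelta(days=400)),
]

live = [m.content for m in memories if not is_expired(m, now)]
print(live)
```

The 45-day-old conversation turn drops out, while the 400-day-old semantic fact survives because its type has no TTL.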
If you use Runner.run_streamed() or async tool calls, use the AsyncKronvex client to avoid blocking the event loop. Import it with from kronvex import AsyncKronvex.
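If an async client is not an option, a generic alternative (a different technique from the `AsyncKronvex` client mentioned above) is to push the blocking call onto a worker thread with the standard library's `asyncio.to_thread`. In this sketch `blocking_remember` is a hypothetical stand-in that sleeps to simulate a network round trip.

```python
import asyncio
import time

saved: list[str] = []


def blocking_remember(content: str) -> None:
    """Hypothetical synchronous write; the sleep simulates network latency."""
    time.sleep(0.2)
    saved.append(content)


async def heartbeat(ticks: list[float]) -> None:
    """Records timestamps to show the event loop keeps running."""
    for _ in range(4):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main() -> list[float]:
    ticks: list[float] = []
    # The write runs in a worker thread while the heartbeat keeps ticking
    # on the event loop.
    await asyncio.gather(
        asyncio.to_thread(blocking_remember, "user prefers no bullet points"),
        heartbeat(ticks),
    )
    return ticks


ticks = asyncio.run(main())
print(f"saved={saved}, heartbeat ticks recorded: {len(ticks)}")
```

Calling `blocking_remember` directly inside a coroutine would stall every other task for the duration of the write; `asyncio.to_thread` requires Python 3.9 or later.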
Quick start
Three steps to add memory to any OpenAI agent:
# Sign up at https://kronvex.io — free plan, no credit card
# Create an agent in the dashboard → copy the agent ID
pip install openai-agents kronvex
export KRONVEX_API_KEY="kv-your-key"
export KRONVEX_AGENT_ID="your-agent-id"
import asyncio
import os

from agents import Agent, RunHooks, Runner
from kronvex import Kronvex

_kv = Kronvex(os.environ["KRONVEX_API_KEY"])
_kv_agent = _kv.agent(os.environ["KRONVEX_AGENT_ID"])


class MemoryHooks(RunHooks):
    def __init__(self, session_id: str):
        self.session_id = session_id
        self._q = None
        self._base = None

    async def on_agent_start(self, ctx, agent):
        # Expects Runner.run(..., context={"messages": [...]})
        msgs = (ctx.context or {}).get("messages", [])
        self._q = msgs[-1]["content"] if msgs else ""
        mem = _kv_agent.inject_context(self._q, top_k=6, session_id=self.session_id)
        # Rebuild from the original instructions so memory blocks don't stack
        if self._base is None:
            self._base = agent.instructions or ""
        block = f"[MEMORY]\n{mem.context}\n[/MEMORY]\n\n" if mem.context else ""
        agent.instructions = block + self._base

    async def on_agent_end(self, ctx, agent, output):
        # `output` is the agent's final output, not a RunResult
        kw = dict(session_id=self.session_id)
        if self._q:
            _kv_agent.remember(self._q, memory_type="episodic", **kw)
        if output:
            _kv_agent.remember(str(output), memory_type="episodic", **kw)


# Usage
async def main():
    agent = Agent(name="MyAgent", instructions="You are a helpful assistant.", model="gpt-4o")
    messages = [...]  # placeholder: your list of {"role": ..., "content": ...} items
    result = await Runner.run(
        agent,
        input=messages,
        context={"messages": messages},
        hooks=MemoryHooks("user-42"),
    )
    print(result.final_output)


asyncio.run(main())