TL;DR: LangChain's default memory classes (ConversationBufferMemory, VectorStoreRetrieverMemory with FAISS, etc.) are in-process data structures. They disappear on restart, break under multi-worker deployments, and have no user isolation. For EU production deployments, you need an external persistent backend. This guide compares three options: self-hosted pgvector, Zep community edition, and Kronvex.
LangChain memory options: a quick overview
LangChain ships several memory classes. Understanding what each one actually does — and where it breaks down — is the starting point for choosing a production solution.
ConversationBufferMemory
The simplest option: stores the full conversation history in-process as a list of messages. Works for demos and single-session scripts. Fails the moment:
- Your process restarts (memory gone)
- You scale to multiple workers (each has its own copy)
- Conversations get long (entire history gets re-injected every turn, burning tokens)
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()
# This is a list of messages in RAM. Nothing is persisted anywhere.
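To make the token cost of the third failure mode concrete, here is a minimal arithmetic sketch. It assumes every turn adds roughly 50 tokens, a made-up figure for illustration; `buffer_prompt_tokens` is a hypothetical helper, not a LangChain function.

```python
def buffer_prompt_tokens(turn_tokens: list[int]) -> list[int]:
    """Return the history tokens re-sent on each turn when the full buffer is injected."""
    sent = []
    running = 0
    for tokens in turn_tokens:
        running += tokens      # the buffer only ever grows
        sent.append(running)   # and the whole buffer is sent again
    return sent


turns = [50] * 100  # assumption: 100 turns of about 50 tokens each
per_turn = buffer_prompt_tokens(turns)
total = sum(per_turn)

print(per_turn[0], per_turn[-1], total)
```

The prompt for a single turn grows linearly with the number of turns, so the cumulative tokens billed over the conversation grow quadratically.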
ConversationSummaryMemory
Summarizes conversation history using an LLM to keep the context window manageable. Better for long conversations, but still in-process and still not persistent.
ConversationBufferWindowMemory
Keeps only the last N turns in memory. Reduces token usage, but old context is lost entirely — and it's still not persistent.
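The windowing behaviour can be sketched with a bounded deque. This is an illustration of the idea only, not LangChain's implementation; `WindowBuffer` and its methods are hypothetical names.

```python
from collections import deque


class WindowBuffer:
    """Keep only the last k (human, ai) turns; older turns are silently dropped."""

    def __init__(self, k: int):
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def render(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


buf = WindowBuffer(k=2)
buf.save("My name is Alice.", "Nice to meet you, Alice.")
buf.save("I use FastAPI.", "Good choice.")
buf.save("What's the weather?", "I can't check that.")

context = buf.render()
print(context)
```

After the third turn the first one, including the user's name, is no longer available to the model.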
VectorStoreRetrieverMemory
Uses a vector store backend to retrieve semantically relevant past messages rather than the whole history. This is the right architecture for production, but the default implementations use in-memory or local-file vector stores that aren't suitable for multi-process deployments.
from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
# FAISS is in-memory — loses data on restart
vectorstore = FAISS.from_texts([""], embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs=dict(k=5))
memory = VectorStoreRetrieverMemory(retriever=retriever)
# Still in-process. Still not production-ready.
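The retrieval idea itself does not depend on any particular backend. The sketch below ranks past messages by cosine similarity to the query and keeps the top k. It uses a toy word-count "embedding" as a stand-in for a real embedding model, so the scores are only illustrative; `embed`, `cosine`, and `retrieve` are hypothetical helpers.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(history: list[str], query: str, k: int) -> list[str]:
    """Return the k past messages most similar to the query."""
    q = embed(query)
    ranked = sorted(history, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:k]


history = [
    "I am building a fintech app",
    "My favourite colour is green",
    "The app uses FastAPI and PostgreSQL",
]
top = retrieve(history, "which database does the app use", k=1)
print(top)
```

Only the selected messages are injected into the prompt, so prompt size stays bounded by k rather than by conversation length.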
LangMem (LangChain's experimental memory layer)
LangMem is an experimental project from the LangChain team that provides more sophisticated memory management, including background memory consolidation. It's US-hosted as a managed service (LangSmith infrastructure). As of April 2026, it's in early access and not recommended for regulated production use. No announced EU hosting.
The problem with default LangChain memory in production
The core issue is simple: none of LangChain's built-in memory classes persists to a durable store by default. They are in-process data structures.
Production requirements that break default memory:
Multi-worker deployments. If you run two Gunicorn workers or two Railway replicas, each holds a separate memory object. The same user sees different "memory" depending on whether a request lands on worker A or worker B. This is a correctness bug in production.
Process restarts. Every deploy, crash, or scale-down event wipes in-process memory. Users start every conversation from scratch.
Long-running agents. Re-injecting the full buffer makes the prompt grow linearly with conversation length. A six-month customer support history will not fit in a context window.
Multi-user applications. In-process memory has no natural user isolation. One memory object per process, not per user.
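The multi-worker failure is easy to reproduce without any framework. The simulation below is a sketch under simple assumptions: two workers, each with its own in-process dict, behind a round-robin router. `Worker` and `router` are hypothetical stand-ins for real server processes and a load balancer.

```python
from itertools import cycle


class Worker:
    """One server process with its own private, in-process memory."""

    def __init__(self, name: str):
        self.name = name
        self.memory: dict[str, list[str]] = {}

    def handle(self, user_id: str, message: str) -> int:
        history = self.memory.setdefault(user_id, [])
        history.append(message)
        return len(history)  # how many messages this worker knows about


workers = [Worker("A"), Worker("B")]
router = cycle(workers)  # round-robin load balancer stand-in

messages = ["hi", "I use FastAPI", "what do I use?"]
seen = [next(router).handle("alice", m) for m in messages]

print(seen)
```

The third request lands on worker A, which never saw "I use FastAPI", so the agent cannot answer from memory. With a shared external store, each request would see every earlier message.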
The solution in all cases is the same: externalize memory to a persistent store with an API. The question for EU teams is which persistent store is compliant, operationally feasible, and affordable.
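As a minimal illustration of what "externalize memory" means, the sketch below keeps memories in a database keyed by user, closes the connection to simulate a process exit, and reopens it. SQLite is used only because it ships with Python; it is a single-host file and not a substitute for the networked stores compared below. `SqliteMemory` is a hypothetical class.

```python
import os
import sqlite3
import tempfile


class SqliteMemory:
    """Per-user memory that lives outside the process (here: a file on disk)."""

    def __init__(self, path: str):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memories "
            "(user_id TEXT NOT NULL, content TEXT NOT NULL)"
        )
        self.conn.commit()

    def remember(self, user_id: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO memories (user_id, content) VALUES (?, ?)",
            (user_id, content),
        )
        self.conn.commit()

    def recall(self, user_id: str) -> list[str]:
        rows = self.conn.execute(
            "SELECT content FROM memories WHERE user_id = ? ORDER BY rowid",
            (user_id,),
        ).fetchall()
        return [row[0] for row in rows]

    def close(self) -> None:
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "memory.db")

first = SqliteMemory(db_path)
first.remember("alice", "Builds a fintech app with FastAPI")
first.remember("bob", "Prefers Django")
first.close()  # simulate the process exiting

second = SqliteMemory(db_path)  # simulate a fresh process after a restart
alice = second.recall("alice")
bob = second.recall("bob")
print(alice, bob)
```

The data survives the simulated restart, and each user only ever sees rows stored under their own user_id.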
EU-hosted persistent memory options for LangChain
Option 1: Self-hosted pgvector on EU infrastructure
pgvector is a PostgreSQL extension that adds vector similarity search. You can run it on an EU-hosted PostgreSQL instance and build your own memory layer on top.
Advantages: Full control, no external dependency, cheapest at scale if you're already running PostgreSQL.
Disadvantages: You build and maintain the memory layer yourself — embedding, similarity search, confidence scoring, user isolation, GDPR endpoints. Typically 2–4 weeks of engineering work to get right, plus ongoing maintenance.
# DIY pgvector memory — simplified
import psycopg2
from openai import OpenAI
client = OpenAI()
def remember(user_id: str, content: str, conn):
    """Embed `content` and store it under `user_id`."""
    embedding = client.embeddings.create(
        input=content,
        model="text-embedding-3-small"
    ).data[0].embedding
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO memories (user_id, content, embedding) VALUES (%s, %s, %s)",
            (user_id, content, embedding)
        )
    conn.commit()
def recall(user_id: str, query: str, conn, top_k: int = 5):
    """Return the top_k memories for `user_id`, most similar first."""
    query_embedding = client.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    ).data[0].embedding
    with conn.cursor() as cur:
        # <=> is pgvector's cosine distance; ordering by it directly
        # (ascending) lets PostgreSQL use a vector index if one exists.
        cur.execute(
            """
            SELECT content, 1 - (embedding <=> %s::vector) AS similarity
            FROM memories
            WHERE user_id = %s
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (query_embedding, user_id, query_embedding, top_k)
        )
        return cur.fetchall()
# You also need: user isolation, GDPR endpoints, confidence scoring, API layer...
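The snippet above assumes a memories table already exists. One possible schema is sketched below as plain SQL strings you could run once at startup. The table and column names mirror the queries above. The index names, the extra id and created_at columns, and the choice of an HNSW index (available in pgvector 0.5.0 and later) are assumptions, not requirements. `to_vector_literal` is a hypothetical helper that formats a Python list in pgvector's text form, useful if your driver does not adapt lists for you.

```python
EMBEDDING_DIM = 1536  # output size of text-embedding-3-small


def schema_statements(dim: int = EMBEDDING_DIM) -> list[str]:
    """SQL statements that create a minimal per-user memory table."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector",
        f"""CREATE TABLE IF NOT EXISTS memories (
            id BIGSERIAL PRIMARY KEY,
            user_id TEXT NOT NULL,
            content TEXT NOT NULL,
            embedding vector({dim}) NOT NULL,
            created_at TIMESTAMPTZ NOT NULL DEFAULT now()
        )""",
        "CREATE INDEX IF NOT EXISTS memories_user_idx ON memories (user_id)",
        # Cosine-distance index, matching the <=> operator used in recall()
        "CREATE INDEX IF NOT EXISTS memories_embedding_idx "
        "ON memories USING hnsw (embedding vector_cosine_ops)",
    ]


def to_vector_literal(values: list[float]) -> str:
    """Format a list of numbers as a pgvector text literal, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


for statement in schema_statements():
    print(statement + ";")
```

pgvector can use the HNSW index only when the query orders by the distance expression itself in ascending order, which is worth keeping in mind when writing the recall query.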
Who this is right for: Teams with existing PostgreSQL infrastructure, strong data sovereignty requirements, and engineering capacity to build and maintain the memory layer.
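The "GDPR endpoints" part of the DIY list mostly comes down to two per-user operations: export (Art. 20) and erasure (Art. 17). The sketch below shows their shape against the same memories layout, using an in-memory SQLite database so it runs anywhere. The function names are hypothetical, and a real deployment would also have to deal with backups, logs, and any derived data.

```python
import json
import sqlite3


def export_user(conn: sqlite3.Connection, user_id: str) -> str:
    """Portability: return everything stored for one user as JSON."""
    rows = conn.execute(
        "SELECT content FROM memories WHERE user_id = ? ORDER BY rowid",
        (user_id,),
    ).fetchall()
    return json.dumps({"user_id": user_id, "memories": [r[0] for r in rows]})


def erase_user(conn: sqlite3.Connection, user_id: str) -> int:
    """Erasure: delete every row for one user and report how many were removed."""
    cur = conn.execute("DELETE FROM memories WHERE user_id = ?", (user_id,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (user_id TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?)",
    [("alice", "a1"), ("alice", "a2"), ("bob", "b1")],
)

exported = json.loads(export_user(conn, "alice"))
deleted = erase_user(conn, "alice")
remaining = conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
print(exported, deleted, remaining)
```

Both operations are only this simple if every row carries a user_id from day one, which is one reason to design isolation into the schema rather than add it later.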
Option 2: Zep community edition (self-hosted)
Zep is an open-source memory layer with a knowledge graph approach. It can be self-hosted on EU infrastructure.
Advantages: Richer than raw pgvector — entity extraction, relationship graphs, dialog classification. Official LangChain integration via ZepChatMessageHistory.
Disadvantages: Operationally complex to run (depends on Neo4j or similar), non-deterministic LLM-based extraction, requires infrastructure ownership, and the multi-user isolation model requires careful implementation.
from langchain_community.chat_message_histories import ZepChatMessageHistory
from langchain.memory import ConversationBufferMemory
# Requires your own Zep instance running on EU infra
history = ZepChatMessageHistory(
    session_id="user-123-session-456",
    url="https://your-zep.your-eu-domain.com",
    api_key="your-zep-key",
)
memory = ConversationBufferMemory(
    chat_memory=history,
    return_messages=True,
)
Who this is right for: Teams that need complex entity-relationship graph memory and have infrastructure capacity to run Zep.
Option 3: Kronvex (EU-managed API)
Kronvex is a managed EU-hosted memory API. No infrastructure to run — you call remember(), recall(), and inject-context(). Runs on Supabase Frankfurt.
Advantages: No ops overhead, EU-native, GDPR endpoints built in, deterministic and auditable, predictable flat pricing.
Disadvantages: Not self-hostable. Less powerful than Zep's knowledge graph for entity-relationship use cases. Requires explicit remember() calls (no auto-extraction).
from kronvex import KronvexClient
kv = KronvexClient(api_key="kv-your-key", base_url="https://api.kronvex.io")
Full code example: LangChain + Kronvex in production
This is a production-ready pattern for a multi-user LangChain agent with persistent, EU-hosted memory.
from langchain.memory.chat_memory import BaseChatMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from kronvex import KronvexClient
from typing import Any
kv = KronvexClient(api_key="kv-your-key", base_url="https://api.kronvex.io")
AGENT_ID = "production-agent"
class KronvexMemory(BaseChatMemory):
    """
    Production LangChain memory backed by Kronvex.
    - Persists across process restarts and deploys
    - Isolated per user_id
    - GDPR-compliant (EU-hosted, Art. 17/20 endpoints available)
    """
    kv_client: Any
    agent_id: str
    user_id: str
    top_k: int = 5
    memory_key: str = "history"
    min_confidence: float = 0.4  # filter out low-confidence recalls
    class Config:
        arbitrary_types_allowed = True
    @property
    def memory_variables(self) -> list[str]:
        return [self.memory_key]
    def load_memory_variables(self, inputs: dict) -> dict:
        query = inputs.get("input", "")
        if not query:
            return {self.memory_key: "No prior context."}
        memories = self.kv_client.recall(
            agent_id=self.agent_id,
            query=query,
            user_id=self.user_id,
            top_k=self.top_k,
        )
        # Filter by confidence threshold
        relevant = [m for m in memories if m.get("confidence", 0) >= self.min_confidence]
        if not relevant:
            return {self.memory_key: "No relevant prior context found."}
        lines = [f"- {m['content']} [confidence: {m['confidence']:.2f}]" for m in relevant]
        return {self.memory_key: "\n".join(lines)}
    def save_context(self, inputs: dict, outputs: dict) -> None:
        human_input = inputs.get("input", "")
        ai_output = outputs.get("response", "")
        # Store what matters: meaningful statements, not raw turns
        if human_input and len(human_input) > 20:  # skip trivial inputs
            self.kv_client.remember(
                agent_id=self.agent_id,
                content=human_input,
                user_id=self.user_id,
            )
    def clear(self) -> None:
        pass  # Use the GDPR erasure endpoint for intentional deletion
def create_agent_for_user(user_id: str) -> ConversationChain:
    memory = KronvexMemory(
        kv_client=kv,
        agent_id=AGENT_ID,
        user_id=user_id,
    )
    prompt = PromptTemplate(
        input_variables=["history", "input"],
        template=(
            "You are a helpful assistant. "
            "Use the context from past sessions to give consistent, personalized responses.\n\n"
            "Past session context:\n{history}\n\n"
            "Human: {input}\n"
            "Assistant:"
        ),
    )
    return ConversationChain(
        llm=ChatOpenAI(model="gpt-4o-mini"),
        memory=memory,
        prompt=prompt,
        verbose=False,
    )
# Usage — each user gets their own isolated memory
agent_alice = create_agent_for_user("user-alice")
agent_bob = create_agent_for_user("user-bob")
# Session 1 — alice
agent_alice.predict(input="I'm building a fintech app in Python using FastAPI.")
# Later — different process, same user_id
agent_alice_new_session = create_agent_for_user("user-alice")
response = agent_alice_new_session.predict(
    input="What framework should I use for the API layer?"
)
# Agent recalls alice's FastAPI + fintech context
print(response)
Key design decisions in the example above: The min_confidence threshold (0.4) prevents low-signal noise from polluting the context. The save_context method skips inputs of 20 characters or fewer to avoid storing trivial turns like "ok" or "yes". The clear() method is intentionally a no-op; use the GDPR erasure endpoint (DELETE /api/v1/agents/{id}/memories) for intentional deletion.
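The two filters described above can be pulled out as pure functions and tested without any network calls. The sketch below assumes, as the example does, that each recalled memory is a dict with "content" and "confidence" keys; `should_store` and `format_recalls` are hypothetical helper names.

```python
def should_store(text: str, min_chars: int = 20) -> bool:
    """Skip trivial turns such as 'ok' or 'yes'."""
    return len(text) > min_chars


def format_recalls(memories: list[dict], min_confidence: float = 0.4) -> str:
    """Drop low-confidence recalls and render the rest for the prompt."""
    relevant = [m for m in memories if m.get("confidence", 0) >= min_confidence]
    if not relevant:
        return "No relevant prior context found."
    return "\n".join(
        f"- {m['content']} [confidence: {m['confidence']:.2f}]" for m in relevant
    )


sample = [
    {"content": "Builds a fintech app with FastAPI", "confidence": 0.82},
    {"content": "Mentioned the weather", "confidence": 0.15},
    {"content": "No score attached"},  # a missing score is treated as 0
]

print(format_recalls(sample))
print(should_store("ok"), should_store("I'm building a fintech app in Python using FastAPI."))
```

Both thresholds are tuning knobs rather than fixed values. The right settings depend on how noisy your recalls are and how terse your users tend to be.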
Comparison table
| Feature | Self-hosted pgvector | Zep community (self-hosted) | Kronvex |
|---|---|---|---|
| EU data residency | ✓ Yes (your infra) | ✓ Yes (your infra) | ✓ Yes (Supabase Frankfurt) |
| Ops overhead | High | High | None |
| Setup time | 2–4 weeks | 1–2 weeks | < 1 hour |
| GDPR erasure endpoint | DIY | DIY | ✓ Built-in |
| GDPR portability endpoint | DIY | DIY | ✓ Built-in |
| Memory approach | Vector similarity | Knowledge graph + vector | Vector similarity |
| Deterministic | ✓ Yes | ✗ No (LLM extraction) | ✓ Yes |
| LangChain integration | Custom | Official | Custom (documented) |
| Confidence scoring | DIY | Not exposed | ✓ Built-in |
| Pricing | Infrastructure cost | Infrastructure cost | €19–€599/mo flat |
| Multi-user isolation | DIY | Careful implementation needed | ✓ Default (user_id param) |
When to choose each:
- Self-hosted pgvector: Maximum control, existing PostgreSQL infra, large scale where API costs matter, strong data sovereignty requirements, engineering capacity to build the memory layer.
- Zep self-hosted: Need entity-relationship graph memory, comfortable with operational complexity, LLM-based extraction noise is acceptable.
- Kronvex: No infrastructure overhead, need GDPR endpoints out of the box, want deterministic auditable memory, predictable pricing.
Get started with Kronvex
pip install kronvex
# Get a demo API key
curl -X POST https://api.kronvex.io/auth/demo
Demo: 3 agents, 500 memories, no credit card.
- Full documentation: kronvex.io/docs
- LangChain integration guide: kronvex.io/docs/integrations/langchain
- Pricing: kronvex.io/#pricing