LangChain Persistent Memory in Production: EU-Hosted Options Compared

TL;DR: LangChain's default memory classes (ConversationBufferMemory, VectorStoreRetrieverMemory with FAISS, etc.) are in-process data structures. They disappear on restart, break under multi-worker deployments, and have no user isolation. For EU production deployments, you need an external persistent backend. This guide compares three options: self-hosted pgvector, Zep community edition, and Kronvex.

LangChain memory options: a quick overview

LangChain ships several memory classes. Understanding what each one actually does — and where it breaks down — is the starting point for choosing a production solution.

ConversationBufferMemory

The simplest option: stores the full conversation history in-process as a list of messages. Works for demos and single-session scripts. Fails the moment:

Your process restarts (memory gone)
You scale to multiple workers (each has its own copy)
Conversations get long (entire history gets re-injected every turn, burning tokens)

Python

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
# The history is a list of messages held in RAM. Nothing is persisted anywhere.

ConversationSummaryMemory

Summarizes conversation history using an LLM to keep the context window manageable. Better for long conversations, but still in-process and still not persistent.

ConversationBufferWindowMemory

Keeps only the last N turns in memory. Reduces token usage, but old context is lost entirely — and it's still not persistent.
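
The eviction behaviour can be sketched without LangChain at all. The `WindowBuffer` class below is a hypothetical stand-in, not LangChain's implementation; it only illustrates how a last-N window silently drops older turns.

```python
from collections import deque


class WindowBuffer:
    """Hypothetical last-N-turns buffer (illustration only, not LangChain's class)."""

    def __init__(self, k: int) -> None:
        # deque(maxlen=k) discards the oldest turn once k turns are stored
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def render(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


buf = WindowBuffer(k=2)
buf.save("My name is Alice.", "Nice to meet you, Alice.")
buf.save("I work on a fintech app.", "Sounds interesting.")
buf.save("I use FastAPI.", "Good choice.")

# The first turn (the user's name) is no longer part of the rendered context
print(buf.render())
```

With `k=2`, the turn containing the user's name is gone by the third exchange, which is exactly the kind of long-lived fact a production agent usually needs to keep.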

VectorStoreRetrieverMemory

Uses a vector store backend to retrieve semantically relevant past messages rather than the whole history. This is the right architecture for production, but the default implementations use in-memory or local-file vector stores that aren't suitable for multi-process deployments.

Python

from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# FAISS is in-memory — loses data on restart
vectorstore = FAISS.from_texts([""], embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs=dict(k=5))
memory = VectorStoreRetrieverMemory(retriever=retriever)
# Still in-process. Still not production-ready.
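
To make the retrieval idea concrete, here is a dependency-free sketch of semantic recall. The three-dimensional vectors are hand-written toy values rather than real embeddings, and the `store` and `retrieve` names are assumptions for illustration; a real setup would use an embedding model and a vector database.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "embeddings", invented for this example only
store = [
    ("User prefers FastAPI for APIs", [0.9, 0.1, 0.0]),
    ("User's cat is named Milo", [0.0, 0.2, 0.9]),
    ("User deploys on Kubernetes", [0.7, 0.6, 0.1]),
]


def retrieve(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


top = retrieve([1.0, 0.2, 0.0], k=2)
print(top)
```

Only the most relevant memories reach the prompt, so prompt size stays bounded no matter how much history is stored.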

LangMem (LangChain's experimental memory layer)

LangMem is an experimental project from the LangChain team that provides more sophisticated memory management, including background memory consolidation. It's US-hosted as a managed service (LangSmith infrastructure). As of April 2026, it's in early access and not recommended for regulated production use. No announced EU hosting.

The problem with default LangChain memory in production

The core issue is simple: none of LangChain's default memory classes persists to a durable store. They are in-process data structures.

Production requirements that break default memory:

Multi-worker deployments. If you run two Gunicorn workers or two Railway replicas, each holds its own memory object. A user whose first request lands on worker A and whose next request lands on worker B sees two different histories. This is a correctness bug in production.
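
The multi-worker failure is easy to reproduce in a few lines. In this sketch the `Worker` class is a hypothetical stand-in for one server process, and `itertools.cycle` plays the role of a naive round-robin load balancer.

```python
import itertools


class Worker:
    """Stand-in for one server process with its own in-process memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.history: list[str] = []  # lives only inside this worker

    def handle(self, message: str) -> list[str]:
        self.history.append(message)
        return list(self.history)  # what this worker "remembers"


workers = [Worker("A"), Worker("B")]
round_robin = itertools.cycle(workers)  # naive load balancer

messages = ["My name is Alice.", "What is my name?"]
seen = [next(round_robin).handle(msg) for msg in messages]

# Worker B answers the second message without ever having seen the first
print(seen[1])
```

The second request is served from a history that contains only the question, not the earlier statement it depends on.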

Process restarts. Every deploy, crash, or scale-down event wipes in-process memory. Users start every conversation from scratch.

Long-running agents. Re-injecting the full buffer makes the prompt grow linearly with conversation length, so every turn costs more than the last. A six-month customer support history is too large to inject.
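
A rough back-of-the-envelope model shows why. It assumes every turn adds the same number of tokens and that the prompt on turn n contains all n turns so far; both are simplifications, and the function name is hypothetical.

```python
def reinjected_tokens(turns: int, tokens_per_turn: int) -> tuple[int, int]:
    """Return (prompt tokens on the last turn, cumulative prompt tokens).

    Assumes a constant turn size and full-history re-injection on every turn.
    """
    last_prompt = turns * tokens_per_turn
    # Sum of tokens_per_turn * n for n = 1..turns
    cumulative = tokens_per_turn * turns * (turns + 1) // 2
    return last_prompt, cumulative


last, total = reinjected_tokens(turns=1000, tokens_per_turn=150)
print(f"last prompt: {last} tokens, cumulative: {total} tokens")
```

The per-turn prompt grows linearly, but the total number of tokens sent over the life of the conversation grows quadratically.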

Multi-user applications. In-process memory has no built-in user isolation: you get one memory object per process, not one per user.

The solution in all cases is the same: externalize memory to a persistent store with an API. The question for EU teams is which persistent store is compliant, operationally feasible, and affordable.
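
A minimal sketch of what "externalized" means in practice follows. The `PersistentMemory` class and its `remember`/`recall` methods are hypothetical names, and sqlite3 is only a local stand-in for a networked store; a real multi-host deployment needs a backend that all workers can reach.

```python
import os
import sqlite3
import tempfile


class PersistentMemory:
    """Hypothetical minimal store; sqlite3 stands in for a real backend."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memories "
            "(user_id TEXT NOT NULL, content TEXT NOT NULL)"
        )
        self.conn.commit()

    def remember(self, user_id: str, content: str) -> None:
        self.conn.execute("INSERT INTO memories VALUES (?, ?)", (user_id, content))
        self.conn.commit()

    def recall(self, user_id: str) -> list[str]:
        # Every read is scoped to one user_id, which gives per-user isolation
        rows = self.conn.execute(
            "SELECT content FROM memories WHERE user_id = ? ORDER BY rowid",
            (user_id,),
        )
        return [content for (content,) in rows]

    def close(self) -> None:
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "memory.db")

first = PersistentMemory(db_path)
first.remember("user-alice", "Building a fintech app with FastAPI")
first.remember("user-bob", "Prefers Django")
first.close()  # simulate a restart or deploy

second = PersistentMemory(db_path)  # a "new process" opening the same store
alice = second.recall("user-alice")
bob = second.recall("user-bob")
second.close()
print(alice, bob)
```

The data survives the simulated restart, and each user only ever sees their own rows because the `user_id` filter is part of every query.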

EU-hosted persistent memory options for LangChain

Option 1: Self-hosted pgvector on EU infrastructure

pgvector is a PostgreSQL extension that adds vector similarity search. You can run it on an EU-hosted PostgreSQL instance and build your own memory layer on top.

Advantages: Full control, no external dependency, cheapest at scale if you're already running PostgreSQL.

Disadvantages: You build and maintain the memory layer yourself — embedding, similarity search, confidence scoring, user isolation, GDPR endpoints. Typically 2–4 weeks of engineering work to get right, plus ongoing maintenance.

Python

# DIY pgvector memory — simplified
import psycopg2
from openai import OpenAI

client = OpenAI()

def remember(user_id: str, content: str, conn):
    embedding = client.embeddings.create(
        input=content,
        model="text-embedding-3-small"
    ).data[0].embedding

    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO memories (user_id, content, embedding) VALUES (%s, %s, %s)",
            (user_id, content, embedding)
        )
    conn.commit()

def recall(user_id: str, query: str, conn, top_k: int = 5):
    query_embedding = client.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    ).data[0].embedding

    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT content, 1 - (embedding <=> %s::vector) AS similarity
            FROM memories
            WHERE user_id = %s
            -- Order by the distance operator itself so a pgvector index can be used
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (query_embedding, user_id, query_embedding, top_k)
        )
        return cur.fetchall()

# You also need: user isolation, GDPR endpoints, confidence scoring, API layer...
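
The snippet above assumes a `memories` table already exists. One possible schema, plus sketches of the erasure and export helpers a DIY layer would need, might look like the following. Everything here is an assumption for illustration: the extra columns, the index names, and the `ensure_schema`, `erase_user`, and `export_user` helpers are hypothetical. The `vector(1536)` width matches the default output size of text-embedding-3-small, and the HNSW index requires a pgvector version that supports it.

```python
import json

# Assumed schema; columns beyond user_id, content and embedding are illustrative
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS memories (
    id         BIGSERIAL PRIMARY KEY,
    user_id    TEXT NOT NULL,
    content    TEXT NOT NULL,
    embedding  vector(1536) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS memories_user_id_idx ON memories (user_id);
CREATE INDEX IF NOT EXISTS memories_embedding_idx
    ON memories USING hnsw (embedding vector_cosine_ops);
"""


def ensure_schema(conn) -> None:
    """Apply the schema through a psycopg2-style connection."""
    with conn.cursor() as cur:
        cur.execute(SCHEMA_SQL)
    conn.commit()


def erase_user(conn, user_id: str) -> int:
    """Erasure sketch: delete all memories for one user, return rows removed."""
    with conn.cursor() as cur:
        cur.execute("DELETE FROM memories WHERE user_id = %s", (user_id,))
        deleted = cur.rowcount
    conn.commit()
    return deleted


def export_user(conn, user_id: str) -> str:
    """Portability sketch: return one user's memories as a JSON document."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM memories WHERE user_id = %s ORDER BY id",
            (user_id,),
        )
        rows = cur.fetchall()
    return json.dumps({"user_id": user_id, "memories": [row[0] for row in rows]})
```

The `vector_cosine_ops` operator class pairs with the `<=>` cosine-distance operator used in `recall()`. Deleting rows is only one part of an erasure request; backups and logs are outside the scope of this sketch.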

Who this is right for: Teams with existing PostgreSQL infrastructure, strong data sovereignty requirements, and engineering capacity to build and maintain the memory layer.

Option 2: Zep community edition (self-hosted)

Zep is an open-source memory layer with a knowledge graph approach. It can be self-hosted on EU infrastructure.

Advantages: Richer than raw pgvector — entity extraction, relationship graphs, dialog classification. Official LangChain integration via ZepChatMessageHistory.

Disadvantages: Operationally complex to run (depends on Neo4j or similar), non-deterministic LLM-based extraction, requires infrastructure ownership, and the multi-user isolation model requires careful implementation.

Python

from langchain_community.chat_message_histories import ZepChatMessageHistory
from langchain.memory import ConversationBufferMemory

# Requires your own Zep instance running on EU infra
history = ZepChatMessageHistory(
    session_id="user-123-session-456",
    url="https://your-zep.your-eu-domain.com",
    api_key="your-zep-key",
)

memory = ConversationBufferMemory(
    chat_memory=history,
    return_messages=True,
)

Who this is right for: Teams that need complex entity-relationship graph memory and have infrastructure capacity to run Zep.

Option 3: Kronvex (EU-managed API)

Kronvex is a managed EU-hosted memory API. No infrastructure to run — you call remember(), recall(), and inject-context(). Runs on Supabase Frankfurt.

Advantages: No ops overhead, EU-native, GDPR endpoints built in, deterministic and auditable, predictable flat pricing.

Disadvantages: Not self-hostable. Less powerful than Zep's knowledge graph for entity-relationship use cases. Requires explicit remember() calls (no auto-extraction).

Python

from kronvex import KronvexClient

kv = KronvexClient(api_key="kv-your-key", base_url="https://api.kronvex.io")

Full code example: LangChain + Kronvex in production

This is a production-ready pattern for a multi-user LangChain agent with persistent, EU-hosted memory.

Python

from langchain.memory.chat_memory import BaseChatMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from kronvex import KronvexClient
from typing import Any

kv = KronvexClient(api_key="kv-your-key", base_url="https://api.kronvex.io")
AGENT_ID = "production-agent"


class KronvexMemory(BaseChatMemory):
    """
    Production LangChain memory backed by Kronvex.
    - Persists across process restarts and deploys
    - Isolated per user_id
    - GDPR-compliant (EU-hosted, Art. 17/20 endpoints available)
    """

    kv_client: Any
    agent_id: str
    user_id: str
    top_k: int = 5
    memory_key: str = "history"
    min_confidence: float = 0.4  # filter out low-confidence recalls

    class Config:
        arbitrary_types_allowed = True

    @property
    def memory_variables(self) -> list[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: dict) -> dict:
        query = inputs.get("input", "")
        if not query:
            return {self.memory_key: "No prior context."}

        memories = self.kv_client.recall(
            agent_id=self.agent_id,
            query=query,
            user_id=self.user_id,
            top_k=self.top_k,
        )

        # Filter by confidence threshold
        relevant = [m for m in memories if m.get("confidence", 0) >= self.min_confidence]

        if not relevant:
            return {self.memory_key: "No relevant prior context found."}

        lines = [f"- {m['content']} [confidence: {m['confidence']:.2f}]" for m in relevant]
        return {self.memory_key: "\n".join(lines)}

    def save_context(self, inputs: dict, outputs: dict) -> None:
        human_input = inputs.get("input", "")

        # Only the user's side of the turn is stored; the AI output is not saved.
        # Skip trivial inputs such as "ok" or "yes".
        if human_input and len(human_input) > 20:
            self.kv_client.remember(
                agent_id=self.agent_id,
                content=human_input,
                user_id=self.user_id,
            )

    def clear(self) -> None:
        pass  # Use the GDPR erasure endpoint for intentional deletion


def create_agent_for_user(user_id: str) -> ConversationChain:
    memory = KronvexMemory(
        kv_client=kv,
        agent_id=AGENT_ID,
        user_id=user_id,
    )

    prompt = PromptTemplate(
        input_variables=["history", "input"],
        template=(
            "You are a helpful assistant. "
            "Use the context from past sessions to give consistent, personalized responses.\n\n"
            "Past session context:\n{history}\n\n"
            "Human: {input}\n"
            "Assistant:"
        ),
    )

    return ConversationChain(
        llm=ChatOpenAI(model="gpt-4o-mini"),
        memory=memory,
        prompt=prompt,
        verbose=False,
    )


# Usage — each user gets their own isolated memory
agent_alice = create_agent_for_user("user-alice")
agent_bob = create_agent_for_user("user-bob")

# Session 1 — alice
agent_alice.predict(input="I'm building a fintech app in Python using FastAPI.")

# Later — different process, same user_id
agent_alice_new_session = create_agent_for_user("user-alice")
response = agent_alice_new_session.predict(
    input="What framework should I use for the API layer?"
)
# Agent recalls alice's FastAPI + fintech context
print(response)

Key design decisions in the example above: The min_confidence threshold (0.4) keeps low-confidence recalls out of the prompt. The save_context method skips inputs of 20 characters or fewer to avoid storing trivial turns like "ok" or "yes". The clear() method is intentionally a no-op; use the GDPR erasure endpoint (DELETE /api/v1/agents/{id}/memories) for intentional deletion.
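
The confidence filter can be exercised in isolation, without a network call. The sketch below assumes recall results are dicts with `content` and `confidence` keys, as in the class above; the `format_recalls` helper and the sample data are hypothetical.

```python
def format_recalls(memories: list[dict], min_confidence: float = 0.4) -> str:
    """Drop low-confidence recalls and render the rest as prompt context."""
    relevant = [m for m in memories if m.get("confidence", 0.0) >= min_confidence]
    if not relevant:
        return "No relevant prior context found."
    # Highest-confidence memories first, so the most reliable context leads
    relevant.sort(key=lambda m: m.get("confidence", 0.0), reverse=True)
    return "\n".join(
        f"- {m['content']} [confidence: {m.get('confidence', 0.0):.2f}]"
        for m in relevant
    )


sample = [
    {"content": "Building a fintech app with FastAPI", "confidence": 0.82},
    {"content": "Said hello", "confidence": 0.12},
    {"content": "Prefers Python", "confidence": 0.40},
]
context = format_recalls(sample)
print(context)
```

Keeping this logic in a pure function makes the threshold easy to unit-test and tune separately from the memory backend.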

Comparison table

Feature	Self-hosted pgvector	Zep community (self-hosted)	Kronvex
EU data residency	✓ Yes (your infra)	✓ Yes (your infra)	✓ Yes (Supabase Frankfurt)
Ops overhead	High	High	None
Setup time	2–4 weeks	1–2 weeks	< 1 hour
GDPR erasure endpoint	DIY	DIY	✓ Built-in
GDPR portability endpoint	DIY	DIY	✓ Built-in
Memory approach	Vector similarity	Knowledge graph + vector	Vector similarity
Deterministic	✓ Yes	✗ No (LLM extraction)	✓ Yes
LangChain integration	Custom	Official	Custom (documented)
Confidence scoring	DIY	Not exposed	✓ Built-in
Pricing	Infrastructure cost	Infrastructure cost	€19–€599/mo flat
Multi-user isolation	DIY	Careful implementation needed	✓ Default (`user_id` param)

When to choose each:

Self-hosted pgvector: Maximum control, existing PostgreSQL infra, large scale where API costs matter, strong data sovereignty requirements, engineering capacity to build the memory layer.
Zep self-hosted: Need entity-relationship graph memory, comfortable with operational complexity, LLM-based extraction noise is acceptable.
Kronvex: No infrastructure overhead, need GDPR endpoints out of the box, want deterministic auditable memory, predictable pricing.

Get started with Kronvex

Bash

pip install kronvex

# Get a demo API key
curl -X POST https://api.kronvex.io/auth/demo

Demo: 3 agents, 500 memories, no credit card.

Full documentation: kronvex.io/docs
LangChain integration guide: kronvex.io/docs/integrations/langchain
Pricing: kronvex.io/#pricing
