Tutorial Python LangChain May 4, 2026 · 12 min read

LangChain Persistent Memory:
Add long-term memory to your agent (2026)

Every time your LangChain agent restarts, it forgets everything. Users have to re-explain their preferences, repeat context, start from scratch. This tutorial shows you how to fix it in 4 steps — using Kronvex's memory API as a drop-in persistent layer.

The problem: agents that forget everything between sessions

You spend weeks building a LangChain agent. It handles multi-step reasoning, uses tools, streams responses beautifully. Then a user comes back the next day and asks "can you continue from where we left off?" — and the agent has no idea what they're talking about.

This isn't a LangChain bug. It's an architectural gap. ConversationBufferMemory, ConversationSummaryMemory, and ConversationTokenBufferMemory all keep their state in RAM by default. When the Python process exits, that state is gone, so nothing survives process restarts, deployments, or even separate HTTP requests in a stateless API unless you add an external store yourself.

The gap becomes painful in real production scenarios:

  • A sales agent should remember that a lead is in the procurement phase and prefers technical detail — not re-ask qualifying questions every session
  • A support agent should recall that a user already reported this bug twice and is frustrated — not open yet another fresh ticket
  • A personal assistant should know your preferred writing style, project context, and past decisions — not make you re-specify them every conversation
  • A coding agent should remember your project's tech stack, naming conventions, and architectural decisions across refactoring sessions

The solution is a persistent memory store that lives outside the Python process and can be queried semantically — not just chronologically. That's exactly what Kronvex provides.

Why Python variables and naive Postgres don't cut it

Before reaching for a purpose-built memory API, most developers try two obvious shortcuts. Both fall short in production.

Approach A: Store conversation history in a Python variable

Works fine for a single session. Breaks immediately when: (1) the process restarts, (2) you deploy to a serverless environment where each invocation is stateless, (3) the conversation grows long enough that the accumulated history no longer fits the context window, (4) you have multiple users whose histories must be isolated. In short: it doesn't scale past a demo.

Approach B: Dump chat history rows into a PostgreSQL table

Better, but introduces its own problems. How do you decide which rows to load? If you load the 50 most recent messages, a user who mentioned a critical preference 500 messages ago will find the agent has forgotten it. If you load everything, you hit the context window limit fast. Keyword-based filtering is brittle — "Python" won't match "prefer scripting over compiled languages."
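To make the two failure modes concrete, here is a minimal sketch using a toy in-memory list in place of a real table. The helper names load_recent and keyword_filter are hypothetical and exist only for this illustration.

```python
# Toy chat log: one early preference followed by 500 unrelated turns.
history = ["I prefer scripting over compiled languages."]
history += [f"Unrelated message {i}" for i in range(1, 501)]


def load_recent(rows, n=50):
    """Naive strategy 1: keep only the n most recent rows."""
    return rows[-n:]


def keyword_filter(rows, keyword):
    """Naive strategy 2: case-insensitive substring match."""
    return [row for row in rows if keyword.lower() in row.lower()]


recent = load_recent(history)
print(history[0] in recent)               # False: the preference fell out of the window
print(keyword_filter(history, "Python"))  # []: the wording never mentions "Python"
```

Both strategies return nothing useful for a question about Python, even though the relevant preference is sitting in the log.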

What you actually need is semantic retrieval: given the user's current query, find the 5 most relevant memories regardless of when they were stored. This requires vector embeddings, cosine similarity search, and a scoring model that balances recency with relevance. That's non-trivial to build, test, and maintain — which is why Kronvex exists.
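As a rough illustration of the retrieval step, the sketch below ranks stored memories by cosine similarity to a query vector. The three-dimensional vectors are made up for readability; real embeddings come from an embedding model and have hundreds or thousands of dimensions. The function names are assumptions, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, memories, k=5):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# (text, toy embedding) pairs; the numbers are illustrative only.
memories = [
    ("prefers scripting over compiled languages", [0.9, 0.1, 0.0]),
    ("project uses PostgreSQL 16", [0.1, 0.9, 0.1]),
    ("likes concise answers", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # pretend embedding of "which language should I use?"

for score, text in top_k(query, memories, k=2):
    print(f"{score:.2f}  {text}")
```

The language preference ranks first even though it shares no keywords with the query, which is the property keyword filtering lacks.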

Kronvex confidence scoring combines three signals: semantic similarity (cosine distance via pgvector), recency (sigmoid with 30-day inflection), and access frequency (log-scaled). Formula: confidence = similarity × 0.6 + recency × 0.2 + frequency × 0.2. Your agent gets the memories that matter, not just the newest ones.
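To see how the weighting plays out, here is a minimal sketch of the formula above. The 0.6 / 0.2 / 0.2 weights and the 30-day inflection come from the description; the sigmoid steepness (0.15) and the frequency saturation point (100 accesses) are illustrative assumptions, not Kronvex's actual parameters.

```python
import math


def recency_score(age_days, inflection_days=30.0, steepness=0.15):
    """Sigmoid that equals 0.5 at the inflection point and falls as age grows."""
    # Clamp the exponent so very old memories cannot overflow math.exp.
    exponent = min(steepness * (age_days - inflection_days), 700.0)
    return 1.0 / (1.0 + math.exp(exponent))


def frequency_score(access_count, saturation=100):
    """Log-scaled access count, capped at 1.0 (saturation is an assumption)."""
    return min(1.0, math.log1p(access_count) / math.log1p(saturation))


def confidence(similarity, age_days, access_count):
    """Weighted sum: similarity x 0.6 + recency x 0.2 + frequency x 0.2."""
    return (
        0.6 * similarity
        + 0.2 * recency_score(age_days)
        + 0.2 * frequency_score(access_count)
    )


old_but_relevant = confidence(similarity=0.9, age_days=365, access_count=0)
fresh_but_offtopic = confidence(similarity=0.2, age_days=0, access_count=0)
print(round(old_but_relevant, 2))    # 0.54
print(round(fresh_but_offtopic, 2))  # 0.32
```

Under these assumptions a year-old memory with high similarity still outranks a brand-new but unrelated one, because similarity carries most of the weight.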

Installation: pip install in 60 seconds

Terminal
pip install kronvex langchain langchain-openai openai requests

Get your free API key at kronvex.io/dashboard — no credit card required. The demo plan gives you 3 agents and 500 memories. Create an agent in the dashboard and note its ID (e.g. my-langchain-agent).

Verify your key works
import requests

KRONVEX_KEY = "kv-your-key"
AGENT_ID = "my-langchain-agent"
BASE = "https://api.kronvex.io/api/v1"

# Quick smoke test — store a memory
r = requests.post(
    f"{BASE}/agents/{AGENT_ID}/remember",
    headers={"X-API-Key": KRONVEX_KEY},
    json={"content": "Test memory: setup verified", "type": "episodic"}
)
print(r.status_code)  # 200 → you're good to go

Step 1: Build a basic LangChain agent

Start with a minimal but real LangChain agent using LCEL (LangChain Expression Language), the modern approach as of 2025–2026. We'll use a ChatPromptTemplate with a system message that can receive injected memory, and a MessagesPlaceholder that carries the in-session chat history.

agent_base.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, AIMessage
import os

# LLM — swap for any supported model
llm = ChatOpenAI(
    model="gpt-4o",
    api_key=os.environ["OPENAI_API_KEY"],
    temperature=0.3,
)

# System prompt with a {memory_context} placeholder for Kronvex injections
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful assistant with persistent memory.

Relevant memories from past sessions:
{memory_context}

Use the above context naturally — don't mention it explicitly unless asked."""),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

# Basic chain (no memory yet — we'll add Kronvex in the next steps)
chain = prompt | llm | StrOutputParser()

# Test it without memory
response = chain.invoke({
    "memory_context": "(no memories yet)",
    "chat_history": [],
    "input": "Hi! I'm working on a FastAPI project.",
})
print(response)
# Works, but the next session the agent won't remember FastAPI at all

This is the starting point: a working LangChain agent that has no persistence. The system prompt already has a {memory_context} slot — we'll fill that with Kronvex in Steps 3 and 4.

Step 2: Store memories with remember() after each exchange

After every turn, we send the exchange to Kronvex. We store both the user's message and the agent's response as separate memories. The type field classifies what kind of memory it is — use "episodic" for conversation turns and "semantic" for extracted facts or preferences.

memory_store.py
import requests
import os
from typing import Optional

KRONVEX_KEY = os.environ["KRONVEX_API_KEY"]   # kv-...
AGENT_ID    = os.environ["KRONVEX_AGENT_ID"]    # e.g. "my-langchain-agent"
BASE        = "https://api.kronvex.io/api/v1"
HEADERS     = {"X-API-Key": KRONVEX_KEY}


def remember(
    content: str,
    memory_type: str = "episodic",
    session_id: Optional[str] = None,
) -> dict:
    """Store a memory in Kronvex. Returns the created memory object."""
    payload = {"content": content, "type": memory_type}
    if session_id:
        payload["session_id"] = session_id
    r = requests.post(
        f"{BASE}/agents/{AGENT_ID}/remember",
        headers=HEADERS,
        json=payload,
        timeout=5,
    )
    r.raise_for_status()
    return r.json()


def save_exchange(
    user_msg: str,
    ai_msg: str,
    session_id: Optional[str] = None,
) -> None:
    """Save a full turn (human + AI) as episodic memories."""
    if user_msg.strip():
        remember(
            content=f"User said: {user_msg}",
            memory_type="episodic",
            session_id=session_id,
        )
    if ai_msg.strip():
        remember(
            content=f"Assistant responded: {ai_msg}",
            memory_type="episodic",
            session_id=session_id,
        )


def save_preference(fact: str, session_id: Optional[str] = None) -> None:
    """Store an extracted fact or preference as a semantic memory."""
    remember(content=fact, memory_type="semantic", session_id=session_id)
Memory types matter. Use episodic for raw conversation turns — they're weighted by recency and decay naturally. Use semantic for extracted facts ("user prefers async Python", "project uses PostgreSQL 16") — they don't decay and rank highly on relevance. Use procedural for workflows and instructions the agent should follow long-term.
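If you want to pick the type automatically, a keyword heuristic is one cheap starting point. The sketch below is an assumption for illustration: classify_memory() and its cue lists are hypothetical and not part of Kronvex, and a real system would more likely ask the LLM to extract facts.

```python
import re

# Hypothetical cue phrases; tune them for your own domain.
PROCEDURAL_CUES = re.compile(r"\b(from now on|whenever|always respond|step \d+)\b", re.I)
SEMANTIC_CUES = re.compile(
    r"\b(i prefer|i always|i never|we use|my project uses|i work)\b", re.I
)


def classify_memory(text: str) -> str:
    """Guess a memory type from surface cues; falls back to 'episodic'."""
    if PROCEDURAL_CUES.search(text):
        return "procedural"
    if SEMANTIC_CUES.search(text):
        return "semantic"
    return "episodic"


print(classify_memory("I prefer async Python"))                   # semantic
print(classify_memory("From now on, always respond in French."))  # procedural
print(classify_memory("How do I add an index?"))                  # episodic
```

The returned string can be passed as the memory_type argument of remember().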

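Now wire this into your agent's invocation loop. Notice that save_exchange() is called after getting the response. Fire-and-forget is acceptable here because the write runs on a background thread and doesn't block the user. Keep in mind that a daemon thread is abandoned when the interpreter exits, so a short-lived script can lose its last write: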

agent_with_memory.py — turn loop
from agent_base import chain
from memory_store import save_exchange
from langchain_core.messages import HumanMessage, AIMessage
import threading

chat_history = []
USER_ID = "user-42"  # scope memories per user

def chat(user_input: str) -> str:
    # For now: placeholder memory context (Step 3 loads real memories)
    response = chain.invoke({
        "memory_context": "(no memories yet)",
        "chat_history": chat_history,
        "input": user_input,
    })

    # Persist the exchange in the background (non-blocking)
    t = threading.Thread(
        target=save_exchange,
        args=(user_input, response, USER_ID),
        daemon=True,
    )
    t.start()

    # Update in-session history (for current context window)
    chat_history.append(HumanMessage(content=user_input))
    chat_history.append(AIMessage(content=response))

    return response
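
One hedged alternative to spawning a thread per turn is a small worker pool with an explicit flush, so queued writes can be drained before a script or worker shuts down. The names persist_in_background and flush are hypothetical; save_fn stands in for save_exchange.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="memory-writer")
_pending = set()
_lock = threading.Lock()


def _on_done(future):
    """Drop the finished future and surface any error instead of hiding it."""
    with _lock:
        _pending.discard(future)
    exc = future.exception()
    if exc is not None:
        print(f"[Memory] background write failed: {exc!r}")


def persist_in_background(save_fn, *args):
    """Queue a write on the pool and return its Future."""
    future = _executor.submit(save_fn, *args)
    with _lock:
        _pending.add(future)
    future.add_done_callback(_on_done)
    return future


def flush(timeout=10.0):
    """Block until queued writes finish; returns True if none are left."""
    with _lock:
        snapshot = list(_pending)
    _, not_done = wait(snapshot, timeout=timeout)
    return not not_done


# Usage with a stand-in for save_exchange:
stored = []
persist_in_background(stored.append, "User said: hi")
print(flush())   # True once the write has completed
print(stored)    # ['User said: hi']
```

Calling flush() at the end of a request handler or script trades a little latency for the assurance that the last exchange was actually sent.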

Step 3: Load context with recall() at session start

At the start of a new session, call recall() with a query that captures the user's current intent. Kronvex returns the top-K most relevant memories sorted by confidence score. You then format them and inject them into the system prompt.

memory_store.py — add recall()
def recall(
    query: str,
    top_k: int = 5,
    session_id: Optional[str] = None,
) -> list[dict]:
    """Retrieve the most relevant memories for a given query."""
    payload = {"query": query, "top_k": top_k}
    if session_id:
        payload["session_id"] = session_id
    r = requests.post(
        f"{BASE}/agents/{AGENT_ID}/recall",
        headers=HEADERS,
        json=payload,
        timeout=5,
    )
    r.raise_for_status()
    return r.json().get("memories", [])


def format_memories(memories: list[dict]) -> str:
    """Format memory objects into a readable string for the system prompt."""
    if not memories:
        return "No relevant memories found."
    lines = []
    for m in memories:
        confidence = m.get("confidence", 0)
        mtype = m.get("type", "memory")
        content = m.get("content", "")
        lines.append(f"[{mtype}, confidence={confidence:.2f}] {content}")
    return "\n".join(lines)
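
Since each memory carries a confidence value, one optional refinement is to drop low-scoring entries before formatting. This is a sketch under assumptions: filter_memories() is a hypothetical helper, the 0.5 threshold is arbitrary, and the sample dictionaries only mirror the fields that format_memories() reads.

```python
def filter_memories(memories, min_confidence=0.5):
    """Keep memories at or above the threshold, highest confidence first."""
    kept = [m for m in memories if m.get("confidence", 0) >= min_confidence]
    return sorted(kept, key=lambda m: m.get("confidence", 0), reverse=True)


# Hand-written sample data shaped like the objects format_memories() expects.
sample = [
    {"type": "episodic", "confidence": 0.31, "content": "User said: thanks!"},
    {"type": "semantic", "confidence": 0.91, "content": "User prefers async Python"},
    {"type": "semantic", "confidence": 0.74, "content": "Project uses PostgreSQL 16"},
]

for m in filter_memories(sample):
    print(f"[{m['type']}, confidence={m['confidence']:.2f}] {m['content']}")
# [semantic, confidence=0.91] User prefers async Python
# [semantic, confidence=0.74] Project uses PostgreSQL 16
```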

Then update the agent to load memories before the first turn of each session. The key insight is that recall() takes a query — so you can pass the user's opening message to retrieve the most contextually relevant memories, not just the most recent ones:

agent_with_memory.py — session init with recall()
from memory_store import recall, format_memories, save_exchange

class MemoryAgent:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.chat_history = []
        self.memory_context = "(no memories yet)"

    def load_memories(self, opening_query: str = "general context") -> None:
        """Call once at session start to load relevant past memories."""
        memories = recall(
            query=opening_query,
            top_k=6,
            session_id=self.user_id,
        )
        self.memory_context = format_memories(memories)
        print(f"[Memory] Loaded {len(memories)} memories for user {self.user_id}")

    def chat(self, user_input: str) -> str:
        from agent_base import chain
        from langchain_core.messages import HumanMessage, AIMessage

        response = chain.invoke({
            "memory_context": self.memory_context,
            "chat_history": self.chat_history,
            "input": user_input,
        })

        # Store in background
        import threading
        t = threading.Thread(
            target=save_exchange,
            args=(user_input, response, self.user_id),
            daemon=True
        )
        t.start()

        self.chat_history.extend([
            HumanMessage(content=user_input),
            AIMessage(content=response),
        ])
        return response


# Usage — Session 1
agent = MemoryAgent(user_id="user-42")
agent.load_memories("FastAPI project setup")
print(agent.chat("I'm using Python 3.12 and PostgreSQL 16 on this project."))

# ... process restarts ... Session 2
agent2 = MemoryAgent(user_id="user-42")
agent2.load_memories("database migrations")
# memory_context now contains: "User said: I'm using Python 3.12 and PostgreSQL 16..."
print(agent2.chat("How should I handle Alembic migrations?"))
# Agent knows it's Python 3.12 + Postgres 16 without being told again ✓
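
Because recall() calls raise_for_status(), a Kronvex outage or timeout would raise inside load_memories() and stop the session before it starts. One hedged way to degrade gracefully is a thin wrapper that falls back to an empty list. The name recall_or_empty is hypothetical; the sketch relies on the fact that requests.RequestException subclasses OSError, so no extra import is needed.

```python
def recall_or_empty(recall_fn, query, **kwargs):
    """Call recall_fn; on network or HTTP errors, log and return []."""
    try:
        return recall_fn(query, **kwargs)
    except OSError as exc:
        # requests.RequestException derives from OSError, so this also covers
        # timeouts, connection failures, and errors from raise_for_status().
        print(f"[Memory] recall failed, continuing without memories: {exc}")
        return []


# Stand-ins for a healthy and an unreachable memory service:
def fake_recall_ok(query, top_k=5, session_id=None):
    return [{"type": "semantic", "confidence": 0.9, "content": "uses PostgreSQL 16"}]


def fake_recall_down(query, top_k=5, session_id=None):
    raise ConnectionError("service unreachable")


print(len(recall_or_empty(fake_recall_ok, "database", top_k=6)))  # 1
print(recall_or_empty(fake_recall_down, "database", top_k=6))      # []
```

With this wrapper the agent still answers when the memory service is unreachable; it simply behaves like a first-time session.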

Step 4: Use inject-context to pass memory directly to the LLM

inject-context is the third endpoint and the most convenient. Instead of getting raw memory objects (recall) and formatting them yourself, inject-context returns a single pre-formatted string that is already prompt-ready. It semantically searches, ranks, deduplicates, and formats in one call.

memory_store.py — add inject_context()
def inject_context(
    query: str,
    top_k: int = 5,
    session_id: Optional[str] = None,
) -> str:
    """Return a pre-formatted context string ready for injection into a system prompt."""
    payload = {"query": query}
    if top_k:
        payload["top_k"] = top_k
    if session_id:
        payload["session_id"] = session_id
    r = requests.post(
        f"{BASE}/agents/{AGENT_ID}/inject-context",
        headers=HEADERS,
        json=payload,
        timeout=5,
    )
    r.raise_for_status()
    return r.json().get("context", "")

The response's context field is a plain string you can drop directly into your system prompt. It looks like this:

Example inject-context response
"""
[semantic, 2026-04-28] User prefers concise answers without bullet points.
[semantic, 2026-04-15] Project uses Python 3.12 and PostgreSQL 16 with asyncpg.
[episodic, 2026-05-01] User asked about Alembic migrations for a FastAPI app.
[episodic, 2026-05-01] Assistant explained the upgrade/downgrade migration pattern.
[semantic, 2026-03-10] User works in a solo dev context, no team code review.
"""

Use inject-context in the load_memories() method to replace the manual recall + format pattern from Step 3. This is the production-ready approach:

MemoryAgent — final load_memories() with inject-context
from memory_store import inject_context, save_exchange

class MemoryAgent:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.chat_history = []
        self.memory_context = "(no memories yet)"

    def load_memories(self, opening_query: str = "general context") -> None:
        """Inject pre-formatted memory context in one API call."""
        self.memory_context = inject_context(
            query=opening_query,
            top_k=6,
            session_id=self.user_id,
        )

    def chat(self, user_input: str) -> str:
        from agent_base import chain
        from langchain_core.messages import HumanMessage, AIMessage
        import threading

        response = chain.invoke({
            "memory_context": self.memory_context,
            "chat_history": self.chat_history,
            "input": user_input,
        })

        # Also update memory context mid-session for long conversations
        def _async_store():
            save_exchange(user_input, response, self.user_id)
            # Refresh context after storing (optional — good for long sessions)
            self.memory_context = inject_context(
                query=user_input,
                top_k=6,
                session_id=self.user_id,
            )

        threading.Thread(target=_async_store, daemon=True).start()

        self.chat_history.extend([
            HumanMessage(content=user_input),
            AIMessage(content=response),
        ])
        return response
Token budget note: With top_k=6, the injected context is typically 200–400 tokens. Keep top_k between 4 and 8 to stay within context window budgets. For GPT-4o's 128k context, this is not a concern — but for smaller models or cost-sensitive deployments, keep it lean.
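
For smaller models, one simple guard is to trim the injected context to a fixed budget before it reaches the prompt. The sketch below is an approximation under stated assumptions: trim_to_budget() is a hypothetical helper, the 4-characters-per-token ratio is a rough rule of thumb rather than a real tokenizer, and it assumes the context lists the most relevant lines first.

```python
def trim_to_budget(context: str, max_tokens: int = 400, chars_per_token: int = 4) -> str:
    """Keep whole lines from the top until the estimated budget is spent."""
    budget_chars = max_tokens * chars_per_token
    kept, used = [], 0
    for line in context.splitlines():
        cost = len(line) + 1  # +1 accounts for the newline
        if used + cost > budget_chars:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)


# Five 39-character lines cost 40 characters each, newline included.
context = "\n".join(["x" * 39] * 5)
trimmed = trim_to_budget(context, max_tokens=25)  # budget of 100 characters
print(len(trimmed.splitlines()))  # 2
```

Cutting on line boundaries keeps each memory intact instead of truncating one mid-sentence.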

Complete agent — copy-paste ready

Here is the complete, production-ready implementation in a single file. Set your environment variables, run it, and your LangChain agent will remember across sessions from the first turn.

kronvex_agent.py — complete implementation
"""
LangChain agent with persistent memory via Kronvex.
Requires: pip install kronvex langchain langchain-openai openai requests

Env vars:
  OPENAI_API_KEY     — your OpenAI key
  KRONVEX_API_KEY    — your Kronvex key (kv-...)
  KRONVEX_AGENT_ID   — your agent ID in Kronvex dashboard
"""
import os
import threading
import requests
from typing import Optional

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, AIMessage

# ── Config ────────────────────────────────────────────────────────
KRONVEX_KEY = os.environ["KRONVEX_API_KEY"]
AGENT_ID    = os.environ["KRONVEX_AGENT_ID"]
BASE        = "https://api.kronvex.io/api/v1"
HEADERS     = {"X-API-Key": KRONVEX_KEY}

# ── Kronvex helpers ───────────────────────────────────────────────
def remember(content: str, memory_type: str = "episodic", session_id: Optional[str] = None):
    payload = {"content": content, "type": memory_type}
    if session_id:
        payload["session_id"] = session_id
    requests.post(f"{BASE}/agents/{AGENT_ID}/remember", headers=HEADERS, json=payload, timeout=5)

def recall(query: str, top_k: int = 5, session_id: Optional[str] = None) -> list:
    payload = {"query": query, "top_k": top_k}
    if session_id:
        payload["session_id"] = session_id
    r = requests.post(f"{BASE}/agents/{AGENT_ID}/recall", headers=HEADERS, json=payload, timeout=5)
    return r.json().get("memories", []) if r.ok else []

def inject_context(query: str, top_k: int = 5, session_id: Optional[str] = None) -> str:
    payload = {"query": query, "top_k": top_k}
    if session_id:
        payload["session_id"] = session_id
    r = requests.post(f"{BASE}/agents/{AGENT_ID}/inject-context", headers=HEADERS, json=payload, timeout=5)
    return r.json().get("context", "") if r.ok else ""

# ── LangChain setup ───────────────────────────────────────────────
llm = ChatOpenAI(model="gpt-4o", temperature=0.3)

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful assistant with persistent memory across sessions.

Relevant memories from past interactions:
{memory_context}

Use this context naturally. Do not reference it explicitly unless the user asks."""),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

chain = prompt | llm | StrOutputParser()

# ── Agent class ───────────────────────────────────────────────────
class PersistentAgent:
    """LangChain agent with Kronvex-backed persistent memory."""

    def __init__(self, user_id: str):
        self.user_id = user_id
        self.chat_history = []
        self.memory_context = "(no memories yet)"

    def start_session(self, topic: str = "general") -> None:
        """Call at the beginning of each new session to load relevant memories."""
        self.memory_context = inject_context(
            query=topic, top_k=6, session_id=self.user_id
        )

    def store_preference(self, fact: str) -> None:
        """Explicitly store a semantic memory (preference, fact, context)."""
        remember(fact, memory_type="semantic", session_id=self.user_id)

    def chat(self, user_input: str) -> str:
        response = chain.invoke({
            "memory_context": self.memory_context,
            "chat_history": self.chat_history,
            "input": user_input,
        })

        def _persist():
            # Store the exchange
            remember(f"User: {user_input}", session_id=self.user_id)
            remember(f"Assistant: {response}", session_id=self.user_id)
            # Refresh context for next turn
            self.memory_context = inject_context(
                query=user_input, top_k=6, session_id=self.user_id
            )

        threading.Thread(target=_persist, daemon=True).start()

        self.chat_history.extend([
            HumanMessage(content=user_input),
            AIMessage(content=response),
        ])
        return response


# ── Demo ──────────────────────────────────────────────────────────
if __name__ == "__main__":
    # Session 1: user shares context
    agent = PersistentAgent(user_id="alice")
    agent.start_session("Python development")

    agent.store_preference("User prefers concise answers without lengthy preambles.")
    agent.store_preference("User works on a FastAPI project with PostgreSQL 16 and asyncpg.")

    r1 = agent.chat("How do I add an index to my users table in Alembic?")
    print("Session 1:", r1)

    # Simulate process restart — Session 2
    agent2 = PersistentAgent(user_id="alice")  # same user_id → same memories
    agent2.start_session("database performance")

    r2 = agent2.chat("What's the best way to profile slow queries?")
    print("Session 2:", r2)
    # Agent already knows: FastAPI, PostgreSQL 16, concise answers preferred
    # No re-explanation needed ✓


FAQ

Does LangChain support persistent memory between sessions natively?

No. LangChain's built-in memory classes (ConversationBufferMemory, ConversationSummaryMemory, etc.) are all in-memory and session-scoped — they reset when the Python process exits. The framework deliberately leaves persistence to external layers. Kronvex fills exactly this gap: it's a purpose-built memory store designed for this use case, with semantic search so your agent retrieves relevant memories rather than just recent ones.

What's the difference between Kronvex and storing history in PostgreSQL?

A naive Postgres table stores raw rows and requires you to decide what to load at query time. If you load the last N rows, the agent forgets anything older. If you load everything, you'll blow the context window. Kronvex stores vector embeddings of each memory and at query time retrieves the semantically closest matches, regardless of when they were stored. It also applies a confidence score (a weighted sum of similarity, recency, and access frequency) so you get the most contextually appropriate memories, not just the newest.

Can I scope memories per user in a multi-tenant LangChain app?

Yes — pass session_id=user_id in every remember, recall, and inject-context call. Kronvex uses this to partition memories so that User A's context never leaks into User B's responses. In a FastAPI or Django app, derive the session_id from your authenticated user's ID. The agent ID stays the same across all users — it identifies the agent, not the user.
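
A minimal sketch of that scoping rule follows. The session_id field comes from the tutorial; scoped_payload() and SessionRegistry are hypothetical names, and the user IDs are placeholders for whatever your authentication layer provides.

```python
def scoped_payload(query: str, user_id: str, top_k: int = 5) -> dict:
    """Build a recall / inject-context payload that is always user-scoped."""
    if not user_id:
        # Refuse to build an unscoped request rather than risk mixing users.
        raise ValueError("user_id is required to scope memories")
    return {"query": query, "top_k": top_k, "session_id": user_id}


class SessionRegistry:
    """Keeps one in-process chat history per authenticated user."""

    def __init__(self):
        self._histories = {}

    def history_for(self, user_id: str) -> list:
        return self._histories.setdefault(user_id, [])


registry = SessionRegistry()
registry.history_for("user-a").append("hello from A")

print(scoped_payload("billing question", "user-a"))
print(registry.history_for("user-b"))  # []: user B never sees user A's turns
```

Failing fast on a missing user ID is a design choice: an unscoped call is more dangerous than a rejected one.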

Is Kronvex GDPR compliant? Where is data stored?

All data is stored in the EU (Frankfurt, AWS eu-central-1) via a Supabase PostgreSQL instance. No data is transferred outside Europe. Kronvex is GDPR-native by design — there's no configuration required to be compliant. You can delete individual memories or entire agents via the API or dashboard at any time. For enterprise customers, data processing agreements (DPAs) are available on request.
