Python · Apache-2.0 · memory · stateful-agents · orchestration

Letta

Memory-first agent framework for stateful, persistent AI agents

Letta (formerly MemGPT) is an open-source agent framework built around one idea: memory should be a first-class primitive, not an afterthought bolted on with a vector database. Agents get persistent IDs, hierarchical memory tiers, and the ability to manage their own memory through tool calls. For use cases that require an agent to remember what happened weeks or months ago, it's the most architecturally honest option in the space.

Most agent frameworks treat memory the same way most web apps treated authentication ten years ago: something you bolt on after the core thing is built, using whatever library is most convenient. Drop in a vector database. Add a retrieval step before the prompt. Call it done. The problem is that approach produces agents that feel amnesiac. Ask them something you discussed last week and they give you a blank look. They're not actually remembering anything; they're retrieving fragments, which is a different thing entirely.

Letta was built by people who thought this was the wrong approach from the beginning. The framework grew out of MemGPT, a research project from UC Berkeley's Sky Computing Lab, where Charles Packer and his collaborators noticed that the real bottleneck in making agents useful over long time horizons wasn't model capability. It was memory architecture. The paper they published in 2023 proposed treating LLM context management the way operating systems treat virtual memory: as a hierarchical system with fast and slow tiers, where the system itself decides what to keep active and what to page out. Letta is what that research looks like when you turn it into a production framework.

Quick verdict

Letta is the right choice when memory isn't a feature you're adding to your agent but the reason you're building an agent in the first place. Stateful customer companions, research assistants that accumulate domain knowledge over months, or any system where the agent's value compounds with each interaction: these are the use cases Letta was designed for. If you're building a stateless task-runner that processes documents in a pipeline and doesn't need to remember anything, use something simpler. But if your agent's memory is its product, Letta gives you the most honest architecture for that problem.

What Letta actually is

Letta is an open-source Python framework for building agents that persist across sessions. It's not a wrapper around a vector database. It's not a memory plugin you add to an existing chain. It's a full agent framework where persistence is the foundational design constraint, and everything else (tool use, model routing, orchestration) flows out from that.

Every agent in Letta gets an ID. That ID is stable. The agent with that ID has a memory state that accumulates over time. You can create an agent today, run a hundred conversations with it over three months, and when you come back to it, it remembers those conversations in a way that's structurally different from "we retrieved some relevant chunks from a database." The memory is curated, tiered, and actively maintained by the agent itself.

The rebranding from MemGPT to Letta in late 2024 marked the point where the team shifted from academic prototype to production software. The GitHub repository has over 22,000 stars as of May 2026, and the framework has grown to include a full API, multiple SDKs, a desktop UI with memory visualization, and Letta Cloud as a hosted offering. The research heritage is still there in the architecture, but it's now wrapped in the kind of tooling you'd actually use in production.

The memory architecture in detail

Hierarchical memory with main and archival tiers

The core of Letta's design is the two-tier memory hierarchy, which maps directly to the original MemGPT paper. The main context is the agent's active working memory: what's currently in the model's context window, immediately available for reasoning. This includes recent conversation history, the agent's current understanding of who you are, active task state, and anything else the agent has decided is immediately relevant.

Archival storage is everything else. It's a persistent database of information the agent has accumulated but doesn't need in its active window right now: older conversation summaries, facts about you or your domain that were important in a previous session, notes the agent took on a task it completed last month. The archival store can be much larger than any context window, because it's never loaded all at once.

The agent moves things between these tiers through tool calls. When the agent decides that something in the current conversation is worth remembering long-term, it calls a memory write function. When it needs to recall something from archival storage, it calls a memory search function. These are not invisible infrastructure operations happening behind the scenes. They're actions the agent takes, which means you can see them, audit them, and understand why the agent made the choices it did.
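To make the two tiers concrete, here is a minimal sketch of the idea in plain Python. It is not Letta's implementation: the class, the method names (`memory_write`, `memory_search`, `recall_into_context`), and the substring search are all assumptions chosen for illustration. What it tries to show is that every move between tiers is an explicit, logged call rather than hidden plumbing.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierMemory:
    """Toy model of a main context plus an archival store (illustrative only)."""
    main_context: list[str] = field(default_factory=list)
    archival: list[str] = field(default_factory=list)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def memory_write(self, note: str) -> None:
        """Agent-initiated write: persist a note to the archival tier."""
        self.archival.append(note)
        self.audit_log.append(("write", note))

    def memory_search(self, query: str) -> list[str]:
        """Agent-initiated read: naive substring search over the archival tier."""
        hits = [n for n in self.archival if query.lower() in n.lower()]
        self.audit_log.append(("search", query))
        return hits

    def recall_into_context(self, query: str) -> None:
        """Copy search hits into the main context so the model can reason over them."""
        for hit in self.memory_search(query):
            if hit not in self.main_context:
                self.main_context.append(hit)


memory = TwoTierMemory()
memory.memory_write("User prefers weekly summaries on Fridays")
memory.memory_write("Project Atlas deadline moved to Q3")
memory.recall_into_context("atlas")
print(memory.main_context)
print(memory.audit_log)
```

The audit log is the part worth noticing: because writes and searches are ordinary calls, you can replay exactly what the agent chose to store and what it went looking for.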

This is what makes Letta's memory architecture different in kind, not just in degree, from a RAG pipeline. A typical RAG pipeline retrieves by embedding similarity (often cosine similarity) to the current query, which means it surfaces what looks similar to what you asked, not necessarily what's most relevant to the agent's current situation. Letta's agent decides what to store and what to retrieve based on its own reasoning about what matters. That suits long-horizon tasks, where relevance depends on history rather than on the wording of the latest query.
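The similarity problem is easy to reproduce. The sketch below ranks two stored chunks against a query using cosine similarity over bag-of-words vectors. That is a deliberate simplification: a real RAG pipeline would use learned embeddings, and the function names and sample text here are made up for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query, as a plain retrieval step would."""
    return sorted(chunks, key=lambda c: cosine_similarity(query, c), reverse=True)[:k]


chunks = [
    "the billing page shows the billing date and the billing address",
    "customer asked to pause all invoices until the contract dispute is resolved",
]
query = "when is the next billing date"
top = retrieve(query, chunks)
print(top)
```

The first chunk wins because it shares more words with the query, even though the second one (invoices are paused) is arguably what an agent handling this customer most needs to know. That gap between "similar" and "relevant" is the one the article is pointing at.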

Stateful agents with persistent IDs

Each Letta agent has a persistent identifier. This sounds like a small thing, but it's actually the structural commitment that makes everything else possible. In a framework without persistent agent IDs, every session is essentially a fresh start: you pass in whatever context you want and hope it's enough. In Letta, the agent's history is the agent. The ID is the thread connecting every conversation, every memory operation, every piece of learned context.

The practical implications are significant. You can pause a long-running task, come back to it two weeks later, and the agent picks up with full awareness of where it left off. You can build a customer service agent that genuinely knows its regular customers over months of interaction, not because you've implemented a custom CRM integration, but because the agent's memory architecture handles that accumulation natively. You can run multiple agents with different IDs that have different knowledge and different histories, even if they're running the same base model.
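As a rough sketch of what "the ID is the thread" means in practice, the example below persists agent state to one JSON file per ID and resumes it in a later session. The `AgentStore` class, the file layout, and the state fields are hypothetical and much simpler than a real backend; the point is only that the ID is all a later session needs.

```python
import json
import tempfile
import uuid
from pathlib import Path


class AgentStore:
    """Hypothetical file-backed store: one JSON file per agent ID."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def create(self) -> str:
        agent_id = f"agent-{uuid.uuid4()}"
        self.save(agent_id, {"memory": [], "turns": 0})
        return agent_id

    def load(self, agent_id: str) -> dict:
        return json.loads((self.root / f"{agent_id}.json").read_text())

    def save(self, agent_id: str, state: dict) -> None:
        (self.root / f"{agent_id}.json").write_text(json.dumps(state))


def run_session(store: AgentStore, agent_id: str, user_message: str) -> dict:
    """One session: load state by ID, update it, write it back."""
    state = store.load(agent_id)
    state["memory"].append(user_message)
    state["turns"] += 1
    store.save(agent_id, state)
    return state


store = AgentStore(Path(tempfile.mkdtemp()))
agent_id = store.create()
run_session(store, agent_id, "Started migrating the billing service")
# Later, possibly in a different process: only the ID is needed to resume.
resumed = run_session(store, agent_id, "Where did we leave off?")
print(resumed["turns"], resumed["memory"][0])
```

A second call to `store.create()` yields a different ID with its own empty history, which mirrors the point about multiple agents sharing a base model but not a memory.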

Persistent IDs also make multi-agent systems cleaner. When agents need to hand off context to each other, they can share memory references rather than serializing and passing large context dumps. The plumbing for this is built into Letta's architecture in a way that most frameworks don't anticipate.
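One way to picture a handoff by reference, under the assumption that memory lives in a shared registry keyed by block ID: each agent holds IDs rather than copies, so an update is visible to every agent that references the block. The `MemoryBlock` and `Agent` classes and the registry are illustrative names, not Letta's data model.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryBlock:
    label: str
    value: str


@dataclass
class Agent:
    name: str
    block_ids: list[str] = field(default_factory=list)

    def view(self, registry: dict[str, MemoryBlock]) -> dict[str, str]:
        """Resolve this agent's block references against the shared registry."""
        return {registry[b].label: registry[b].value for b in self.block_ids}


registry: dict[str, MemoryBlock] = {
    "block-1": MemoryBlock(label="customer_profile", value="Prefers email contact"),
}

triage = Agent("triage", block_ids=["block-1"])
billing = Agent("billing", block_ids=["block-1"])  # handoff passes the ID, not a copy

# An update made on behalf of one agent is visible to the other.
registry["block-1"].value += "; on the annual plan"
print(billing.view(registry))
```

The alternative, serializing the profile into each agent's prompt, would leave the two agents with diverging copies as soon as either one learned something new.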

Memory tools the agent calls itself

This is the feature that most clearly distinguishes Letta from frameworks that have added memory as an afterthought. In Letta, the agent has access to memory management functions as tools it calls during its reasoning process. The standard toolkit includes functions like core_memory_append, core_memory_replace, archival_memory_insert, and archival_memory_search.

When the agent decides it wants to remember something, it calls the appropriate function. When it decides something in its main context is no longer relevant, it can update or replace that memory. When it needs to recall something from archival storage, it searches. These operations are part of the agent's action space, not separate infrastructure that fires automatically.

The result is an agent that actively manages its own knowledge. It's the difference between a person who takes notes and updates them as their understanding changes, versus a person who has a filing cabinet that gets new papers dropped in whenever certain keywords appear in conversation. The first approach produces a more coherent, more useful knowledge base over time.
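Here is a toy version of that toolkit, to show what "memory operations are part of the action space" looks like mechanically. The four function names come from the list above; the parameter names, the return values, the in-memory storage, and the `dispatch` helper are assumptions for the sake of a runnable sketch and should not be read as Letta's actual signatures.

```python
core_memory: dict[str, str] = {"human": "", "persona": "Helpful assistant"}
archival_memory: list[str] = []


def core_memory_append(label: str, content: str) -> str:
    """Add a line to one section of the always-in-context memory."""
    core_memory[label] = (core_memory[label] + "\n" + content).strip()
    return "ok"


def core_memory_replace(label: str, old_content: str, new_content: str) -> str:
    """Edit existing in-context memory when the agent's understanding changes."""
    if old_content not in core_memory[label]:
        return "error: old_content not found"
    core_memory[label] = core_memory[label].replace(old_content, new_content)
    return "ok"


def archival_memory_insert(content: str) -> str:
    """Store a note outside the context window."""
    archival_memory.append(content)
    return "ok"


def archival_memory_search(query: str) -> list[str]:
    """Naive substring search; a real store would use a proper index."""
    return [m for m in archival_memory if query.lower() in m.lower()]


TOOLS = {
    f.__name__: f
    for f in (core_memory_append, core_memory_replace,
              archival_memory_insert, archival_memory_search)
}


def dispatch(tool_call: dict):
    """Execute one tool call of the form {"name": ..., "arguments": {...}}."""
    return TOOLS[tool_call["name"]](**tool_call["arguments"])


# Tool calls as a model might emit them over the course of a conversation.
calls = [
    {"name": "core_memory_append", "arguments": {"label": "human", "content": "Name: Sam"}},
    {"name": "core_memory_append", "arguments": {"label": "human", "content": "Works at Acme"}},
    {"name": "core_memory_replace",
     "arguments": {"label": "human", "old_content": "Acme", "new_content": "Globex"}},
    {"name": "archival_memory_insert",
     "arguments": {"content": "Offsite notes: Sam presented the roadmap"}},
    {"name": "archival_memory_search", "arguments": {"query": "roadmap"}},
]
results = [dispatch(call) for call in calls]
print(core_memory["human"])
print(results[-1])
```

The replace call is the note-taker behavior from the analogy: the agent corrects what it already wrote instead of piling a contradictory fact on top.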

Letta Cloud for hosted memory infra

Running Letta's memory infrastructure yourself is straightforward for development and small-scale use. For production systems with many agents running concurrently, or for teams that don't want to manage the backend themselves, Letta Cloud is the hosted option.

Letta Cloud handles the persistence layer, the memory database, agent state management, and the API endpoints. You interact with it through the same SDK you'd use locally, which means migrating from local development to cloud deployment doesn't require rewriting your agent logic. The team positions it as the option for developers who want to focus on building the agent rather than operating the memory infrastructure.
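If the SDK is the same in both places, the only thing that should change between local and hosted runs is connection configuration. A small sketch of keeping that decision in one place follows. The environment variable names and the hosted URL are placeholders I made up, and the local address assumes the server listens on port 8283; check the Letta docs for the real values before relying on any of them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Connection:
    base_url: str
    api_key: Optional[str]


# Placeholder values: not taken from the Letta documentation.
LOCAL_URL = "http://localhost:8283"
HOSTED_URL = "https://letta-cloud.example"


def resolve_connection(env: Mapping[str, str] = os.environ) -> Connection:
    """Choose hosted settings when an API key is present, local settings otherwise."""
    api_key = env.get("LETTA_API_KEY")  # hypothetical variable name
    default_url = HOSTED_URL if api_key else LOCAL_URL
    return Connection(base_url=env.get("LETTA_BASE_URL", default_url), api_key=api_key or None)


print(resolve_connection({}))
print(resolve_connection({"LETTA_API_KEY": "example-key"}))
```

Agent logic that only ever sees a `Connection` never needs to know which backend it is talking to, which is the migration property described above.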

Pricing for Letta Cloud isn't published in a simple public tier structure as of May 2026. The team handles it based on usage and scale, which is typical for infrastructure products still finding their market shape. For most individual developers and small teams, the self-hosted open-source version is the right starting point.

MemGPT paper foundation

The theoretical underpinning matters here in a way it doesn't for most frameworks. The MemGPT paper (Packer et al., 2023) made a specific, testable claim: that you could extend the effective context of an LLM far beyond its native context window by treating context management as a virtual memory problem. The paper validated this on two concrete tasks: document analysis that required reasoning over corpora larger than the context window, and multi-session conversation agents that maintained coherent state across many interactions.
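The virtual memory analogy can be sketched in a few lines: keep a bounded window, and when a new message pushes it over budget, page the oldest messages out to a larger store instead of dropping them. This is a simplified illustration, not the paper's algorithm. The word-count tokenizer, the budget, and the `PagedContext` class are assumptions, and the sketch skips the summarization step a real system would apply to evicted messages.

```python
from collections import deque


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class PagedContext:
    """Bounded context window that pages old messages out to archival storage."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.window: deque[str] = deque()
        self.archival: list[str] = []

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.window)

    def append(self, message: str) -> None:
        self.window.append(message)
        # Evict oldest-first until the window fits the budget again,
        # always keeping at least the newest message in context.
        while self.used() > self.budget and len(self.window) > 1:
            self.archival.append(self.window.popleft())


ctx = PagedContext(budget=8)
for msg in ["alpha beta gamma", "delta epsilon zeta", "eta theta iota", "kappa lambda"]:
    ctx.append(msg)

print(list(ctx.window))
print(ctx.archival)
```

Nothing is lost when the window fills up; the evicted message is still there to be searched, which is what lets the effective context exceed the native one.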

This gives Letta's architecture a published research foundation. The design decisions aren't arbitrary; they flow from the theoretical model of what makes memory work in these systems. And the research team continues to publish: recent work from the Letta team includes "Sleep-time Compute" (inference during idle periods), "Context Constitution" (principled approaches to context management), and "Skill Learning" (dynamic capability acquisition). The framework evolves in tandem with the research, which is an unusual and useful property.

Pricing

The open-source framework is free. Apache 2.0 license means commercial use is fine. You pay your model provider directly for API usage, whether that's Anthropic for Claude, OpenAI for GPT, Google for Gemini, or whatever local model provider you're running. The SDK installs with pip (pip install letta-client) and the TypeScript version is available via npm (npm install @letta-ai/letta-client).

Letta Cloud costs money, but the team hasn't published a standard pricing page. Expect it to be usage-based with pricing that scales with the number of active agents and memory operations. For production systems where you're running dozens or hundreds of stateful agents, the infrastructure cost of hosting the memory backend yourself becomes real, and Letta Cloud becomes worth evaluating on its merits.

The Letta Code desktop app and CLI are also free. If you want to explore the framework's capabilities without writing any code first, installing Letta Code and running a local agent is a reasonable starting point.

Where Letta wins and where it doesn't

Letta is the right tool when memory is the problem you're trying to solve. Customer service agents that know their repeat customers. Research assistants that accumulate domain knowledge over a project's lifetime. Personal productivity agents that learn your working style, preferences, and ongoing commitments over months. Companion agents in consumer products where the value proposition is explicitly the relationship that builds over time. In all of these cases, Letta's architecture is doing real work that a stateless framework with a retrieval plugin isn't.

It's also worth mentioning multi-session enterprise workflows. If you're building an internal agent that employees interact with every day over years, and you want that agent to get genuinely better at serving each person over time rather than treating every conversation as the first, Letta's persistent ID model supports that. LangGraph can be made to do something similar with enough custom plumbing, but Letta ships it as the default behavior.

Where Letta doesn't win: stateless pipeline tasks. If you're building a document processing pipeline, a code review bot that runs on each pull request and doesn't need to remember previous reviews, or any task where each invocation is genuinely independent, the memory architecture adds complexity without adding value. For those cases, LangChain or a simpler framework is the more appropriate choice.

The community size is also something to weigh honestly. LangChain has a much larger ecosystem of tutorials, integrations, and Stack Overflow answers than Letta does. If you run into a specific integration problem at 11pm, you're more likely to find an existing solution for LangChain than for Letta. The Letta documentation is good, but the long tail of community knowledge is shorter.

Letta vs the alternatives

Letta vs LangChain

LangChain is the Swiss Army knife of LLM frameworks. It has connectors for almost every data source, integrations for almost every model, and a community that's been building on it since early 2023. If you're building something where the data connections matter more than the memory architecture, or where you need a broad ecosystem of ready-made integrations, LangChain is a reasonable choice.

The honest comparison is this: LangChain lets you add memory to an agent. Letta builds an agent around memory. Those are different design philosophies that produce different outcomes for different use cases. When I think about building an agent that a user will interact with every week for a year, I want the framework that treats memory as the primary concern.

Letta vs LangGraph

LangGraph is LangChain's graph-based orchestration layer for building complex agent workflows with explicit state management. It's excellent for multi-agent systems with intricate branching logic, human-in-the-loop approval steps, and workflows where you want to visualize the control flow as a graph.

LangGraph can checkpoint state between steps, which gives you some persistence. But it's persistence of task state, not the kind of long-horizon personal memory that Letta is designed for. A LangGraph agent remembers where it is in a workflow. A Letta agent remembers who you are and what you've talked about for the past six months. For complex orchestration with human review steps, LangGraph is superior. For agents where the relationship with the user is the product, Letta handles the harder problem.

Letta vs AutoGen

AutoGen (Microsoft) focuses on multi-agent conversation patterns where multiple models collaborate to solve tasks. It's good at setting up agent-to-agent communication, role assignment, and collaborative problem-solving within a session. It's less focused on cross-session persistence.

The comparison is most relevant for teams evaluating how to structure multi-agent systems. AutoGen handles the agent collaboration side. Letta handles the memory side. Some teams use both. The frameworks aren't competing for the same core use case in the way that LangChain and Letta are.

Getting started

Install the Python client:

pip install letta-client

For the TypeScript SDK:

npm install @letta-ai/letta-client

To run the full Letta server locally:

pip install letta
letta server

The Letta Code CLI, aimed at users who want to run a memory-first agent for local tasks, installs via Node:

npm install -g @letta-ai/letta-code

The framework's documentation walks you through creating your first persistent agent, which is the right starting point. Create an agent, give it an ID, have a few conversations with it, then look at its memory state. The memory palace visualization in the desktop UI is worth spending a few minutes with early on: it makes the two-tier architecture concrete in a way that reading the docs alone doesn't fully convey.

For teams building something in production, the decision point is whether to run the memory backend yourself or use Letta Cloud. For development, self-hosted is simpler. For production systems with reliability requirements and no desire to operate memory infrastructure, Letta Cloud is worth the conversation with the team.

The bottom line

Letta is doing something genuinely different from the other frameworks in this space. Most of them treat agents as task-runners and memory as a feature. Letta treats memory as the fundamental design constraint and builds the agent around it. For the growing category of applications where the agent's value comes from what it accumulates over time rather than what it can do in a single session, that architectural decision matters enormously.

The 22,000 GitHub stars and the continued research output from the team suggest this isn't a framework that's going to get abandoned. The rebranding from MemGPT to Letta was a deliberate signal that this is now a serious production project, not just a research demo. And the theoretical foundation in the MemGPT paper gives you confidence that the design decisions aren't arbitrary.

If you're building a Notion AI alternative that needs to learn a team's working style over months, or a research agent that accumulates domain knowledge progressively, Letta is where to start. If you're building a stateless pipeline, look elsewhere. The tool is honest about what it's for, and that clarity is one of its most useful features.

Key features

Hierarchical memory with main context and archival tiers
Stateful agents with persistent IDs that survive across sessions
Memory tools the agent calls itself (no external retrieval glue code required)
Letta Cloud for hosted memory infrastructure
Based on the MemGPT research paper from UC Berkeley
Model-agnostic with support for Claude, GPT, Gemini, and local models
Background dream agents that refine memory during idle time
Memory palace visualization in the desktop UI

Frequently Asked Questions

What is Letta?

Letta is an open-source Python framework for building stateful AI agents with persistent memory. It originated from the MemGPT research project at UC Berkeley and was spun out as a company by the same team. The core idea is that agents should have hierarchical memory tiers and the ability to manage their own memory through tool calls, so they can carry context across sessions that would otherwise exceed any model's context window.

What is the difference between Letta and MemGPT?

MemGPT was the original research project and open-source library that introduced the hierarchical memory concept. Letta is the rebranded, productized version of the same project. The GitHub repository moved from cpacker/MemGPT to letta-ai/letta, and the team launched Letta as a company offering both the open-source framework and Letta Cloud as a managed hosting option. The underlying memory architecture is the same; the name changed to reflect the shift from an academic prototype to a production framework.

How does Letta handle memory?

Letta uses a tiered memory system inspired by operating system virtual memory. The main context holds the most relevant recent information within the model's active window. Archival storage holds older or lower-priority information outside the context window. The agent itself decides what to move between tiers by calling memory tool functions, rather than relying on a separate retrieval pipeline you configure yourself. This means the agent actively curates its own knowledge rather than passively accepting whatever a retrieval system surfaces.

Is Letta free?

The open-source framework is free under the Apache 2.0 license. You pay for whatever model API you use on your own keys. Letta Cloud is a separate paid product for teams that want hosted memory infrastructure, managed agent state, and production-grade reliability without running their own backend. Pricing for Letta Cloud is not published on a flat public tier; contact the team for specifics.

How does Letta compare to LangChain?

LangChain is a general-purpose framework for building LLM applications and agent workflows. Memory in LangChain is one of many optional components you wire together. Letta is specifically built around memory as the central abstraction: the entire framework is designed for agents that need to remember things across long periods. If you're building a customer support agent that needs to recall conversations from six months ago, or a research assistant that accumulates knowledge over weeks of use, Letta's architecture fits that requirement better than LangChain's more modular approach. For short-lived orchestration tasks where persistence doesn't matter, LangChain or LangGraph is probably the simpler choice.