LangGraph and FastAPI: How to build a chatbot as a microservice

In recent years, the ecosystem of tools for developing AI applications has been evolving rapidly. One of the most prominent examples is LangChain – a framework that has made working with large language models (LLMs) more structured and accessible. However, as is often the case with popular technologies, its growth has revealed certain limitations. A new approach has emerged in the form of LangGraph, offering a fresh perspective on building complex interactive systems.

In this post, we will examine the differences between LangChain and LangGraph, why the latter is increasingly chosen for chatbot development, and how to design a reliable microservice using them together with FastAPI.

What is LangChain

LangChain is a framework for working with LLMs that enables you to connect a model with external data sources, tools, and logic in the form of chains. Each chain is a sequence of steps through which a user request passes – from preprocessing data to generating a response.

Key capabilities of LangChain include:

  • Connecting to model APIs and vector databases.
  • Organizing processing pipelines.
  • Memory management to retain dialogue context.
  • Support for retrieval and search tools.

A significant advantage of LangChain is its ability to integrate with external sources: models can pull information from vector databases like FAISS or Milvus, enabling semantic search over relevant documents and enriching responses with up-to-date context. The framework also flexibly supports memory management, enabling the retention of conversation context – a critical capability for chat systems that need to maintain continuity across multiple exchanges.

Chains make it possible to build linear interaction scenarios with the model. However, this very linearity is also their main drawback.
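
To make the chain idea concrete, here is a minimal pure-Python sketch – not LangChain's actual API – where a "chain" is just an ordered list of steps that a request's state flows through. The step names and stubbed retrieval/generation are illustrative.

```python
# Illustrative sketch only: a "chain" as an ordered list of steps,
# mirroring the idea behind LangChain chains (not its real API).

def preprocess(state: dict) -> dict:
    state["query"] = state["query"].strip().lower()
    return state

def retrieve(state: dict) -> dict:
    # A real chain would query a vector store (FAISS, Milvus) here; we stub it.
    state["context"] = f"docs matching '{state['query']}'"
    return state

def generate(state: dict) -> dict:
    # A real chain would call an LLM here; we stub it.
    state["answer"] = f"Answer based on {state['context']}"
    return state

def run_chain(steps, state: dict) -> dict:
    # Each request flows through the steps strictly in order.
    for step in steps:
        state = step(state)
    return state

result = run_chain([preprocess, retrieve, generate], {"query": "  What is RAG? "})
```

The strict ordering in `run_chain` is exactly the linearity the next section discusses: there is no natural place to branch, loop back, or pause mid-sequence.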

Challenges of LangChain in chatbots

When developing a chatbot with LangChain, the linear structure of chains often fails to reflect complex dialogue branching. If a scenario requires dynamic transitions, returning to previous steps, or concurrent processes, chains become cumbersome and difficult to maintain.

The key issues are:

  • Limited flexibility – changing step order or adding branches requires significant rework.
  • Difficulty managing state – especially in long or interrupted conversations.
  • Weak support for asynchronous interactions – pausing and resuming scenarios later is inconvenient.

LangGraph

LangGraph is a library that expands on the idea of LangChain but shifts it to a graph paradigm. Instead of linear chains, interactions are described as a state graph, where each node is a step or action, and edges represent possible transitions between them.

Advantages of LangGraph:

  • Flexible scenarios – easily model complex branches, loops, and conditional transitions.
  • State management – built-in mechanisms for saving progress and resuming execution.
  • Human-In-The-Loop – the ability to insert steps where a human makes the decision.
  • Waiting and asynchrony – execution can pause and resume later.
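
To contrast with the linear chain above, here is a minimal pure-Python sketch of the graph paradigm – again illustrative, not LangGraph's real API: each node is a function over shared state that returns the name of the next node, so branches and loops fall out naturally.

```python
# Minimal sketch of the graph paradigm (not LangGraph's actual API):
# nodes are functions over shared state; each returns the next node's name,
# which lets us express conditional transitions and loops directly.

END = "__end__"

def ask(state):
    state["attempts"] += 1
    return "validate"

def validate(state):
    # Conditional edge: loop back to "ask" until input is valid
    # or the attempt budget is exhausted.
    if state["valid"] or state["attempts"] >= 3:
        return "respond"
    return "ask"

def respond(state):
    state["answer"] = "ok" if state["valid"] else "gave up"
    return END

NODES = {"ask": ask, "validate": validate, "respond": respond}

def run_graph(entry, state):
    node = entry
    while node != END:
        node = NODES[node](state)
    return state

s = run_graph("ask", {"valid": False, "attempts": 0})
```

The ask → validate → ask loop would require awkward contortions in a linear chain; in the graph it is a single conditional edge.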

LangGraph supports not only the graph paradigm but also a functional one: state management, human-in-the-loop, time travel, response streaming, and durable execution are all available even without explicitly defining a graph, through a lightweight, flexible Functional API that integrates well into a variety of applications.

LangGraph provides a foundation for building sophisticated multi-agent systems where each agent can work autonomously, complete its tasks, and then interact with other agents based on shared state. This enables hierarchical or networked patterns of agent organization, with control, processing, and coordination handled manually or automatically, offering a high degree of flexibility.

Human-In-The-Loop

One of LangGraph’s key features is its support for Human-In-The-Loop (HITL). In a chatbot context, this means that at a specific step, the bot can hand control over to a human, wait for a response or confirmation, and then continue executing the graph.

Use cases:

  • Waiting for user input in iterative, turn-based interactions with the system.
  • Verifying critical decisions (e.g., legal advice, medical recommendations).
  • Confirming complex actions (e.g., placing an order, charging a payment).

Technically, this is implemented through a “pause” in the graph: the node switches to a waiting state and resumes only after receiving data from an external system or operator. This architecture is not time-bound – if an operation requires external confirmation or additional data, execution can be postponed indefinitely. The state is persisted while waiting, and once confirmation is received, the system resumes execution through the Command mechanism, continuing the graph where it left off.
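
The pause/resume mechanics can be sketched in plain Python – this is an illustration of the pattern, not LangGraph's Command API, and the in-memory checkpoint dict stands in for real persistent storage:

```python
# Sketch of the HITL pause/resume pattern (not LangGraph's actual API):
# a node can signal "waiting"; the runner checkpoints the state and exits,
# and a later call restores the snapshot and resumes with the human's input.

import json

CHECKPOINTS = {}  # stands in for persistent checkpoint storage

def place_order(state, resume_value=None):
    if resume_value is None:
        return "waiting"              # pause: a human must confirm first
    state["confirmed"] = resume_value
    return "done"

def run(session_id, state=None, resume_value=None):
    if state is None:
        state = json.loads(CHECKPOINTS[session_id])   # restore the snapshot
    status = place_order(state, resume_value)
    if status == "waiting":
        CHECKPOINTS[session_id] = json.dumps(state)   # persist and pause
    return status, state

status, _ = run("sess-1", state={"order": 42})        # pauses, waits for a human
status, state = run("sess-1", resume_value=True)      # human confirmed; resumes
```

Because the snapshot lives in storage rather than in process memory, the confirmation can arrive minutes or days later, or even be handled by a different service instance.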

States and sessions in LangGraph

Unlike LangChain, where context is usually stored in a chain’s memory, LangGraph operates with states and sessions.

  • State describes the current node in the graph and what data has already been gathered.
  • Session is a unique context for a specific interaction (e.g., a user dialogue) that can be saved, exported, or restored.

This allows you to:

  • Pause graph execution at any moment.
  • Resume work after a service restart.
  • Scale the application by transferring sessions between microservice instances.
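
A simple sketch of the session side – names and storage are illustrative, with an in-memory dict standing in for the SQL/NoSQL store – shows why pausing, restoring, and moving a dialogue between instances becomes trivial once state is a serializable snapshot keyed by session id:

```python
# Sketch: sessions as JSON snapshots keyed by session id, so a dialogue
# can be paused, exported to another instance, and resumed (illustrative).

import json

class SessionStore:
    def __init__(self):
        self._db = {}  # stands in for SQL/NoSQL persistent storage

    def save(self, session_id: str, state: dict) -> None:
        self._db[session_id] = json.dumps(state)

    def load(self, session_id: str) -> dict:
        return json.loads(self._db[session_id])

    def export(self, session_id: str) -> str:
        # A portable snapshot that another microservice instance can import.
        return self._db[session_id]

store = SessionStore()
store.save("user-7", {"node": "validate", "history": ["hi"]})
restored = store.load("user-7")
```

Any instance that can reach the store can pick up the session, which is what makes horizontal scaling straightforward.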

Microservice architecture for a chatbot

A client (web or mobile app, or any backend) sends a regular POST request to the FastAPI gateway with the user message and session/user identifiers. At the boundary, we validate the input and immediately prepare a response model: either returning the completed text if the branch is short, or instantly issuing an acknowledgment with ‘message_id’ so the client can poll (‘GET /messages/{id}’) for the final result later.
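
The gateway contract described above can be sketched framework-agnostically – the function names, response shapes, and the in-memory result store are all illustrative, not a prescribed FastAPI layout:

```python
# Sketch of the gateway contract: short branches return the final text
# directly; long ones return an acknowledgment with a message_id that the
# client polls later. All names here are illustrative.

import uuid

RESULTS = {}  # message_id -> final text, filled in when the graph finishes

def handle_message(session_id: str, text: str, is_short: bool) -> dict:
    if is_short:
        # Short branch: answer within the request/response cycle.
        return {"status": "done", "answer": f"echo: {text}"}
    # Long branch: acknowledge now, let the graph finish in the background.
    message_id = str(uuid.uuid4())
    RESULTS[message_id] = None
    return {"status": "accepted", "message_id": message_id}

def poll_message(message_id: str) -> dict:
    answer = RESULTS.get(message_id)
    if answer is None:
        return {"status": "pending"}
    return {"status": "done", "answer": answer}

ack = handle_message("sess-1", "long question", is_short=False)
RESULTS[ack["message_id"]] = "final answer"   # the worker completes later
```

In a real service `handle_message` and `poll_message` would be the POST and GET endpoint handlers, and `RESULTS` would live in the persistent store.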

The service’s core is the compiled LangGraph. Each call retrieves the saved state snapshot from persistent storage by session key, executes several graph steps, and checkpoints again. If the scenario is short and fits within the request timeout, we generate the response immediately. If there are many steps or long-running tools involved, FastAPI validates and records the task, returns immediately, and lets the rest run as a background job. FastAPI’s built-in background task mechanism works well here: the operation continues after the HTTP response, preventing UI blocking and easing timeout constraints. For simple services, this avoids the need for queues and workers; at scale, you can swap the background task runner without changing API contracts.
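
The "respond now, finish later" shape can be sketched with plain asyncio – FastAPI's BackgroundTasks follows the same idea; the names and the draining step at the end (needed only so the demo completes) are illustrative:

```python
# Sketch of "respond now, finish later" with asyncio (the same idea as
# FastAPI's background tasks; names are illustrative).

import asyncio

RESULTS = {}

async def run_graph_steps(message_id: str) -> None:
    await asyncio.sleep(0)              # stands in for slow graph steps
    RESULTS[message_id] = "final answer"

async def endpoint(message_id: str) -> dict:
    # Schedule the heavy work, then return the acknowledgment immediately.
    asyncio.create_task(run_graph_steps(message_id))
    return {"status": "accepted", "message_id": message_id}

async def main():
    response = await endpoint("m-1")    # returns before the work is done
    # In a server the event loop keeps running; here we drain pending tasks
    # so the demo finishes deterministically.
    pending = asyncio.all_tasks() - {asyncio.current_task()}
    await asyncio.gather(*pending)
    return response

resp = asyncio.run(main())
```

The client gets its acknowledgment while the graph steps are still running, which is exactly what keeps the UI responsive and the request inside its timeout.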

State is the key to predictability and resilience. LangGraph saves a snapshot after each significant step, and with durable execution, it can resume exactly where it was interrupted – whether due to a container restart, external service failure, or intentional pause for HITL. This removes the need for custom state machines and makes long dialogues reproducible. A single persistent checkpoint store (SQL/NoSQL) is sufficient, with the “save → resume” principle built into the library.
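
The "save → resume" principle reduces to a small loop – this sketch is illustrative (the in-memory checkpoint stands in for the persistent store, and the forced failure simulates a container crash):

```python
# Sketch of durable execution via "save -> resume": checkpoint after every
# significant step; on restart, skip straight to the first incomplete step.

CHECKPOINT = {"completed": [], "state": {}}  # stands in for persistent storage

STEPS = ["retrieve", "generate", "postprocess"]

def run_step(name, state):
    state[name] = f"{name} done"

def run(fail_at=None):
    for name in STEPS:
        if name in CHECKPOINT["completed"]:
            continue                          # already done before the crash
        if name == fail_at:
            raise RuntimeError(f"crashed at {name}")
        run_step(name, CHECKPOINT["state"])
        CHECKPOINT["completed"].append(name)  # persist progress after the step
    return CHECKPOINT["state"]

try:
    run(fail_at="generate")                   # simulated container crash
except RuntimeError:
    pass

state = run()                                 # resumes exactly where it stopped
```

The second call never repeats the completed "retrieve" step – the same property that lets a real graph survive restarts and indefinite HITL pauses without a custom state machine.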

When handling execution internally, it’s important to balance synchronous and asynchronous operations. FastAPI supports both, but blocking operations (external HTTP calls, I/O) should either be moved to background tasks immediately after responding to the client, or be run non-blocking to avoid tying up workers. This keeps the service stable – one client shouldn’t have to wait because the previous request is processing a slow prompt chain.
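
One standard way to keep blocking work from tying up the loop is to push it onto a worker thread; `asyncio.to_thread` does exactly that. The slow call here is a stand-in for an external HTTP request:

```python
# Sketch: keep the event loop free by running a blocking call (an external
# HTTP request, file I/O, ...) on a worker thread via asyncio.to_thread.

import asyncio
import time

def blocking_call(query: str) -> str:
    time.sleep(0.01)        # stands in for a slow external HTTP call
    return f"result for {query}"

async def handler(query: str) -> str:
    # The event loop can serve other requests while the thread waits on I/O.
    return await asyncio.to_thread(blocking_call, query)

async def main():
    # Two slow requests overlap instead of queuing behind each other.
    return await asyncio.gather(handler("a"), handler("b"))

answers = asyncio.run(main())
```

Run the blocking call directly inside `handler` instead, and the second request would have to wait for the first – the exact situation the paragraph above warns against.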

The result is a stable pipeline: the client calls a regular HTTPS endpoint, the service validates and either returns the final answer or issues an acknowledgment while continuing graph execution in the background; each significant step is checkpointed in persistent storage so that any failure or pause is handled gracefully; human involvement is triggered over HTTP without maintaining constant connections. This design avoids unusual infrastructure patterns and is easily adapted to most projects and use cases.

Conclusion

LangGraph combined with FastAPI provides a reliable and flexible foundation for building production-ready chatbots. LangGraph takes conversation logic to the next level, allowing dialogues to be modeled as graphs rather than linear chains, with built-in state control and the ability to resume execution via durable execution. This enables the creation of dialogue systems with branching, loops, Human-In-The-Loop, and resilience to failures, unlike LangChain, where such scenarios are harder to maintain.

FastAPI serves as a robust, straightforward, and scalable API interface, handling client requests through familiar HTTP endpoints, offering a hybrid model of immediate synchronous responses for simple branches and background processing for more complex scenarios. This approach keeps the user experience smooth and safe from timeouts or lost state.

Ultimately, this combination of technologies allows the creation of chatbot microservices that scale easily, are maintainable, and can evolve from simple use cases to complex scenarios involving human oversight, multi-agent coordination, branching, and persistent context.

Nikolai Andriushchenko, ML engineer