LangServe is an open-source library developed to dramatically simplify the deployment and production rollout of applications built with the LangChain framework. Its primary goal is to enable developers to quickly and easily expose their LangChain agents, chains, and models as REST APIs, making their use and integration into other systems much more straightforward.
Built on the FastAPI Python web framework, known for its performance and ease of use, LangServe offers numerous key features to accelerate production deployment. It automatically generates comprehensive OpenAPI documentation and input/output schemas for each endpoint, enabling developers and clients to quickly understand and use the API.
In addition to API generation, LangServe also includes a highly practical interactive playground for testing and debugging the various endpoints and chains, as well as configuration sharing features via unique links. The library also provides a Python client for easily calling deployed APIs from other applications.
Whether you're a data scientist looking to make your models accessible or a developer wanting to integrate natural language processing capabilities into your applications, LangServe will save you valuable time by allowing you to go from experimenting with LangChain to deploying in production on the cloud (AWS, Azure, GCP, etc.) in the blink of an eye.
In this article, we'll see how to get started with LangServe to deploy your first endpoints, how to use the client and playground, and most importantly, how to put your LangChain applications into production in just a few lines of code on the major cloud providers.
Before diving into LangServe, make sure you have a working Python development environment on your machine. LangServe requires Python 3.8 or higher. You can check your Python version by running the command python --version in your terminal.
The pip package manager is also required to easily install LangServe and its dependencies. Pip is generally installed by default with recent versions of Python. To make sure, run pip --version. If pip isn't present, refer to the official Python documentation for installation instructions specific to your operating system.
A basic knowledge of the LangChain framework is strongly recommended to get the most out of LangServe. You should be comfortable with key concepts such as agents, chains, language models, and prompts. If you're new to LangChain, it's advisable to familiarize yourself with the official documentation and go through some examples before moving on to LangServe.
Similarly, an understanding of the basics of FastAPI, the web framework that LangServe is built on, will be useful for customizing and debugging your applications if needed. Since FastAPI is a very popular and well-documented framework, you'll easily find plenty of online resources to help you learn its core concepts, such as routing, dependencies, and request/response handling.
Finally, while not strictly necessary to get started, having a grasp of REST APIs, JSON schemas, and OpenAPI documentation will allow you to understand LangServe's internal workings more deeply and make the most of its automatic generation capabilities.
In summary, make sure you have:
- Python 3.8 or higher and pip installed
- a basic knowledge of LangChain (agents, chains, models)
- the basics of FastAPI (routing, dependencies, requests/responses)

Once these prerequisites are met, you'll be ready to install LangServe and create your first application!
Installing LangServe is very straightforward using the pip package manager. Before proceeding with the installation, it's recommended to create and activate a new Python virtual environment to avoid any conflicts with system packages or other projects. You can use venv, conda, or any other virtual environment management tool of your choice.
Once your virtual environment is activated, run the following command in your terminal to install LangServe with all its dependencies:
pip install "langserve[all]"
This command will install both the server and client components of LangServe, along with necessary dependencies such as FastAPI, Uvicorn (a fast ASGI server), and Pydantic (for data validation).
Pip will download and install all required packages into your virtual environment. Once the installation is complete, you'll be able to import and use LangServe in your Python code.
It's important to note that LangServe also requires a recent version of the LangChain framework to function properly. If you haven't yet installed LangChain, pip will handle it automatically during the LangServe installation. However, if you already have LangChain installed, make sure it's up to date to ensure optimal compatibility with LangServe.
Now that LangServe is installed in your Python environment, you're ready to create your first application! In the next section, we'll see how to set up a FastAPI server with LangServe to expose language models as APIs.
To explore how LangServe works at a basic level, we'll create a simple application that exposes two chat models, one using the OpenAI API and the other using Anthropic's. This application will also allow us to invoke a chain that combines a prompt template with the Anthropic model to generate jokes on a given topic.
Here's the complete code for our application:
from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatAnthropic, ChatOpenAI
from langserve import add_routes
app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple API server using LangChain's Runnable interfaces",
)

# Expose the ChatOpenAI model under the /openai path
add_routes(
    app,
    ChatOpenAI(),
    path="/openai",
)

# Expose the ChatAnthropic model under the /anthropic path
add_routes(
    app,
    ChatAnthropic(),
    path="/anthropic",
)

model = ChatAnthropic()
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")

# Expose the prompt | model chain under the /joke path
add_routes(
    app,
    prompt | model,
    path="/joke",
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)
Let's break down this code step by step:
We start by importing the necessary dependencies: FastAPI to create our web application, ChatPromptTemplate to define our prompt template, ChatAnthropic and ChatOpenAI for the chat models, and finally add_routes from LangServe to expose our components as APIs.
We create a FastAPI application instance, specifying a title, version, and description. This information will be used to generate the OpenAPI documentation for our API.
We use LangServe's add_routes function to expose our first model, a ChatOpenAI instance, under the /openai path. This will automatically generate all the necessary endpoints to interact with the model (/invoke, /batch, /stream, etc.).
Similarly, we expose a ChatAnthropic instance under the /anthropic path.
We then create a chain combining a prompt template (requesting a joke on a given topic) and the Anthropic model, using LangChain's | operator. This chain is exposed under the /joke path via add_routes.
Finally, we add an if __name__ == "__main__" block to launch our server with Uvicorn when the script is executed directly. The server will listen on localhost at port 8000.
To launch the application, simply run the Python script. Uvicorn will start the server, and you'll be able to access your API at http://localhost:8000.
Thanks to the magic of LangServe and FastAPI, comprehensive OpenAPI documentation is automatically generated for your API. You can access it by navigating to http://localhost:8000/docs in your web browser. This interactive documentation lets you view all available endpoints, their input and output schemas, and even test them directly from the web interface.
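If you prefer to inspect the specification programmatically, you can also fetch the raw OpenAPI document that FastAPI serves at /openapi.json. Here's a quick sketch, assuming the server from the previous section is running on localhost:8000:

import requests

# Fetch the OpenAPI specification generated by FastAPI/LangServe
spec = requests.get("http://localhost:8000/openapi.json").json()

print(spec["info"]["title"])         # "LangChain Server"
print(sorted(spec["paths"].keys()))  # lists the generated endpoints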
You can also access LangServe's playground for each component by adding /playground to its path (for example, http://localhost:8000/openai/playground). The playground provides a user-friendly interface for testing the various features of your API.
And there you have it! With just a few lines of code, you've created a complete API capable of running chat models and custom chains. LangServe handles all the technical details, letting you focus on your application's business logic.
In the next section, we'll see how to use the Python client provided by LangServe to interact with our new API.
Now that our API is up and running, let's see how to interact with it using the Python SDK client provided by LangServe. This SDK makes it easy to call the various API endpoints as if the components were running locally, while transparently handling network communications and data serialization.
Here's an example of client code using the LangServe SDK to call our API:
import asyncio

from langchain.schema import SystemMessage, HumanMessage
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableMap
from langserve import RemoteRunnable

# Create RemoteRunnable instances for each deployed component
openai = RemoteRunnable("http://localhost:8000/openai/")
anthropic = RemoteRunnable("http://localhost:8000/anthropic/")
joke_chain = RemoteRunnable("http://localhost:8000/joke/")

# Call the joke chain synchronously
response = joke_chain.invoke({"topic": "cats"})
print(response)

# Define a prompt as a list of messages
prompt = [
    SystemMessage(content="Act as an assistant named Claude."),
    HumanMessage(content="Hello Claude! How are you?"),
]

# Stream the Anthropic model's response asynchronously
# (async for must run inside a coroutine)
async def stream_response():
    async for token in anthropic.astream(prompt):
        print(token, end="", flush=True)

asyncio.run(stream_response())

# Define a new prompt with a template (note the {topic} placeholder)
story_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that writes long stories about a given topic."),
    ("human", "Tell me a story about {topic}."),
])

# Build a custom chain that fans out to both models
chain = story_prompt | RunnableMap({
    "openai": openai,
    "anthropic": anthropic,
})

# Call the custom chain in batch mode
responses = chain.batch([{"topic": "parrots"}, {"topic": "cats"}])
print(responses)
The SDK exposes a RemoteRunnable class that encapsulates the logic for calling the various API endpoints. For each component we want to use, we create a RemoteRunnable instance by specifying its URL.
We can then call the invoke and astream methods on these instances to execute components synchronously or asynchronously, as if they were simple Python objects.
The invoke method sends a single request to the API's /invoke endpoint and returns the complete response. This is the simplest calling mode, ideal for tasks that don't require streaming.
The astream method, on the other hand, sends a request to the /stream endpoint and returns an asynchronous generator that yields response tokens as they're generated by the server. This mode is perfect for getting real-time results and displaying a progressive response to the user.
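If you don't want to manage an event loop, the Runnable interface also exposes a synchronous stream method. Here's a minimal sketch, assuming the /joke chain from the previous section is still deployed locally:

from langserve import RemoteRunnable

joke_chain = RemoteRunnable("http://localhost:8000/joke/")

# stream() is the synchronous counterpart of astream(): it yields
# output chunks as they arrive, without requiring an async context
for chunk in joke_chain.stream({"topic": "dogs"}):
    print(chunk, end="", flush=True)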
The SDK also makes it easy to build custom chains by combining multiple remote components using the | operator, in the same way as with local components. This lets you leverage LangChain's flexibility to create complex processing pipelines that run transparently via API calls.
In summary, the LangServe SDK greatly simplifies using LangChain APIs from Python applications. Thanks to an intuitive interface modeled after local component interfaces, integrating remote models and chains into your code becomes effortless. The SDK handles all the low-level aspects like JSON serialization and HTTP requests, letting you focus on your application logic.
In the next section, we'll explore other ways to call our LangServe API, using various languages and tools such as TypeScript, Python with requests, and curl on the command line.
In addition to the official Python SDK, LangServe allows interaction with deployed APIs from a wide range of other languages and tools. Whether you prefer working with TypeScript, Python without additional dependencies, or even directly on the command line with curl, LangServe offers great flexibility to adapt to your development environment.
For developers working in the JavaScript/TypeScript ecosystem, LangChain.js offers a LangServe-compatible client starting from version 0.0.166. This client lets you call LangServe APIs in an intuitive and strongly typed manner.
Here's an example of using the TypeScript client to invoke the joke chain we deployed earlier:
import { RemoteRunnable } from "@langchain/core/runnables/remote";
const chain = new RemoteRunnable({
  url: "http://localhost:8000/joke/", // LangServe API URL
});

const result = await chain.invoke({ topic: "cats" });
console.log(result); // Displays the API result
The client exposes a RemoteRunnable class similar to the one in the Python SDK, making the use of LangServe consistent and familiar regardless of the platform.
If you want to interact with a LangServe API from a Python script without installing additional dependencies, you can use the standard requests library directly. This approach is a bit more verbose than using the official SDK, but offers great flexibility and can be useful in certain scenarios.
Here's how to call the /joke/invoke endpoint using requests:
import requests
response = requests.post(
    "http://localhost:8000/joke/invoke",  # Invocation endpoint URL
    json={"input": {"topic": "cats"}},    # Input parameters in JSON format
)

result = response.json()  # Decode the JSON response
print(result)             # Display the result
Simply specify the full endpoint URL, provide the input parameters in JSON format in the request body, and retrieve the result by decoding the JSON response.
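The other endpoints generated by LangServe follow the same pattern. For example, here's a sketch of calling the /joke/batch endpoint with requests to process several inputs in a single request (assuming the same server as above; the exact payload shape is described in the endpoint's OpenAPI documentation):

import requests

# The batch endpoint accepts a list of inputs and returns the outputs in the same order
response = requests.post(
    "http://localhost:8000/joke/batch",
    json={"inputs": [{"topic": "parrots"}, {"topic": "cats"}]},
)
print(response.json())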
Finally, for command-line enthusiasts or situations where installing an SDK isn't practical, LangServe can be used directly with the curl tool. This approach is ideal for quickly testing an API or integrating it into shell scripts.
Here's an example of calling the /joke/invoke endpoint with curl:
curl --location --request POST 'http://localhost:8000/joke/invoke' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "input": {
      "topic": "cats"
    }
  }'
We specify the HTTP method (POST), the endpoint URL, the content type (application/json), and the input parameters in JSON format. curl handles sending the request and displays the response directly in the terminal.
In summary, LangServe stands out for its excellent interoperability and ease of integration. Thanks to a standard, well-documented REST API, it's possible to interact with LangServe endpoints from virtually any language or tool capable of sending HTTP requests. Whether you're a fan of TypeScript, Python, or even curl, LangServe adapts to your environment and preferences, letting you take advantage of its advanced features with ease.
The playground is a very handy feature offered by LangServe for easily testing and debugging the various deployed endpoints and chains. It's an interactive web interface, accessible directly from the API, that lets you execute requests against components without having to write any code.
To access a component's playground, simply add /playground to the end of its base URL. For example, if we've deployed a chain under the /my_chain path, its playground will be accessible at http://my-server.com/my_chain/playground.
The playground offers an intuitive user interface: a form generated from the component's input schema lets you fill in the inputs, execute the request, and view the result returned by the server.
The playground also supports testing streaming endpoints (like /stream). In this case, the response is displayed in real time, as data is received from the server.
One of the main advantages of the playground is that it's entirely self-documenting. Each endpoint is described in detail, with explanations of its behavior, input parameters, and output format. The JSON schemas for input and output types are also displayed, serving as a reference for building valid requests.
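These schemas aren't limited to the playground: LangServe also exposes them as dedicated routes, so they can be fetched programmatically. Here's a sketch, assuming the /joke chain from earlier is deployed locally and LangServe's /input_schema and /output_schema routes:

import requests

# JSON schema describing the inputs the chain accepts
print(requests.get("http://localhost:8000/joke/input_schema").json())

# JSON schema describing the outputs it produces
print(requests.get("http://localhost:8000/joke/output_schema").json())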
Furthermore, the playground is directly linked to the API's OpenAPI documentation. It's possible to navigate seamlessly between the two interfaces to consult the detailed specifications of each endpoint while testing them interactively.
LangServe's playground also offers advanced features for certain types of components. For example, for conversation (chat) chains, it provides a specialized interface for viewing the exchange history and entering new messages naturally.
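As a sketch, recent versions of LangServe let you opt in to this chat-oriented playground through a playground_type argument on add_routes (the chain below, whose input is a dict holding the message history, is purely illustrative):

from fastapi import FastAPI
from langchain.chat_models import ChatAnthropic
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langserve import add_routes

app = FastAPI()

# A chain whose input is {"messages": [...]} — the shape expected by the chat playground
chain = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="messages"),
]) | ChatAnthropic()

# playground_type="chat" renders a conversation view (history + message box)
# instead of the default form-based playground
add_routes(app, chain, path="/chat", playground_type="chat")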
In summary, the playground is an essential tool offered by LangServe to facilitate testing and debugging of deployed APIs. Thanks to its intuitive, self-documenting web interface linked to the OpenAPI specification, it lets developers explore and validate component behavior interactively, without having to write code. Support for streaming requests and specialized interfaces for certain component types make it a real asset for the productivity and quality of APIs created with LangServe.
Once your LangServe API is deployed and tested, you'll probably want to share it with other developers or integrate it into other applications. LangServe offers advanced sharing and link generation features to facilitate these tasks.
Each component deployed via LangServe is associated with a unique link, which can be shared to allow others to directly access its playground. This link takes the following form: https://your-api.com/path/to/component/playground.
Simply copy and paste this link and send it to the relevant developers. By opening it in their browser, they'll access an interactive interface that lets them test the component with different inputs, view the results, and consult the associated documentation. It's an excellent way to showcase your API's features and let others try it out without having to set up a development environment.
Additionally, for parameterizable components (configurable runnables), LangServe's playground offers the ability to generate links specific to a given configuration. Imagine, for example, that you've deployed a retrieval chain where the user can select the knowledge base to query via a parameter called knowledge_base.
From the playground, you can choose a specific knowledge base, say knowledge_base_42, optionally fill in other parameters, then click the "Copy link" button that appears. You'll then get a link in the form https://your-api.com/path/to/component/playground?params=... that encodes the chosen configuration.
By sharing this link, you allow anyone to test your component with the pre-filled configuration, without having to manually select the knowledge base or adjust other parameters. This is particularly useful for presenting a specific use case, sharing an interesting result, or facilitating debugging of a particular configuration.
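To give an idea of what such a parameterizable component looks like in code, here's a minimal sketch using LangChain's configurable_fields mechanism (the knowledge_base example above is hypothetical; this sketch simply exposes the model's temperature as a configurable parameter, which then appears in the playground and is encoded into shared links):

from fastapi import FastAPI
from langchain.chat_models import ChatAnthropic
from langchain.schema.runnable import ConfigurableField
from langserve import add_routes

app = FastAPI()

# Declare temperature as a configurable field: it becomes adjustable
# per request and from the playground's configuration panel
model = ChatAnthropic(temperature=0).configurable_fields(
    temperature=ConfigurableField(
        id="temperature",
        name="Temperature",
        description="Sampling temperature of the model",
    )
)

add_routes(app, model, path="/configurable_model")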
It's important to note that these links provide read-only access to your API. People who use them can execute the component with the provided parameters, but cannot modify the underlying code or configuration. The generated playground links are therefore a safe and convenient way to share and promote your work.
In summary, the sharing and link generation features offered by LangServe are a valuable asset for facilitating collaboration and demonstration around your APIs. In just a few clicks, you can allow other developers to test your components interactively, with a custom configuration, while preserving the security of your deployment. It's an excellent way to showcase your achievements, gather feedback, and generate interest in your LangChain-based APIs.
Production deployment is a crucial step in an application's lifecycle, and APIs developed with LangServe are no exception. It involves making your application accessible to end users in a reliable, performant, and secure manner.
LangServe was designed to facilitate deployment on major cloud providers, leveraging proven technologies like Docker containers. Here's an overview of the recommended deployment options for different platforms.
Amazon Web Services (AWS) is one of the leaders in cloud computing, offering a wide range of services for hosting and orchestrating applications. To deploy a LangServe API on AWS, we recommend using AWS Copilot, a command-line tool that simplifies the management of containerized applications.
Here are the general steps for deploying with AWS Copilot:
copilot init --app my-app --name my-service --type 'Load Balanced Web Service' --dockerfile './Dockerfile' --deploy
This command creates the necessary configuration files, builds the Docker image for your application, and deploys it to an ECS (Elastic Container Service) cluster.
Once the deployment is complete, you can display your service's details, including its public URL:
copilot svc show --name my-service --url
AWS Copilot offers many other features, such as canary deployments, environment variable management, auto-scaling, and more. Refer to the official documentation to learn more.
Microsoft Azure is another popular cloud platform, offering compute, storage, and networking services. To deploy a LangServe API on Azure, we recommend using Azure Container Apps, a fully managed service for running containerized applications.
Here's how to deploy with Azure Container Apps:
# Log in to your Azure account
az login

# Create a resource group
az group create --name my-resource-group --location eastus

# Create a Container Apps environment
az containerapp env create --name my-environment --resource-group my-resource-group --location eastus

# Deploy the container image and expose it publicly on port 8000
az containerapp create --name my-app --resource-group my-resource-group --environment my-environment --image my-registry.azurecr.io/my-image:tag --target-port 8000 --ingress 'external'
These commands create a resource group and a Container Apps environment, then deploy your Docker image, previously built and pushed to a container registry (such as Azure Container Registry), on Azure Container Apps. Your API is then accessible via a public URL.
To learn more about Azure Container Apps, refer to the official documentation.
Google Cloud Platform (GCP) is Google's cloud platform, offering a comprehensive suite of services for deploying and managing applications. To deploy a LangServe API on GCP, we recommend using Cloud Run, a fully managed serverless compute service for containerized applications.
Here are the steps for deploying with Cloud Run:
# Initialize the gcloud CLI and select your project
gcloud init

# Build the Docker image with Cloud Build and push it to the registry
gcloud builds submit --tag gcr.io/my-project/my-app

# Deploy the image to Cloud Run and expose it publicly
gcloud run deploy my-app --image gcr.io/my-project/my-app --platform managed --allow-unauthenticated --region us-central1 --port 8000
These commands build your Docker image with Cloud Build, push it to the registry, and deploy it to a Cloud Run service, making your API accessible via a public URL.
Cloud Run offers many advanced features, such as automatic scaling, progressive deployments, secret management, and more. Refer to the official documentation to learn more.
At the end of this article, we've been able to see just how valuable LangServe is for industrializing and putting into production applications built with the LangChain framework. By dramatically simplifying the deployment of agents, chains, and models as REST APIs, LangServe lets developers focus on their application's business logic while benefiting from advanced capabilities in terms of documentation, testing, and sharing.
The automatic generation of comprehensive OpenAPI documentation and JSON schemas for each deployed endpoint is a major asset for promoting adoption and integration of APIs created with LangServe. Thanks to always up-to-date and easily accessible documentation, developers can quickly understand and use these APIs in their own projects.
The interactive playground offered by LangServe is another strong point of this library. By allowing you to test and debug the various components visually and interactively, directly from the API's web interface, it greatly facilitates getting started and experimentation. The configuration sharing features via unique links are also very appreciated for collaborating efficiently around APIs.
On the production deployment front, LangServe integrates perfectly with major cloud computing providers like Amazon Web Services, Microsoft Azure, and Google Cloud Platform. Deployment as Docker containers, orchestrated by tools like AWS Copilot, Azure Container Apps, or Google Cloud Run, brings a high level of flexibility and portability. APIs deployed this way benefit from all the power and reliability of cloud infrastructure, while minimizing system administration efforts.
In conclusion, LangServe stands out as the ideal solution for industrializing and leveraging LangChain-based applications. Thanks to its advanced API generation, documentation, testing, and deployment features, this library opens new perspectives for democratizing natural language processing technologies. Whether you're a data scientist, developer, or architect, LangServe will effectively support you in putting your models and conversational agents into production, allowing you to create value quickly and confidently.
To go further with LangServe, don't hesitate to check out the official LangServe documentation, the project's GitHub repository and its example applications, as well as the LangChain documentation.
There's no doubt that LangServe will quickly become an essential tool for anyone looking to build state-of-the-art language processing applications and put them into production with ease. Its seamless integration with LangChain and its deployment-oriented approach make it a strategic choice for large-scale projects.
A graduate of Epitech and an active member of the AI Squad, Tristan is a versatile profile who moves forward on every front: technical articles (Anthropic's MCP, ISO 42001), webinars, podcasts, and the co-construction of the scale-up LAMALO. At Reboot, he's one of the people moving the needle on AI.