Deployment Guide
Your Parlant agent works great locally, but production deployment requires a few key changes. This guide walks you through setting up Parlant on Kubernetes with proper authentication, persistence, and scaling—whether you're deploying to AWS, Azure, or another cloud provider.
What This Guide Covers
This guide focuses on the infrastructure and deployment aspects unique to taking Parlant from local development to production. For topics like authentication policies, frontend integration, and agentic design principles, we'll reference the relevant documentation sections.
By the end of this guide, you'll have:
- A containerized Parlant application
- A production-ready Kubernetes deployment
- MongoDB persistence configured
- Load balancing and HTTPS termination
- A scalable, secure production environment
Architecture Overview
Here's what a typical Parlant production deployment looks like:
Key Components:
- Load Balancer: Handles SSL termination and routes traffic to Parlant pods
- Parlant Pods: Stateless application containers (horizontally scalable)
- MongoDB: Persistent storage for sessions and customer data
- LLM Provider: External API for NLP services (OpenAI, Anthropic, etc.)
Prerequisites
Before you begin, ensure you have:
Local Tools:
- Python 3.10 or higher
- Docker installed and running
- kubectl CLI tool
- Cloud provider CLI (AWS CLI or Azure CLI)
- A code editor
Cloud Resources:
- Access to AWS EKS or Azure AKS (or another Kubernetes provider)
  - New to EKS? See the AWS EKS Getting Started Guide
  - New to AKS? See the Azure AKS Quickstart
- A MongoDB instance (MongoDB Atlas recommended, or managed MongoDB from your cloud provider)
- (Optional) A domain name for your agent
- (Optional) SSL certificate (can use Let's Encrypt or cloud provider certificates)
Knowledge Prerequisites:
- Basic understanding of Kubernetes concepts (pods, services, deployments)
- Familiarity with environment variables and configuration management
- Basic Docker knowledge
This guide assumes you have a working Parlant agent running locally. If you haven't built your agent yet, start with the Installation guide.
Understanding Parlant's Production Requirements
Stateless Architecture
Parlant's server is designed to be stateless, which means:
- All session state is stored in MongoDB, not in memory
- Multiple Parlant pods can run simultaneously without coordination
- You can scale horizontally by adding more pods
- Pods can be restarted or replaced without losing data
This design makes Parlant naturally suited for cloud deployment and Kubernetes orchestration.
Persistence Layer
Parlant requires two MongoDB collections:
- Sessions: Stores conversation state, events, and history
- Customers: Stores customer profiles and associated data
Both collections must be accessible from all Parlant pods with consistent connection strings.
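One way to confirm that every pod received the same connection strings is to log them at startup with the password masked. The following is a minimal sketch using only the standard library; the helper name `redact_mongodb_uri` is hypothetical and not part of Parlant.

```python
from urllib.parse import urlsplit


def redact_mongodb_uri(uri: str) -> str:
    """Return the URI with any password replaced by '***' so it is safe to log.

    Hypothetical helper for illustration; it is not part of the Parlant SDK.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"Not a MongoDB URI (scheme {parts.scheme!r})")
    if parts.password is None:
        # No credentials embedded, nothing to mask
        return uri
    # Everything after the last '@' is the host list (may contain several hosts)
    hosts = parts.netloc.rpartition("@")[2]
    masked_netloc = f"{parts.username or ''}:***@{hosts}"
    return parts._replace(netloc=masked_netloc).geturl()
```

Logging `redact_mongodb_uri(os.environ["MONGODB_SESSIONS_URI"])` from each pod makes mismatched hosts or database names visible without exposing credentials in the logs.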
Port Configuration
By default, Parlant's FastAPI server listens on port 8800. In production:
- Your load balancer accepts HTTPS traffic on port 443
- The load balancer forwards to Parlant pods on port 8800
- Kubernetes services handle internal routing
Step 1: Prepare Your Production Application
Create a Production Configuration File
Create a production_config.py file to centralize your production settings:
# production_config.py
import os

import parlant.sdk as p

# MongoDB Configuration
MONGODB_SESSIONS_URI = os.environ["MONGODB_SESSIONS_URI"]
MONGODB_CUSTOMERS_URI = os.environ["MONGODB_CUSTOMERS_URI"]

# NLP Provider Configuration
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")

# Server Configuration
SERVER_HOST = os.environ.get("SERVER_HOST", "0.0.0.0")
SERVER_PORT = int(os.environ.get("SERVER_PORT", "8800"))

# Choose your NLP service
NLP_SERVICE = p.NLPServices.openai  # or p.NLPServices.anthropic


def get_mongodb_config():
    """Returns MongoDB configuration for Parlant."""
    return {
        "sessions_uri": MONGODB_SESSIONS_URI,
        "customers_uri": MONGODB_CUSTOMERS_URI,
    }
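Reading required settings with `os.environ[...]` raises a bare `KeyError` on the first missing variable. A small fail-fast check can report every missing variable at once and reject an invalid port before the server starts. This is a sketch under assumptions: the helper names `missing_env_vars` and `parse_port` are hypothetical, and the list of required variables is taken from the settings used in this guide.

```python
import os
from typing import Mapping, Optional, Sequence

# Variables this guide treats as mandatory (an assumption; adjust to your setup)
REQUIRED_VARS = ("MONGODB_SESSIONS_URI", "MONGODB_CUSTOMERS_URI", "JWT_SECRET_KEY")


def missing_env_vars(
    required: Sequence[str] = REQUIRED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def parse_port(raw: str) -> int:
    """Convert a port string to an int and reject values outside 1-65535."""
    port = int(raw)
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range: {port}")
    return port
```

Calling `missing_env_vars()` at the top of your entry point and exiting with a clear message when the list is non-empty makes a misconfigured pod fail immediately, with a readable reason in its logs.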
Update Your Main Application File
Modify your main application to use production configuration:
# main.py
import asyncio
import os  # needed for os.environ below

import parlant.sdk as p

from production_config import (
    get_mongodb_config,
    NLP_SERVICE,
    SERVER_HOST,
    SERVER_PORT,
)
from auth import ProductionAuthPolicy  # We'll create this next


async def configure_container(container: p.Container) -> p.Container:
    """Configure production-specific dependencies."""
    # Set up production authorization
    container[p.AuthorizationPolicy] = ProductionAuthPolicy(
        secret_key=os.environ["JWT_SECRET_KEY"],
    )
    return container


async def main():
    """Initialize and run the Parlant server."""
    # MongoDB configuration
    mongodb_config = get_mongodb_config()

    async with p.Server(
        host=SERVER_HOST,
        port=SERVER_PORT,
        nlp_service=NLP_SERVICE,
        configure_container=configure_container,
        **mongodb_config,
    ) as server:
        # Create the agent only if none exists yet
        agents = await server.list_agents()
        if not agents:
            agent = await server.create_agent(
                name="Production Agent",
                description="Your agent description here",
            )
            # Set up your guidelines, journeys, etc.
            await setup_agent_behavior(agent)

        # Start serving requests
        await server.serve()


async def setup_agent_behavior(agent: p.Agent):
    """Configure your agent's behavior."""
    # Your guidelines, journeys, tools, etc.
    pass


if __name__ == "__main__":
    asyncio.run(main())
Set Up Production Authorization
Create an auth.py file with your production authorization policy:
# auth.py
import parlant.sdk as p


class ProductionAuthPolicy(p.ProductionAuthorizationPolicy):
    """Production authorization with your custom rules."""

    def __init__(self, secret_key: str):
        super().__init__()
        self.secret_key = secret_key

    # Add your custom authorization logic here
For comprehensive guidance on implementing JWT authentication, rate limiting, M2M tokens, and custom authorization policies, see the API Hardening guide.
Step 2: Containerize Your Application
Create an Optimized Dockerfile
Create a Dockerfile in your project root:
# Use Python 3.10 slim image
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements first (for better caching)
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose Parlant's default port
EXPOSE 8800
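# Health check endpoint (standard library only; urlopen raises on connection errors and 4xx/5xx responses)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8800/health', timeout=5)" || exit 1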
# Run the application
CMD ["python", "main.py"]
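The HEALTHCHECK instruction above runs a Python one-liner. If you would rather keep the probe in a file that is easier to read and extend, a standalone script along these lines could serve the same purpose. It is a sketch under assumptions: the file name `healthcheck.py` and the function names are hypothetical, and the `/health` path is the one used in the Dockerfile above. It relies only on the standard library, so it works even if no HTTP client package is installed in the image.

```python
# healthcheck.py (hypothetical file name)
import http.client
import urllib.error
import urllib.request

DEFAULT_URL = "http://localhost:8800/health"  # path taken from the Dockerfile above


def is_healthy(url: str = DEFAULT_URL, timeout: float = 5.0) -> bool:
    """Return True only if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses
        return False


def main(argv: list[str]) -> int:
    """Return a process exit code: 0 when healthy, 1 otherwise."""
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    return 0 if is_healthy(url) else 1
```

To use it as a script, end the file with `import sys` and `sys.exit(main(sys.argv))`, then point the HEALTHCHECK at it with `CMD python healthcheck.py`.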
Create Requirements File
Your requirements.txt should include:
parlant>=3.0.0
pyjwt>=2.8.0
limits>=3.0.0
pymongo>=4.0.0
redis>=5.0.0
Build and Test Locally
Build your Docker image:
docker build -t parlant-agent:latest .
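Test it locally with environment variables. Inside the container, localhost refers to the container itself, so a MongoDB instance running on your host machine is reached through host.docker.internal. Docker Desktop provides that name automatically; on Linux (Docker 20.10 or later), the --add-host flag below maps it to the host, and MongoDB must listen on an address the container can reach:
docker run -p 8800:8800 \
  --add-host=host.docker.internal:host-gateway \
  -e MONGODB_SESSIONS_URI="mongodb://host.docker.internal:27017/parlant_sessions" \
  -e MONGODB_CUSTOMERS_URI="mongodb://host.docker.internal:27017/parlant_customers" \
  -e OPENAI_API_KEY="your-key-here" \
  -e JWT_SECRET_KEY="your-secret-here" \
  parlant-agent:latest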
Visit http://localhost:8800 to verify it's working.
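The container may need a few seconds before it accepts connections, so a scripted smoke test is more reliable with a short retry loop. The sketch below is one way to do it with the standard library; the function names `server_responds` and `wait_for_server` are hypothetical. It treats any HTTP response, including an error status, as proof that the server process is up.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def server_responds(url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at the URL, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered; the status just was not 2xx
        return True
    except OSError:
        # Refused connection, timeout, DNS failure, and similar
        return False


def wait_for_server(
    url: str,
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = server_responds,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe the URL up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_for_server("http://localhost:8800")` returns True as soon as the container starts answering, or False after roughly a minute with the default settings.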
Optimize Image Size (Optional)
For production, consider a multi-stage build to reduce image size. For more on optimizing Docker builds, see Docker's multi-stage build documentation.
# Stage 1: Builder
FROM python:3.10-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
# Stage 2: Runtime
FROM python:3.10-slim
WORKDIR /app
# Copy only the dependencies from builder
COPY --from=builder /root/.local /root/.local
COPY . .
# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH
EXPOSE 8800
CMD ["python", "main.py"]