
Proxy Server

The LM-Deluge proxy server is a FastAPI-based reverse proxy that exposes OpenAI-compatible and Anthropic-compatible API endpoints. It allows you to route requests through lm-deluge’s multi-provider support, apply model policies, and use a unified API regardless of which provider you’re targeting.

Install with the server extras:

pip install "lm-deluge[server]"

Start the server:

python -m lm_deluge.server

The server starts on http://0.0.0.0:8000 by default.

The server exposes the following endpoints:

Endpoint              Method  Description
/health               GET     Health check
/v1/models            GET     List available models (OpenAI-compatible)
/v1/chat/completions  POST    OpenAI-compatible chat completions
/v1/messages          POST    Anthropic-compatible messages
/messages             POST    Alternative Anthropic endpoint (SDK compatibility)
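
To verify the server is running, you can hit the health and model-listing endpoints. A minimal sketch using the requests library; the response bodies are assumptions based on the OpenAI-compatible list format:

import requests

# Health check: expect a 200 response
print(requests.get("http://localhost:8000/health").status_code)

# List the models the proxy exposes (OpenAI-compatible format assumed)
models = requests.get("http://localhost:8000/v1/models").json()
for model in models.get("data", []):
    print(model.get("id"))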

To use the proxy with the official OpenAI SDK, point its base_url at the proxy:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-key",  # Only needed if DELUGE_PROXY_API_KEY is set
)

response = client.chat.completions.create(
    model="claude-3.5-sonnet",  # Any model in the registry
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

The Anthropic SDK works the same way:

from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8000",
    api_key="your-proxy-key",  # Only needed if DELUGE_PROXY_API_KEY is set
)

response = client.messages.create(
    model="gpt-4o",  # Any model works, even OpenAI models via the Anthropic SDK
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)

You can also call the endpoints directly:

# OpenAI format
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-key-here>" \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Anthropic format
curl http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-key-here>" \
  -d '{
    "model": "gpt-4o",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Full command-line usage:

python -m lm_deluge.server [OPTIONS]

Option                 Description
--host HOST            Host to bind (default: 0.0.0.0)
--port PORT            Port to run on (default: 8000)
--reload               Enable auto-reload for development
--config PATH          Path to YAML config file
--mode MODE            Model policy mode: allow_user_pick, force_default, alias_only
--allow-model MODEL    Allow a specific model (can be repeated)
--default-model MODEL  Default model for force_default mode
--routes JSON5         JSON5 string defining route aliases
--expose-aliases       Show route aliases in /v1/models
--hide-aliases         Hide route aliases from /v1/models
The following environment variables are also read at startup:

Variable                            Description
DELUGE_PROXY_HOST                   Host to bind (default: 0.0.0.0)
DELUGE_PROXY_PORT                   Port to run on (default: 8000)
DELUGE_PROXY_API_KEY                API key clients must provide (optional)
DELUGE_PROXY_TIMEOUT                Request timeout in seconds (default: 120)
DELUGE_PROXY_LOG_REQUESTS           Log incoming proxy requests
DELUGE_PROXY_LOG_PROVIDER_REQUESTS  Log outbound provider requests
DELUGE_CACHE_PATTERN                Cache pattern for Anthropic models

The DELUGE_CACHE_PATTERN environment variable controls prompt caching for Anthropic models (a usage sketch follows the list):

  • tools_only - Cache the tool definitions
  • system_and_tools - Cache the system prompt and tools
  • last_user_message - Cache the last user message
  • last_2_user_messages - Cache the last 2 user messages
  • last_3_user_messages - Cache the last 3 user messages
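
For example, with system_and_tools, repeated requests that reuse the same system prompt can be served from Anthropic's prompt cache. A minimal sketch, assuming the proxy was started with DELUGE_CACHE_PATTERN=system_and_tools:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-proxy-key")

system_prompt = "You are a support agent..."  # stands in for a large, stable prompt

# The shared system prompt is marked cacheable by the proxy, so later calls
# that reuse it can hit Anthropic's prompt cache instead of re-processing it.
for question in ["How do I reset my password?", "How do I close my account?"]:
    response = client.chat.completions.create(
        model="claude-3.5-sonnet",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    print(response.choices[0].message.content)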

Model policies control which models are exposed and how requests are routed.

allow_user_pick (default): Clients can request any allowed model.

python -m lm_deluge.server --mode allow_user_pick

force_default: All requests are routed to a default model regardless of what clients request.

python -m lm_deluge.server --mode force_default --default-model claude-3.5-sonnet
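
In this mode the model field in the request body is effectively ignored; a sketch reusing the OpenAI client from the examples above:

# Under force_default, the requested model name is ignored.
response = client.chat.completions.create(
    model="gpt-4o",  # the proxy routes this to claude-3.5-sonnet anyway
    messages=[{"role": "user", "content": "Hello!"}],
)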

alias_only: Only exposes configured route aliases, hiding actual model names.

python -m lm_deluge.server --mode alias_only --routes '{"smart": {"models": ["claude-3.5-sonnet"]}}'

Limit which models can be used:

python -m lm_deluge.server \
  --allow-model claude-3.5-sonnet \
  --allow-model gpt-4o \
  --allow-model gpt-4o-mini

Route aliases let you expose friendly names that map to one or more backend models:

python -m lm_deluge.server --routes '{
  "fast": {"models": ["gpt-4o-mini", "claude-3.5-haiku"], "strategy": "round_robin"},
  "smart": {"models": ["claude-3.5-sonnet", "gpt-4o"], "strategy": "random"},
  "best": {"models": ["claude-3.5-sonnet"], "strategy": "round_robin"}
}'

Clients can then request model: "fast" and the proxy will route to one of the configured models.
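
Reusing the OpenAI client from the examples above, a request for an alias looks like any other chat completion:

# The alias resolves server-side; clients only see the route name.
response = client.chat.completions.create(
    model="fast",  # proxy picks gpt-4o-mini or claude-3.5-haiku per the strategy
    messages=[{"role": "user", "content": "Hello!"}],
)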

Each route specifies a strategy for picking among its models:

  • round_robin: Rotate through models in order
  • random: Pick a random model each request
  • weighted: Pick models based on weights

# Weighted routing: 70% to sonnet, 30% to gpt-4o
python -m lm_deluge.server --routes '{
  "smart": {
    "models": ["claude-3.5-sonnet", "gpt-4o"],
    "strategy": "weighted",
    "weights": [0.7, 0.3]
  }
}'

For complex setups, use a YAML config file:

proxy-config.yaml:

model_policy:
  mode: allow_user_pick
  allowed_models:
    - claude-3.5-sonnet
    - claude-3.5-haiku
    - gpt-4o
    - gpt-4o-mini
  expose_aliases: true
  routes:
    fast:
      models:
        - gpt-4o-mini
        - claude-3.5-haiku
      strategy: round_robin
    smart:
      models:
        - claude-3.5-sonnet
        - gpt-4o
      strategy: weighted
      weights:
        - 0.6
        - 0.4

Start with the config:

python -m lm_deluge.server --config proxy-config.yaml

By default, the proxy doesn’t require authentication. To enable it, set the DELUGE_PROXY_API_KEY environment variable:

export DELUGE_PROXY_API_KEY="your-secret-key"
python -m lm_deluge.server

Clients must then provide the key:

  • OpenAI format: Authorization: Bearer your-secret-key
  • Anthropic format: x-api-key: your-secret-key
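
The same headers work over plain HTTP; a sketch using the requests library:

import requests

# OpenAI-style endpoint: Bearer token in the Authorization header
r = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Authorization": "Bearer your-secret-key"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(r.status_code)

# Anthropic-style endpoint: key in the x-api-key header
r = requests.post(
    "http://localhost:8000/v1/messages",
    headers={"x-api-key": "your-secret-key"},
    json={
        "model": "claude-3.5-haiku",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(r.status_code)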

The proxy automatically converts tools between OpenAI and Anthropic formats. You can send OpenAI-style tool definitions to the Anthropic endpoint or vice versa, and the proxy handles the conversion.

# OpenAI SDK calling an Anthropic model with tools
response = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }]
)
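
Continuing the example, the tool call comes back in the standard OpenAI response shape:

# Read the tool call from the standard OpenAI response shape
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name)       # e.g. "get_weather"
    print(call.function.arguments)  # e.g. '{"city": "Tokyo"}'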

Current limitations:

  • No streaming: The proxy does not yet support streaming responses; set stream=false in your requests (see the sketch below).
  • No embeddings: Only chat/message completions are supported.
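
If your client code streams by default, switch it off explicitly:

# Streaming is not supported by the proxy; request a complete response.
response = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=False,
)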

A typical development setup with logging and auto-reload:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export DELUGE_PROXY_LOG_REQUESTS=1
python -m lm_deluge.server --reload --port 8080

A production setup with authentication and restricted models:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export DELUGE_PROXY_API_KEY="my-proxy-secret"
export DELUGE_PROXY_TIMEOUT=300
export DELUGE_CACHE_PATTERN=system_and_tools
python -m lm_deluge.server \
  --host 0.0.0.0 \
  --port 8000 \
  --config /etc/deluge/proxy.yaml