# LiteLLM
Add Headroom compression to LiteLLM with a single callback: messages are compressed before they reach any provider. One line to enable, and it works with all 100+ LiteLLM-supported providers.
## Installation

```bash
pip install headroom-ai litellm
```

## Quick start
```python
import litellm
from headroom.integrations.litellm_callback import HeadroomCallback

litellm.callbacks = [HeadroomCallback()]

# All calls are now compressed automatically
response = litellm.completion(model="gpt-4o", messages=[...])
response = litellm.completion(model="bedrock/claude-sonnet", messages=[...])
response = litellm.completion(model="azure/gpt-4o", messages=[...])
```

The callback compresses messages in LiteLLM's `pre_call_hook` before they reach the provider.
## How it works
- You call `litellm.completion()` with your messages
- `HeadroomCallback.pre_call_hook` compresses the messages
- LiteLLM sends the compressed messages to the provider
- The response comes back unchanged
This works with every provider LiteLLM supports: OpenAI, Anthropic, Bedrock, Azure, Vertex AI, Cohere, Groq, Mistral, Together, Ollama, and more.
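Conceptually, a pre-call hook is just a function that rewrites the message list before the request goes out, leaving the response path untouched. The toy sketch below illustrates that shape with naive truncation; `truncate_long_messages` is a hypothetical stand-in, and the real `HeadroomCallback` performs compression rather than truncation:

```python
def truncate_long_messages(messages, max_chars=2000):
    """Toy pre-call transform: cap each message's content length.

    Hypothetical stand-in for Headroom's compression -- it only
    demonstrates where a pre-call hook intervenes, not how the
    real callback compresses.
    """
    out = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, str) and len(content) > max_chars:
            msg = {**msg, "content": content[:max_chars]}
        out.append(msg)
    return out

# The provider only ever sees the transformed messages.
messages = [{"role": "user", "content": "x" * 5000}]
smaller = truncate_long_messages(messages)
print(len(smaller[0]["content"]))  # 2000
```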
## With LiteLLM Proxy
If you run LiteLLM as a proxy server, use the ASGI middleware:
```python
from litellm.proxy.proxy_server import app
from headroom.integrations.asgi import CompressionMiddleware

app.add_middleware(CompressionMiddleware)
```

Or configure via YAML:
```yaml
# litellm_config.yaml
litellm_settings:
  callbacks: ["headroom.integrations.litellm_callback.HeadroomCallback"]
```

## Direct compress() with LiteLLM
You can also use `compress()` directly instead of the callback:
```python
import litellm
from headroom import compress

messages = [{"role": "user", "content": large_content}]
compressed = compress(messages, model="bedrock/claude-sonnet")

response = litellm.completion(
    model="bedrock/claude-sonnet",
    messages=compressed.messages,
)
print(f"Saved {compressed.tokens_saved} tokens")
```

## ASGI middleware
Drop-in middleware for any ASGI application. It intercepts `/v1/messages`, `/v1/chat/completions`, `/v1/responses`, and `/chat/completions`:
```python
from fastapi import FastAPI
from headroom.integrations.asgi import CompressionMiddleware

app = FastAPI()
app.add_middleware(CompressionMiddleware)
```

Response headers include `x-headroom-compressed: true` and `x-headroom-tokens-saved: 1234`.
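Client code can read those headers to confirm compression ran. A minimal helper sketch, assuming only the two header names documented above (`headroom_stats` itself is a hypothetical name, not part of the Headroom API):

```python
def headroom_stats(headers):
    """Extract Headroom diagnostics from response headers.

    `headers` is any mapping of lowercase header names to string
    values (e.g. `response.headers` from an HTTP client).
    Returns (compressed, tokens_saved).
    """
    compressed = headers.get("x-headroom-compressed", "false") == "true"
    tokens_saved = int(headers.get("x-headroom-tokens-saved", "0"))
    return compressed, tokens_saved

print(headroom_stats({"x-headroom-compressed": "true",
                      "x-headroom-tokens-saved": "1234"}))  # (True, 1234)
```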