Headroom

LiteLLM

Headroom integrates with LiteLLM as a callback that compresses messages before they reach any provider. One line to enable; it works with all 100+ LiteLLM-supported providers.

Installation

pip install headroom-ai litellm

Quick start

import litellm
from headroom.integrations.litellm_callback import HeadroomCallback

litellm.callbacks = [HeadroomCallback()]

# All calls now compressed automatically
response = litellm.completion(model="gpt-4o", messages=[...])
response = litellm.completion(model="bedrock/claude-sonnet", messages=[...])
response = litellm.completion(model="azure/gpt-4o", messages=[...])

The callback compresses messages in LiteLLM's pre_call_hook before they reach the provider.

How it works

  1. You call litellm.completion() with your messages
  2. HeadroomCallback.pre_call_hook compresses the messages
  3. LiteLLM sends the compressed messages to the provider
  4. The response comes back unchanged

This works with every provider LiteLLM supports: OpenAI, Anthropic, Bedrock, Azure, Vertex AI, Cohere, Groq, Mistral, Together, Ollama, and more.
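The flow above can be sketched in plain Python. Both `toy_compress` and `pre_call_hook` here are illustrative stand-ins, not Headroom's actual implementation — the real compressor does far more than collapse whitespace:

```python
import re

def toy_compress(messages):
    # Stand-in for Headroom's compression: collapse runs of whitespace.
    compressed = []
    for m in messages:
        content = m.get("content", "")
        if isinstance(content, str):
            content = re.sub(r"\s+", " ", content).strip()
        compressed.append({**m, "content": content})
    return compressed

def pre_call_hook(request):
    # Step 2: compress messages before the provider sees them.
    request["messages"] = toy_compress(request["messages"])
    return request

request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "lots   of\n\n  spacing"}],
}
request = pre_call_hook(request)  # steps 1-2
# Step 3 would send `request` to the provider; step 4 returns the
# response unchanged.
print(request["messages"][0]["content"])  # "lots of spacing"
```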

With LiteLLM Proxy

If you run LiteLLM as a proxy server, use the ASGI middleware:

from litellm.proxy.proxy_server import app
from headroom.integrations.asgi import CompressionMiddleware

app.add_middleware(CompressionMiddleware)

Or configure via YAML:

# litellm_config.yaml
litellm_settings:
  callbacks: ["headroom.integrations.litellm_callback.HeadroomCallback"]

Direct compress() with LiteLLM

You can also use compress() directly instead of the callback:

import litellm
from headroom import compress

messages = [{"role": "user", "content": large_content}]
compressed = compress(messages, model="bedrock/claude-sonnet")

response = litellm.completion(
    model="bedrock/claude-sonnet",
    messages=compressed.messages,
)

print(f"Saved {compressed.tokens_saved} tokens")
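For a sense of the result shape used above, here is a toy sketch. The `messages` and `tokens_saved` fields mirror the example; the compressor and the 4-characters-per-token heuristic are illustrative assumptions, not Headroom's tokenizer:

```python
from dataclasses import dataclass

@dataclass
class CompressResult:
    messages: list
    tokens_saved: int

def estimate_tokens(messages):
    # Crude heuristic: roughly 4 characters per token (assumption).
    return sum(len(m.get("content", "")) // 4 for m in messages)

def toy_compress(messages):
    before = estimate_tokens(messages)
    out = [{**m, "content": " ".join(m.get("content", "").split())}
           for m in messages]
    return CompressResult(messages=out, tokens_saved=before - estimate_tokens(out))

result = toy_compress([{"role": "user", "content": "pad  " * 100}])
print(result.tokens_saved)
```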

ASGI middleware

Drop-in middleware for any ASGI application. Intercepts /v1/messages, /v1/chat/completions, /v1/responses, and /chat/completions:

from fastapi import FastAPI
from headroom.integrations.asgi import CompressionMiddleware

app = FastAPI()
app.add_middleware(CompressionMiddleware)

Response headers include x-headroom-compressed: true and x-headroom-tokens-saved: 1234.
