Headroom

Image Compression

ML-powered image compression that reduces vision model token usage by 40-90% while maintaining answer accuracy.

Vision models charge by the token, and images are expensive. A single 1024x1024 image costs ~765 tokens on OpenAI. Headroom's image compression uses a trained ML router to analyze your query and automatically select the optimal compression technique, saving 40-90% of image tokens.

How It Works

```
User uploads image + asks question
           |
   [Query Analysis]
   TrainedRouter (MiniLM from HuggingFace)
   Classifies: "What animal is this?" -> full_low
           |
   [Image Analysis]
   SigLIP analyzes image properties
   (has text? complex? fine details?)
           |
   [Apply Compression]
   OpenAI: detail="low"
   Anthropic: Resize to 512px
   Google: Resize to 768px
           |
   Compressed request to LLM
```

The router is a fine-tuned MiniLM classifier (chopratejas/technique-router on HuggingFace) with 93.7% accuracy across 1,157 training examples.

Compression Techniques

| Technique | Savings | When Used | Example Query |
|---|---|---|---|
| `full_low` | ~87% | General understanding | "What is this?", "Describe the scene" |
| `preserve` | 0% | Fine details needed | "Count the whiskers", "Read the serial number" |
| `crop` | 50-90% | Region-specific queries | "What's in the corner?", "Focus on the background" |
| `transcode` | ~99% | Text extraction | "Read the sign", "Transcribe the document" |
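The query-to-technique mapping above can be sketched as a simple keyword heuristic. This is a hypothetical stand-in for illustration only: the real router is a trained MiniLM classifier, and the names `pick_technique` and the hint lists below are not part of Headroom's API.

```python
# Hypothetical stand-in for the trained router: map a query string to one
# of the four compression techniques using keyword hints. Illustrates the
# query -> technique mapping only; the real router is an ML classifier.

DETAIL_HINTS = ("count", "serial", "exact", "how many")
TEXT_HINTS = ("read", "transcribe", "text", "sign", "document")
CROP_HINTS = ("corner", "background", "left", "right", "top", "bottom")

def pick_technique(query: str) -> str:
    q = query.lower()
    if any(h in q for h in DETAIL_HINTS):
        return "preserve"    # fine details needed: keep full resolution
    if any(h in q for h in TEXT_HINTS):
        return "transcode"   # text extraction: highest savings
    if any(h in q for h in CROP_HINTS):
        return "crop"        # region-specific: crop to the region
    return "full_low"        # general understanding: low-detail image
```

Note the ordering: detail hints are checked first, so "Read the serial number" lands on `preserve` rather than `transcode`, matching the table.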

Quick Start

With Headroom Proxy (Zero Code Changes)

```bash
# Start the proxy
headroom proxy --port 8787

# Connect your client -- images are compressed automatically
ANTHROPIC_BASE_URL=http://localhost:8787 claude
```

With HeadroomClient

```python
from headroom import HeadroomClient

client = HeadroomClient(provider="openai")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What animal is this?"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
        ]
    }]
)
# Image automatically compressed with detail="low" (87% savings)
```

Direct API

```python
from headroom.image import ImageCompressor

compressor = ImageCompressor()

# Compress images in messages
compressed_messages = compressor.compress(messages, provider="openai")

# Check savings
print(f"Saved {compressor.last_savings:.0f}% tokens")
print(f"Technique: {compressor.last_result.technique.value}")
```

Provider Support

The compressor adapts its strategy per provider:

| Provider | Compression Method | Details |
|---|---|---|
| OpenAI | Sets `detail="low"` | Native detail parameter |
| Anthropic | Resizes to 512px | PIL-based resize |
| Google Gemini | Resizes to 768px | Optimized for Gemini's 768x768 tile system |
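The per-provider strategy above can be sketched as a small dispatch function. This is a minimal illustration, not Headroom's internals: the names `compress_for_provider` and `RESIZE_TARGETS` are assumptions, and the real resize is done with PIL on actual image bytes.

```python
# Sketch of per-provider compression dispatch: OpenAI gets the native
# detail="low" parameter; Anthropic and Gemini get a resize of the
# longest side down to a provider-specific target.

RESIZE_TARGETS = {"anthropic": 512, "google": 768}

def compress_for_provider(provider: str, width: int, height: int) -> dict:
    if provider == "openai":
        return {"detail": "low"}                  # native low-detail mode
    target = RESIZE_TARGETS[provider]
    scale = target / max(width, height)           # shrink longest side to target
    if scale >= 1:
        return {"resize": (width, height)}        # already small enough
    return {"resize": (round(width * scale), round(height * scale))}
```

For example, a 1024x1024 image routed to Anthropic would be resized to 512x512, while the same image sent to OpenAI keeps its pixels and only flips the `detail` flag.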

Token Savings by Provider

OpenAI (1024x1024 image):

| Technique | Before | After | Savings |
|---|---|---|---|
| `full_low` | 765 tokens | 85 tokens | 89% |
| `preserve` | 765 tokens | 765 tokens | 0% |

Anthropic (1024x1024 image):

| Before | After | Savings |
|---|---|---|
| ~1,398 tokens | ~349 tokens | 75% |

Google Gemini (1536x1536 image):

| Before | After | Savings |
|---|---|---|
| 1,032 tokens (4 tiles) | 258 tokens (1 tile) | 75% |
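The numbers in the three tables above follow from the providers' image-pricing rules, sketched here in simplified form (the OpenAI formula below omits the pre-scaling step applied to very large images, and the Anthropic figure is the documented (width × height) / 750 estimate):

```python
import math

def openai_tokens(w: int, h: int, detail: str = "high") -> int:
    # Low detail is a flat 85 tokens; high detail is 85 base tokens
    # plus 170 per 512px tile (simplified: ignores pre-scaling).
    if detail == "low":
        return 85
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

def anthropic_tokens(w: int, h: int) -> int:
    # Documented estimate: roughly (width * height) / 750 tokens.
    return w * h // 750

def gemini_tokens(w: int, h: int) -> int:
    # 258 tokens per 768x768 tile.
    return 258 * math.ceil(w / 768) * math.ceil(h / 768)

# 1024x1024 on OpenAI:    85 + 170*4 = 765 before, flat 85 after
# 1024x1024 on Anthropic: ~1,398 before, ~349 after resizing to 512px
# 1536x1536 on Gemini:    4 tiles = 1,032 before, 1 tile = 258 after
```

Working one row through: resizing a 1536x1536 image to 768px turns a 2x2 grid of Gemini tiles into a single tile, cutting 1,032 tokens to 258, the 75% in the table.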

Configuration

```python
from headroom.image import ImageCompressor

compressor = ImageCompressor(
    model_id="chopratejas/technique-router",  # HuggingFace model
    use_siglip=True,   # Enable image analysis
    device="cuda",     # Use GPU if available (auto, cuda, cpu, mps)
)
```

Proxy Configuration

```bash
# Enable image compression (default)
headroom proxy --image-optimize

# Disable image compression
headroom proxy --no-image-optimize
```

Performance

| Metric | Value |
|---|---|
| Router inference | ~10ms (CPU), ~2ms (GPU) |
| Image resize | ~5-20ms |
| First request | +2-3s (model download, cached after) |
| Router accuracy | 93.7% |
| Model size | ~128MB |
| GPU memory (SigLIP) | ~400MB |

Automatic with the proxy

When using the Headroom proxy, image compression happens automatically on every request that contains images. No code changes needed.
