Quickstart
Get Headroom running in under 5 minutes: install, compress, and send to your LLM with fewer tokens.
1. Install
```bash
npm install headroom-ai
```

```bash
pip install "headroom-ai[all]"
```

**TypeScript SDK requires the proxy**
The TypeScript SDK sends messages to a local Headroom proxy for compression. Start the proxy before using the TS SDK:
```bash
pip install "headroom-ai[proxy]"
headroom proxy --port 8787
```

The proxy runs the compression pipeline (Python) and exposes an HTTP API that the TS SDK calls.
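Before wiring up the SDK, you can confirm the proxy is reachable by polling its `/stats` endpoint. A minimal sketch using only the Python standard library (the helper name `proxy_stats` is ours, not part of Headroom):

```python
import json
import urllib.request
import urllib.error

def proxy_stats(base_url: str = "http://localhost:8787"):
    """Return the proxy's /stats JSON, or None if the proxy is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/stats", timeout=2) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        return None

stats = proxy_stats()
print("proxy running" if stats else "proxy not reachable")
```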
2. Compress messages
```typescript
import { compress } from 'headroom-ai';

const messages = [
  { role: 'system' as const, content: 'You analyze search results.' },
  { role: 'user' as const, content: 'Search for Python tutorials.' },
  {
    role: 'assistant' as const,
    content: null,
    tool_calls: [{
      id: 'call_1',
      type: 'function' as const,
      function: { name: 'search', arguments: '{"q": "python"}' },
    }],
  },
  {
    role: 'tool' as const,
    tool_call_id: 'call_1',
    content: JSON.stringify({
      results: Array.from({ length: 500 }, (_, i) => ({
        title: `Result ${i}`,
        snippet: `Description ${i}`,
        score: 100 - i,
      })),
    }),
  },
  { role: 'user' as const, content: 'What are the top 3 results?' },
];

const result = await compress(messages, {
  model: 'gpt-4o',
  proxyUrl: 'http://localhost:8787',
});
```

```python
from headroom import compress
import json

messages = [
    {"role": "system", "content": "You analyze search results."},
    {"role": "user", "content": "Search for Python tutorials."},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "search", "arguments": '{"q": "python"}'},
        }],
    },
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": json.dumps({
            "results": [
                {"title": f"Result {i}", "snippet": f"Description {i}", "score": 100 - i}
                for i in range(500)
            ]
        }),
    },
    {"role": "user", "content": "What are the top 3 results?"},
]

result = compress(messages, model="gpt-4o")
```

3. Send to your LLM
Use the compressed messages exactly like the originals:
```typescript
import OpenAI from 'openai';

const client = new OpenAI();

// result.messages from the previous step
const messages: any[] = [];

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages,
});
console.log(response.choices[0].message.content);
```

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=result.messages,
)
print(response.choices[0].message.content)
```

4. Check your savings
```typescript
console.log(`Tokens before: ${result.tokensBefore}`);
console.log(`Tokens after: ${result.tokensAfter}`);
console.log(`Tokens saved: ${result.tokensSaved}`);
console.log(`Compression: ${(result.compressionRatio * 100).toFixed(0)}%`);
console.log(`Transforms: ${result.transformsApplied.join(', ')}`);
```

Example output:

```
Tokens before: 45000
Tokens after: 4500
Tokens saved: 40500
Compression: 90%
Transforms: smart_crusher, cache_aligner
```

```python
print(f"Tokens before: {result.tokens_before}")
print(f"Tokens after: {result.tokens_after}")
print(f"Tokens saved: {result.tokens_saved}")
print(f"Compression: {result.compression_ratio:.0%}")
print(f"Transforms: {result.transforms_applied}")
```

Example output:

```
Tokens before: 45000
Tokens after: 4500
Tokens saved: 40500
Compression: 90%
Transforms: ['smart_crusher', 'cache_aligner']
```

Alternative: proxy mode (zero code changes)
If you do not want to change any code, run Headroom as a proxy and point your existing client at it:
```bash
# Start the proxy
headroom proxy --port 8787

# Point Claude Code at it
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Or any OpenAI-compatible client
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
```

All requests flow through Headroom automatically. Check savings at any time:

```bash
curl http://localhost:8787/stats
# {"requests_total": 42, "tokens_saved_total": 125000, ...}
```

What gets compressed
The biggest savings come from tool outputs -- search results, database rows, log files, API responses. Headroom auto-detects the content type and routes it to the best compressor. No configuration needed.
| Content type | Compressor | Typical savings |
|---|---|---|
| JSON arrays | SmartCrusher | 70--90% |
| Source code | CodeCompressor | 40--70% |
| Build/test logs | LogCompressor | 80--95% |
| Search results | SearchCompressor | 60--80% |
| Plain text | Kompress | 30--50% |
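To build intuition for this routing, here is a toy sketch of content sniffing that picks a compressor name from the table above. The heuristics are illustrative assumptions, not Headroom's actual detection logic:

```python
import json

def pick_compressor(content: str) -> str:
    """Toy heuristic router; Headroom's real detection is more sophisticated."""
    stripped = content.strip()
    # Valid JSON arrays/objects -> SmartCrusher
    if stripped.startswith(("[", "{")):
        try:
            json.loads(stripped)
            return "SmartCrusher"
        except json.JSONDecodeError:
            pass
    lines = stripped.splitlines()
    # Mostly lines with log-level markers -> LogCompressor
    levels = ("ERROR", "WARN", "INFO", "DEBUG")
    if sum(1 for l in lines if any(lvl in l for lvl in levels)) > len(lines) / 2:
        return "LogCompressor"
    # Code-like keywords at line starts -> CodeCompressor
    if any(l.lstrip().startswith(("def ", "class ", "import ", "function ")) for l in lines):
        return "CodeCompressor"
    # Fallback: plain text
    return "Kompress"

print(pick_compressor('[{"title": "Result 0"}]'))        # SmartCrusher
print(pick_compressor("INFO boot ok\nERROR disk full"))  # LogCompressor
```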
Next steps

- **Introduction** -- Headroom is the context optimization layer for LLM applications. Compress tool outputs, DB results, file reads, and RAG results before they reach the model. Same answers, fraction of the tokens.
- **Installation** -- Install Headroom via pip, npm, or Docker. Includes all Python extras, TypeScript setup, Docker image tags, and environment variables.