Quickstart
Get Headroom running in under 5 minutes: install, compress, and send to your LLM with fewer tokens.
1. Install
```bash
npm install headroom-ai
```

```bash
pip install "headroom-ai[all]"
```

**TypeScript SDK requires the proxy**
The TypeScript SDK sends messages to a local Headroom proxy for compression. Start the proxy before using the TS SDK:
```bash
pip install "headroom-ai[proxy]"
headroom proxy --port 8787
```

The proxy runs the compression pipeline (Python) and exposes an HTTP API that the TS SDK calls.
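Before wiring up the SDK, you can confirm the proxy is reachable by polling its `/stats` endpoint. A minimal sketch using only the Python standard library (the helper name `proxy_stats` is ours, not part of Headroom):

```python
import json
import urllib.request
import urllib.error

def proxy_stats(base_url: str = "http://localhost:8787"):
    """Return the proxy's /stats JSON, or None if the proxy is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/stats", timeout=2) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        return None

stats = proxy_stats()
print("proxy running" if stats else "proxy not reachable")
```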
2. Compress messages
```typescript
import { compress } from 'headroom-ai';

const messages = [
  { role: 'system' as const, content: 'You analyze search results.' },
  { role: 'user' as const, content: 'Search for Python tutorials.' },
  {
    role: 'assistant' as const,
    content: null,
    tool_calls: [{
      id: 'call_1',
      type: 'function' as const,
      function: { name: 'search', arguments: '{"q": "python"}' },
    }],
  },
  {
    role: 'tool' as const,
    tool_call_id: 'call_1',
    content: JSON.stringify({
      results: Array.from({ length: 500 }, (_, i) => ({
        title: `Result ${i}`,
        snippet: `Description ${i}`,
        score: 100 - i,
      })),
    }),
  },
  { role: 'user' as const, content: 'What are the top 3 results?' },
];

const result = await compress(messages, {
  model: 'gpt-4o',
  proxyUrl: 'http://localhost:8787',
});
```

```python
from headroom import compress
import json

messages = [
    {"role": "system", "content": "You analyze search results."},
    {"role": "user", "content": "Search for Python tutorials."},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "search", "arguments": '{"q": "python"}'},
        }],
    },
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": json.dumps({
            "results": [
                {"title": f"Result {i}", "snippet": f"Description {i}", "score": 100 - i}
                for i in range(500)
            ]
        }),
    },
    {"role": "user", "content": "What are the top 3 results?"},
]

result = compress(messages, model="gpt-4o")
```

3. Send to your LLM
Use the compressed messages exactly like the originals:
```typescript
import OpenAI from 'openai';

const client = new OpenAI();

// result.messages from the previous step
const messages: any[] = [];

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages,
});
console.log(response.choices[0].message.content);
```

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=result.messages,
)
print(response.choices[0].message.content)
```

4. Check your savings
```typescript
console.log(`Tokens before: ${result.tokensBefore}`);
console.log(`Tokens after: ${result.tokensAfter}`);
console.log(`Tokens saved: ${result.tokensSaved}`);
console.log(`Compression: ${(result.compressionRatio * 100).toFixed(0)}%`);
console.log(`Transforms: ${result.transformsApplied.join(', ')}`);
```

Example output:

```
Tokens before: 45000
Tokens after: 4500
Tokens saved: 40500
Compression: 90%
Transforms: smart_crusher, cache_aligner
```

```python
print(f"Tokens before: {result.tokens_before}")
print(f"Tokens after: {result.tokens_after}")
print(f"Tokens saved: {result.tokens_saved}")
print(f"Compression: {result.compression_ratio:.0%}")
print(f"Transforms: {result.transforms_applied}")
```

Example output:

```
Tokens before: 45000
Tokens after: 4500
Tokens saved: 40500
Compression: 90%
Transforms: ['smart_crusher', 'cache_aligner']
```

Alternative: proxy mode (zero code changes)
If you do not want to change any code, run Headroom as a proxy and point your existing client at it:
```bash
# Start the proxy
headroom proxy --port 8787

# Point Claude Code at it
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Or any OpenAI-compatible client
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
```

All requests flow through Headroom automatically. Check savings at any time:

```bash
curl http://localhost:8787/stats
# {"requests_total": 42, "tokens_saved_total": 125000, ...}
```

What gets compressed
The biggest savings come from tool outputs -- search results, database rows, log files, API responses. Headroom auto-detects the content type and routes it to the best compressor. No configuration needed.
| Content type | Compressor | Typical savings |
|---|---|---|
| JSON arrays | SmartCrusher | 70--90% |
| Source code | CodeCompressor | 40--70% |
| Build/test logs | LogCompressor | 80--95% |
| Search results | SearchCompressor | 60--80% |
| Plain text | Kompress | 30--50% |
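To build intuition for this routing, here is a toy sketch of content sniffing that picks a compressor name from the table above. The heuristics are illustrative assumptions, not Headroom's actual detection logic:

```python
import json

def pick_compressor(content: str) -> str:
    """Toy heuristic router; Headroom's real detection is more sophisticated."""
    stripped = content.strip()
    # Valid JSON arrays/objects -> SmartCrusher
    if stripped.startswith(("[", "{")):
        try:
            json.loads(stripped)
            return "SmartCrusher"
        except json.JSONDecodeError:
            pass
    lines = stripped.splitlines()
    # Mostly lines with log-level markers -> LogCompressor
    levels = ("ERROR", "WARN", "INFO", "DEBUG")
    if sum(1 for l in lines if any(lvl in l for lvl in levels)) > len(lines) / 2:
        return "LogCompressor"
    # Code-like keywords at line starts -> CodeCompressor
    if any(l.lstrip().startswith(("def ", "class ", "import ", "function ")) for l in lines):
        return "CodeCompressor"
    # Fallback: plain text
    return "Kompress"

print(pick_compressor('[{"title": "Result 0"}]'))        # SmartCrusher
print(pick_compressor("INFO boot ok\nERROR disk full"))  # LogCompressor
```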
Next steps

- **Introduction** -- Headroom is the context optimization layer for LLM applications. Compress tool outputs, DB results, file reads, and RAG results before they reach the model. Same answers, fraction of the tokens.
- **Installation** -- Install Headroom via pip, npm, or Docker. Includes all Python extras, TypeScript setup, Docker image tags, and environment variables.