Headroom

How Compression Works

Understand Headroom's three-stage compression pipeline, automatic content routing, and how different content types are compressed.

Headroom automatically detects what kind of content you're sending and routes it to the right compressor. You don't need to configure anything -- just call compress() and the pipeline handles the rest.

The Three-Stage Pipeline

Every request flows through three stages:

┌──────────────┐     ┌────────────────┐     ┌─────────────────────┐
│ CacheAligner │────>│ ContentRouter  │────>│ IntelligentContext  │
│              │     │                │     │                     │
│ Stabilize    │     │ Detect type &  │     │ Score messages &    │
│ prefix for   │     │ route to best  │     │ fit within token    │
│ cache hits   │     │ compressor     │     │ budget              │
└──────────────┘     └────────────────┘     └─────────────────────┘
  1. CacheAligner extracts dynamic content (dates, user context) from your system prompt so the static prefix stays cacheable across requests.
  2. ContentRouter inspects each tool output and routes it to the optimal compressor -- SmartCrusher for JSON arrays, CodeAwareCompressor for source code, LogCompressor for build output, and so on.
  3. IntelligentContext scores every message by importance (recency, semantic relevance, error indicators) and drops the lowest-value messages to fit within the model's context window.
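The budget-fitting step can be sketched in miniature. Everything below (helper names, scoring weights, the 4-characters-per-token heuristic) is an illustrative assumption, not Headroom's actual scoring logic:

```python
# Illustrative sketch of stage 3 (IntelligentContext): score messages
# and evict the lowest-value ones until the token budget is met.
# Weights and helpers here are assumptions, not Headroom internals.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def score(message: dict, index: int, total: int) -> float:
    s = index / total                # recency: later messages score higher
    if "error" in message["content"].lower():
        s += 1.0                     # error indicators are high-value
    if message["role"] == "system":
        s += 10.0                    # never drop the system prompt
    return s

def fit_to_budget(messages: list, budget: int) -> list:
    total = len(messages)
    # Sort ascending by importance so the cheapest drops come first.
    scored = sorted(enumerate(messages),
                    key=lambda im: score(im[1], im[0], total))
    kept = list(messages)
    for _, msg in scored:
        if sum(estimate_tokens(m["content"]) for m in kept) <= budget:
            break
        kept.remove(msg)
    return kept
```

A real scorer would also weigh semantic relevance; the shape of the loop is the point: score, sort, evict from the bottom until the budget holds.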

Content Type Detection

The router auto-detects content type by analyzing structure and patterns. No manual hints required.

| Content Type | Detection Signal | Compressor | Typical Savings |
|---|---|---|---|
| JSON arrays | Valid JSON with array elements | SmartCrusher | 70-90% |
| Source code | Syntax patterns, indentation, keywords | CodeAwareCompressor | 40-70% |
| Search results | file:line:content format | SearchCompressor | 80-95% |
| Build/test logs | Timestamps, log levels, pytest/npm markers | LogCompressor | 85-95% |
| Diffs | Unified diff format | DiffCompressor | 60-80% |
| HTML | Tag structure | HTMLCompressor | 50-70% |
| Plain text | Fallback | TextCompressor | 60-80% |
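This kind of routing can be approximated as a cascade of checks, ordered from most to least specific. The regexes and type names below are illustrative assumptions; the real ContentRouter's signals and priorities may differ:

```python
import json
import re

# Illustrative approximation of pattern-based content-type routing.
# Regexes and labels are assumptions, not the real ContentRouter.

def detect_content_type(content: str) -> str:
    stripped = content.strip()
    # JSON arrays: valid JSON that is a list, or has a top-level list value
    try:
        parsed = json.loads(stripped)
        if isinstance(parsed, list) or (
            isinstance(parsed, dict)
            and any(isinstance(v, list) for v in parsed.values())
        ):
            return "json_array"
    except ValueError:
        pass
    # Unified diffs: ---/+++ headers or @@ hunk markers
    if re.search(r"^(--- |\+\+\+ |@@ )", stripped, re.MULTILINE):
        return "diff"
    # Search results: file:line:content
    if re.search(r"^\S+\.\w+:\d+:", stripped, re.MULTILINE):
        return "search_results"
    # Logs: timestamps plus log levels
    if re.search(r"\d{2}:\d{2}:\d{2}.*\b(INFO|WARN|ERROR|DEBUG)\b", stripped):
        return "log"
    # Source code: leading keywords
    if re.search(r"^(def |class |import |function |const )", stripped, re.MULTILINE):
        return "code"
    # HTML: tag structure
    if re.search(r"<\w+[^>]*>", stripped):
        return "html"
    return "text"  # fallback
```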

Quick Start

```typescript
import { compress } from "headroom-ai";

const messages = [
  { role: "system" as const, content: "You are a helpful assistant." },
  { role: "user" as const, content: "Summarize this data" },
  { role: "tool" as const, content: '{"results": [...]}', toolCallId: "call_1" },
];

const result = await compress(messages);
console.log(`Tokens saved: ${result.tokensSaved}`);
console.log(`Compression ratio: ${result.compressionRatio}`);
```
```python
from headroom.compression import compress

result = compress(content)
print(result.compressed)
print(f"Saved {result.savings_percentage:.0f}% tokens")
```

Configuring the Compressor

```typescript
import { compress } from "headroom-ai";

const result = await compress(messages, {
  model: "gpt-4o",
  maxTokens: 50000,
});

console.log(`Before: ${result.tokensBefore} tokens`);
console.log(`After: ${result.tokensAfter} tokens`);
console.log(`Transforms: ${result.transformsApplied.join(", ")}`);
```
```python
from headroom.compression import UniversalCompressor, UniversalCompressorConfig

config = UniversalCompressorConfig(
    compression_ratio_target=0.5,  # Keep 50% of content
    use_entropy_preservation=True,  # Preserve UUIDs, hashes
    use_magika=True,                # ML-based content detection
    ccr_enabled=True,               # Store originals for retrieval
)

compressor = UniversalCompressor(config=config)
result = compressor.compress(content)

print(f"Type: {result.content_type}")
print(f"Handler: {result.handler_used}")
print(f"Saved: {result.savings_percentage:.0f}%")
```

Structure Preservation

Headroom doesn't blindly truncate. It identifies what matters in each content type and preserves it:

| Content Type | What's Preserved | What's Compressed |
|---|---|---|
| JSON | Keys, brackets, booleans, nulls, short values, UUIDs | Long string values, whitespace |
| Code | Imports, function signatures, class definitions, types | Function bodies, comments |
| Logs | Timestamps, log levels, error messages, stack traces | Repeated patterns, verbose details |
| Text | High-entropy tokens (IDs, hashes), headers | Low-information content |
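The JSON row can be made concrete with a toy structure-preserving pass: keys, booleans, nulls, numbers, short strings, and UUIDs survive; long string values are truncated. The cutoff and the UUID check are assumptions for illustration, not SmartCrusher's actual rules:

```python
import re

MAX_STRING = 24  # assumed cutoff for a "short" string value
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

def crush(value):
    """Toy structure-preserving JSON compression (illustration only)."""
    if isinstance(value, dict):
        return {k: crush(v) for k, v in value.items()}  # keys always kept
    if isinstance(value, list):
        return [crush(v) for v in value]
    if isinstance(value, str) and len(value) > MAX_STRING and not UUID_RE.match(value):
        return value[:MAX_STRING] + "..."  # truncate long plain strings
    return value  # booleans, nulls, numbers, short strings, UUIDs pass through

doc = {
    "id": "550e8400-e29b-41d4-a716-446655440000",  # UUID survives intact
    "active": True,
    "note": None,
    "body": "x" * 500,  # long value gets truncated
}
crushed = crush(doc)
```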

Real Compression Ratios

| Content Type | Compression | Speed | What's Preserved |
|---|---|---|---|
| JSON (large arrays) | 70-90% | ~1ms | All keys, structure |
| Source code (Python) | 50-70% | ~10ms | Signatures, imports |
| Search results | 80-95% | ~2ms | Relevant matches |
| Build logs | 85-95% | ~3ms | Errors, stack traces |
| Plain text | 60-80% | ~5ms | High-entropy tokens |

Batch Compression

When compressing multiple pieces of content, batching is more efficient:

```python
from headroom.compression import UniversalCompressor

compressor = UniversalCompressor()

contents = [
    '{"users": [...]}',
    'def hello(): pass',
    'Plain text content',
]

results = compressor.compress_batch(contents)

for result in results:
    print(f"{result.content_type}: {result.savings_percentage:.0f}% saved")
```

What Happens Under the Hood

When you call compress(), here is the full sequence:

  1. Content detection -- Magika (ML-based) or pattern matching identifies the content type
  2. Structure extraction -- A handler extracts a structure mask marking what to preserve
  3. Compression -- Non-structural content is compressed (SmartCrusher, LLMLingua, or text utilities)
  4. CCR storage -- If enabled, the original is stored for retrieval when the LLM needs full context
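Step 4 can be pictured as a content-addressed side store: the compressed output carries a key under which the original can be fetched later. The class and method names here are illustrative assumptions, not Headroom's actual CCR interface:

```python
import hashlib

# Toy illustration of step 4 (CCR storage): keep the original under a
# content hash so it can be retrieved when full context is needed.
# Names here are assumptions, not Headroom's real CCR interface.

class OriginalStore:
    def __init__(self):
        self._store = {}

    def put(self, original: str) -> str:
        # Content-addressed key: same content always maps to the same key.
        key = hashlib.sha256(original.encode()).hexdigest()[:12]
        self._store[key] = original
        return key

    def get(self, key: str):
        return self._store.get(key)

store = OriginalStore()
original = '{"results": ["' + "long tool output " * 50 + '"]}'
key = store.put(original)
# The compressed payload carries only a pointer back to the original.
compressed_with_pointer = f"[compressed; full content id={key}]"
```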

Zero-config by default

The pipeline works out of the box with no configuration. All detection, routing, and compression happens automatically. Configuration is available when you need fine-grained control.
