How Compression Works
Understand Headroom's three-stage compression pipeline, automatic content routing, and how different content types are compressed.
Headroom automatically detects what kind of content you're sending and routes it to the right compressor. You don't need to configure anything -- just call compress() and the pipeline handles the rest.
The Three-Stage Pipeline
Every request flows through three stages:
┌──────────────┐ ┌────────────────┐ ┌─────────────────────┐
│ CacheAligner │────>│ ContentRouter │────>│ IntelligentContext │
│ │ │ │ │ │
│ Stabilize │ │ Detect type & │ │ Score messages & │
│ prefix for │ │ route to best │ │ fit within token │
│ cache hits │ │ compressor │ │ budget │
└──────────────┘ └────────────────┘ └─────────────────────┘
- CacheAligner extracts dynamic content (dates, user context) from your system prompt so the static prefix stays cacheable across requests.
- ContentRouter inspects each tool output and routes it to the optimal compressor -- SmartCrusher for JSON arrays, CodeAwareCompressor for source code, LogCompressor for build output, and so on.
- IntelligentContext scores every message by importance (recency, semantic relevance, error indicators) and drops the lowest-value messages to fit within the model's context window.
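The scoring-and-budget idea behind the third stage can be sketched in a few lines. The weight values, score signals, and 4-chars-per-token estimate below are illustrative assumptions, not Headroom's actual internals:

```python
# Illustrative sketch of stage 3: score each message, then greedily keep
# the highest-scoring ones that fit a token budget. Weights and signals
# here are assumptions, not Headroom's real implementation.

def score(message, index, total):
    """Higher score = more important to keep."""
    recency = index / max(total - 1, 1)                  # newer is better
    has_error = 1.0 if "error" in message["content"].lower() else 0.0
    is_system = 1.0 if message["role"] == "system" else 0.0
    return 2.0 * is_system + 1.5 * has_error + recency

def fit_to_budget(messages, budget):
    tokens = lambda m: max(len(m["content"]) // 4, 1)    # rough estimate
    order = sorted(range(len(messages)),
                   key=lambda i: score(messages[i], i, len(messages)),
                   reverse=True)
    kept, used = set(), 0
    for i in order:                                      # greedy fill
        cost = tokens(messages[i])
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return [m for i, m in enumerate(messages) if i in kept]  # keep order
```

Note that surviving messages are re-emitted in their original order, so dropping low-value turns never reorders the conversation.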
Content Type Detection
The router auto-detects content type by analyzing structure and patterns. No manual hints required.
| Content Type | Detection Signal | Compressor | Typical Savings |
|---|---|---|---|
| JSON arrays | Valid JSON with array elements | SmartCrusher | 70-90% |
| Source code | Syntax patterns, indentation, keywords | CodeAwareCompressor | 40-70% |
| Search results | file:line:content format | SearchCompressor | 80-95% |
| Build/test logs | Timestamps, log levels, pytest/npm markers | LogCompressor | 85-95% |
| Diffs | Unified diff format | DiffCompressor | 60-80% |
| HTML | Tag structure | HTMLCompressor | 50-70% |
| Plain text | Fallback | TextCompressor | 60-80% |
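The pattern-matching side of this routing can be illustrated with a few heuristics. The detection rules and type labels below are simplified assumptions for the sketch, not Headroom's actual signals:

```python
import json
import re

# Simplified stand-in for the router's pattern-matching path.
# Rules and labels are illustrative, not Headroom's real detection logic.

def detect_content_type(content: str) -> str:
    stripped = content.strip()
    # JSON arrays: valid JSON that is a list, or an object holding one
    try:
        parsed = json.loads(stripped)
        if isinstance(parsed, list) or (
            isinstance(parsed, dict)
            and any(isinstance(v, list) for v in parsed.values())
        ):
            return "json_array"
    except ValueError:
        pass
    # Unified diffs: ---/+++ file headers
    if re.search(r"^--- .+\n\+\+\+ .+", stripped, re.MULTILINE):
        return "diff"
    # Search results: file:line:content lines
    if re.search(r"^\S+\.\w+:\d+:", stripped, re.MULTILINE):
        return "search_results"
    # Build/test logs: log levels or timestamps
    if re.search(r"\b(ERROR|WARN(ING)?|INFO|DEBUG)\b|\d{2}:\d{2}:\d{2}", stripped):
        return "log"
    # HTML: paired tag structure
    if re.search(r"<\w+[^>]*>.*</\w+>", stripped, re.DOTALL):
        return "html"
    # Source code: keywords at line starts
    if re.search(r"^\s*(def |class |function |import |#include)", stripped, re.MULTILINE):
        return "code"
    return "text"
```

The checks run from most to least specific, so the plain-text fallback only fires when nothing structural matched, mirroring the table above.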
Quick Start
import { compress } from "headroom-ai";

const messages = [
  { role: "system" as const, content: "You are a helpful assistant." },
  { role: "user" as const, content: "Summarize this data" },
  { role: "tool" as const, content: '{"results": [...]}', tool_call_id: "call_1" },
];

const result = await compress(messages);

console.log(`Tokens saved: ${result.tokensSaved}`);
console.log(`Compression ratio: ${result.compressionRatio}`);

from headroom.compression import compress

result = compress(content)
print(result.compressed)
print(f"Saved {result.savings_percentage:.0f}% tokens")

Configuring the Compressor
import { compress } from "headroom-ai";

const result = await compress(messages, {
  model: "gpt-4o",
  maxTokens: 50000,
});

console.log(`Before: ${result.tokensBefore} tokens`);
console.log(`After: ${result.tokensAfter} tokens`);
console.log(`Transforms: ${result.transformsApplied.join(", ")}`);

from headroom.compression import UniversalCompressor, UniversalCompressorConfig
config = UniversalCompressorConfig(
compression_ratio_target=0.5, # Keep 50% of content
use_entropy_preservation=True, # Preserve UUIDs, hashes
use_magika=True, # ML-based content detection
ccr_enabled=True, # Store originals for retrieval
)
compressor = UniversalCompressor(config=config)
result = compressor.compress(content)
print(f"Type: {result.content_type}")
print(f"Handler: {result.handler_used}")
print(f"Saved: {result.savings_percentage:.0f}%")

Structure Preservation
Headroom doesn't blindly truncate. It identifies what matters in each content type and preserves it:
| Content Type | What's Preserved | What's Compressed |
|---|---|---|
| JSON | Keys, brackets, booleans, nulls, short values, UUIDs | Long string values, whitespace |
| Code | Imports, function signatures, class definitions, types | Function bodies, comments |
| Logs | Timestamps, log levels, error messages, stack traces | Repeated patterns, verbose details |
| Text | High-entropy tokens (IDs, hashes), headers | Low-information content |
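The "high-entropy tokens" in the text row can be made concrete with character-level Shannon entropy: UUIDs and hashes use their alphabet almost uniformly, so they score well above ordinary words. The threshold and minimum length below are illustrative assumptions, not Headroom's actual values:

```python
import math
from collections import Counter

# Sketch of high-entropy token preservation. Identifiers like UUIDs spread
# probability across many characters, so their character-level Shannon
# entropy beats typical English words. Thresholds are assumptions.

def char_entropy(token: str) -> float:
    """Shannon entropy (bits) of the token's character distribution."""
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in Counter(token).values())

def is_high_entropy(token: str, bits: float = 3.3, min_len: int = 8) -> bool:
    return len(token) >= min_len and char_entropy(token) >= bits

def preserve_tokens(text: str) -> list[str]:
    """Keep only tokens worth preserving verbatim."""
    return [t for t in text.split() if is_high_entropy(t)]
```

Real detectors would likely combine this with character-class checks (hex digits, dashes, base64 alphabets), since a few long English words sit close to the entropy threshold.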
Real Compression Ratios
| Content Type | Compression | Speed | What's Preserved |
|---|---|---|---|
| JSON (large arrays) | 70-90% | ~1ms | All keys, structure |
| Source code (Python) | 50-70% | ~10ms | Signatures, imports |
| Search results | 80-95% | ~2ms | Relevant matches |
| Build logs | 85-95% | ~3ms | Errors, stack traces |
| Plain text | 60-80% | ~5ms | High-entropy tokens |
Batch Compression
When you have several pieces of content to compress, batch compression is more efficient than compressing them one at a time:
from headroom.compression import UniversalCompressor
compressor = UniversalCompressor()
contents = [
'{"users": [...]}',
'def hello(): pass',
'Plain text content',
]
results = compressor.compress_batch(contents)
for result in results:
    print(f"{result.content_type}: {result.savings_percentage:.0f}% saved")

What Happens Under the Hood
When you call compress(), here is the full sequence:
- Content detection -- Magika (ML-based) or pattern matching identifies the content type
- Structure extraction -- A handler extracts a structure mask marking what to preserve
- Compression -- Non-structural content is compressed (SmartCrusher, LLMLingua, or text utilities)
- CCR storage -- If enabled, the original is stored for retrieval when the LLM needs full context
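The four steps above can be strung together in miniature. The detection rule, the structure mask, and the dict-based store below are hypothetical stand-ins (pattern matching instead of Magika, a plain dict instead of the CCR store), not Headroom's real handlers:

```python
import re

# Miniature version of the four-step sequence, applied to a build log.
# Every rule here is an illustrative stand-in for Headroom's handlers.

def compress_pipeline(content: str, store: dict) -> dict:
    lines = content.splitlines()
    # 1. Content detection: crude pattern matching in place of Magika
    is_log = any(re.search(r"\b(ERROR|WARN|INFO|DEBUG)\b", l) for l in lines)
    content_type = "log" if is_log else "text"
    # 2. Structure extraction: a mask marking which lines to keep verbatim
    if content_type == "log":
        mask = ["ERROR" in l or "Traceback" in l for l in lines]
    else:
        mask = [True] * len(lines)
    # 3. Compression: drop unmasked lines, noting how much was elided
    kept = [l for l, keep in zip(lines, mask) if keep]
    dropped = len(lines) - len(kept)
    if dropped:
        kept.append(f"[{dropped} lines elided]")
    # 4. CCR storage: keep the original so full context can be retrieved
    key = f"ccr-{len(store)}"
    store[key] = content
    return {"content_type": content_type,
            "compressed": "\n".join(kept),
            "ccr_key": key}
```

The returned key is what a retrieval step would later use to swap the compressed stand-in back for the stored original when the LLM asks for full context.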
Zero-config by default
The pipeline works out of the box with no configuration. All detection, routing, and compression happens automatically. Configuration is available when you need fine-grained control.