Text & Log Compression

Specialized compressors for search results, build logs, diffs, and general text. Each preserves what matters for its content type.

Headroom provides specialized compressors for text-based content that isn't JSON or source code. Each one understands the structure of its content type and preserves what the LLM needs while dropping the noise.

| Compressor | Input Type | What It Preserves | Typical Savings |
|---|---|---|---|
| SearchCompressor | grep/ripgrep output | Relevant matches, file diversity | 80-95% |
| LogCompressor | Build/test logs | Errors, stack traces, summaries | 85-95% |
| DiffCompressor | Unified diffs | Changed lines, context | 60-80% |
| TextCompressor | General text | Relevant paragraphs, anchors | 60-80% |
| LLMLinguaCompressor | Any text (max compression) | Semantic meaning via ML | 80-95% |

SearchCompressor

Compresses search results (grep, ripgrep, ag) while keeping the matches that matter.

from headroom.transforms import SearchCompressor

search_results = """
src/utils.py:42:def process_data(items):
src/utils.py:43:    \"\"\"Process items.\"\"\"
src/models.py:15:class DataProcessor:
src/models.py:89:    def process(self, items):
... hundreds more matches ...
"""

compressor = SearchCompressor()
result = compressor.compress(search_results, context="find process")

print(f"Compressed {result.original_match_count} matches to {result.compressed_match_count}")
print(result.compressed)

What gets preserved:

  • Exact query matches (lines containing the search term)
  • High-relevance matches (scored by BM25 similarity)
  • File diversity (results from different files are kept)
  • First/last matches (context from start and end)
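The selection strategy above can be illustrated with a self-contained sketch. This is a conceptual approximation, not Headroom's actual implementation (which scores relevance with BM25); here a simple term-overlap score stands in, and the `select_matches` helper is hypothetical:

```python
# Conceptual sketch of match selection: rank matches by relevance to the
# query, guarantee one match per file (file diversity), then fill the
# remaining slots with the highest-scoring leftovers.

def select_matches(matches, query, max_results=50):
    """matches: list of (file, line_no, text) tuples."""
    terms = set(query.lower().split())

    def score(match):
        words = set(match[2].lower().split())
        return len(terms & words) / len(terms)  # fraction of query terms hit

    ranked = sorted(matches, key=score, reverse=True)

    kept, seen_files = [], set()
    # First pass: one match per file preserves file diversity.
    for m in ranked:
        if m[0] not in seen_files:
            kept.append(m)
            seen_files.add(m[0])
    # Second pass: top up with the best remaining matches.
    for m in ranked:
        if len(kept) >= max_results:
            break
        if m not in kept:
            kept.append(m)
    return kept[:max_results]


matches = [
    ("src/utils.py", 42, "def process_data(items):"),
    ("src/utils.py", 43, "Process items."),
    ("src/models.py", 89, "def process(self, items):"),
]
kept = select_matches(matches, "process", max_results=2)
```

With `max_results=2`, the diversity pass keeps one match from each file rather than two from `src/utils.py`.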

Configuration

from headroom.transforms import SearchCompressor, SearchCompressorConfig

config = SearchCompressorConfig(
    max_results=50,                # Keep up to 50 matches
    preserve_file_diversity=True,  # Ensure different files represented
    relevance_threshold=0.3,       # Minimum relevance score to keep
)

compressor = SearchCompressor(config)

LogCompressor

Compresses build and test output while preserving errors, warnings, and summaries.

from headroom.transforms import LogCompressor

build_output = """
===== test session starts =====
collected 500 items
tests/test_foo.py::test_1 PASSED
... hundreds of passed tests ...
tests/test_bar.py::test_fail FAILED
AssertionError: expected 5, got 3
===== 1 failed, 499 passed =====
"""

compressor = LogCompressor()
result = compressor.compress(build_output)

print(result.compressed)
print(f"Compression ratio: {result.compression_ratio:.1%}")

What gets preserved:

  • Errors and failures (any line with ERROR, FAILED, Exception)
  • Warnings
  • Full stack traces for debugging
  • Test/build summary lines
  • Section headers (structural markers like =====)

What gets dropped:

  • Hundreds of PASSED lines
  • Verbose success output
  • Repeated patterns
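The keep/drop split can be sketched as a line classifier. This is illustrative only, assuming a simple regex over severity markers; Headroom's real heuristics (stack-trace grouping, repeated-pattern collapsing) are richer:

```python
import re

# Lines matching any of these signals survive; everything else is noise.
KEEP = re.compile(r"FAILED|ERROR|WARN|Error|Exception|Traceback|=====")

def compress_log(log: str) -> str:
    kept = [line for line in log.splitlines() if KEEP.search(line)]
    return "\n".join(kept)

log = """===== test session starts =====
tests/test_foo.py::test_1 PASSED
tests/test_bar.py::test_fail FAILED
AssertionError: expected 5, got 3
===== 1 failed, 499 passed ====="""

print(compress_log(log))
```

The PASSED line is dropped, while the section headers, the failure, and its assertion message all survive.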

DiffCompressor

Compresses unified diffs while keeping the actual changes and enough context to understand them.

from headroom.transforms import DiffCompressor

diff_output = """
diff --git a/src/main.py b/src/main.py
--- a/src/main.py
+++ b/src/main.py
@@ -42,7 +42,7 @@
 def process(items):
-    return [x for x in items]
+    return [x.strip() for x in items if x]
"""

compressor = DiffCompressor()
result = compressor.compress(diff_output)

print(result.compressed)
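The core idea, keeping changed lines plus minimal surrounding context, can be sketched in a few lines. This is an assumed behavior, not Headroom's exact algorithm, and `trim_diff` is a hypothetical helper:

```python
# Keep file headers, hunk headers, and +/- lines from a unified diff,
# plus `context` unchanged lines on either side of each change;
# drop the rest of the context.

def trim_diff(diff: str, context: int = 1) -> str:
    lines = diff.strip().splitlines()
    changed = {i for i, l in enumerate(lines)
               if l.startswith(("+", "-", "@@", "diff "))}
    keep = set()
    for i in changed:
        keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return "\n".join(l for i, l in enumerate(lines) if i in keep)


diff_output = """diff --git a/f b/f
--- a/f
+++ b/f
@@ -1,6 +1,6 @@
 ctx1
 ctx2
 ctx3
-old line
+new line
 ctx4
 ctx5
 ctx6"""

print(trim_diff(diff_output))
```

Unchanged lines adjacent to the change (`ctx3`, `ctx4`) are kept for context; the rest are trimmed.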

TextCompressor

General-purpose text compression with anchor preservation. Best for documentation, README files, and prose content.

from headroom.transforms import TextCompressor

long_text = """
... thousands of lines of documentation ...
"""

compressor = TextCompressor()
result = compressor.compress(long_text, context="authentication")

print(result.compressed)

What gets preserved:

  • Paragraphs relevant to the context query
  • Headers and section markers
  • Document structure and organization
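Context-driven paragraph selection can be sketched as a word-overlap filter. This is a conceptual illustration (the real TextCompressor also preserves headers and document structure), and `relevant_paragraphs` is a hypothetical helper:

```python
import re

def relevant_paragraphs(text, context, threshold=0.5):
    """Keep paragraphs whose word overlap with the context query
    meets the threshold."""
    terms = set(re.findall(r"\w+", context.lower()))
    kept = []
    for para in text.split("\n\n"):
        words = set(re.findall(r"\w+", para.lower()))
        if terms and len(terms & words) / len(terms) >= threshold:
            kept.append(para)
    return kept

doc = (
    "Login uses token-based authentication.\n\n"
    "The build system is driven by make.\n\n"
    "Authentication tokens expire after one hour."
)
for para in relevant_paragraphs(doc, "authentication"):
    print(para)
```

Only the two authentication-related paragraphs survive; the build-system paragraph is dropped as irrelevant to the query.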

LLMLingua (Optional, Maximum Compression)

For maximum compression on any text, Headroom integrates with Microsoft's LLMLingua-2, a BERT-based token classifier trained via GPT-4 distillation. It achieves up to 20x compression while preserving semantic meaning.

from headroom.transforms import LLMLinguaCompressor, LLMLinguaConfig

config = LLMLinguaConfig(
    device="auto",                # auto, cuda, cpu, mps
    code_compression_rate=0.4,    # Conservative for code
    json_compression_rate=0.35,   # Moderate for JSON
    text_compression_rate=0.25,   # Aggressive for text
)

compressor = LLMLinguaCompressor(config)
result = compressor.compress(long_output)

print(f"Before: {result.original_tokens} tokens")
print(f"After: {result.compressed_tokens} tokens")
print(f"Saved: {result.savings_percentage:.1f}%")

LLMLingua is opt-in

LLMLingua adds ~2GB of model weights and 50-200ms latency per request. Install it only when you need maximum compression: pip install "headroom-ai[llmlingua]"

Memory Management

from headroom.transforms import unload_llmlingua_model, is_llmlingua_model_loaded

# Check if model is loaded
print(is_llmlingua_model_loaded())  # True

# Free ~1GB RAM when done
unload_llmlingua_model()

Content Type Detection

If you're building your own routing logic, you can use the content type detector directly:

from headroom.transforms import detect_content_type, ContentType

content = "src/main.py:42:def process():"

detection = detect_content_type(content)
if detection.content_type == ContentType.SEARCH_RESULTS:
    result = SearchCompressor().compress(content, context="process")
elif detection.content_type == ContentType.BUILD_OUTPUT:
    result = LogCompressor().compress(content)
elif detection.content_type == ContentType.PLAIN_TEXT:
    result = TextCompressor().compress(content, context="process")

When Each Compressor Is Used

The ContentRouter selects the right compressor automatically. Here's when each fires:

| Content Pattern | Compressor | Detection Signal |
|---|---|---|
| `file:line:content` lines | SearchCompressor | grep/ripgrep output format |
| pytest, npm, cargo markers | LogCompressor | Build tool output patterns |
| `---`/`+++` and `@@` markers | DiffCompressor | Unified diff format |
| Prose, documentation | TextCompressor | Fallback for non-structured text |
| Any (max compression mode) | LLMLinguaCompressor | Explicitly enabled |
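The detection signals above can be approximated with simple heuristics. This is a rough sketch under assumed rules; the real detector (`detect_content_type`) returns a `ContentType` rather than a string, and its heuristics are more robust:

```python
import re

def guess_content_type(content: str) -> str:
    """Order matters: diff markers are the most distinctive, then
    grep-style file:line: prefixes, then build/test keywords."""
    lines = content.strip().splitlines()
    if any(l.startswith(("--- ", "+++ ", "@@")) for l in lines):
        return "diff"
    if sum(bool(re.match(r"[^:\n]+:\d+:", l)) for l in lines) > len(lines) // 2:
        return "search_results"
    if any(re.search(r"PASSED|FAILED|ERROR|warning:", l) for l in lines):
        return "build_output"
    return "plain_text"


print(guess_content_type("src/main.py:42:def process():"))
```

Note the ordering: a pytest line like `tests/test_x.py::test_ok PASSED` must not be mistaken for a grep match, which the `:\d+:` requirement prevents.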

Performance

| Compressor | Typical Input | Output | Speed |
|---|---|---|---|
| SearchCompressor | 1,000 matches | 30-50 matches | ~2ms |
| LogCompressor | 5,000 lines | 100-200 lines | ~3ms |
| DiffCompressor | Large diff | Changed hunks only | ~2ms |
| TextCompressor | 10,000 chars | 2,000 chars | ~2ms |
| LLMLinguaCompressor | Any text | 5-20% of original | 50-200ms |