Strands

Context compression for Strands Agents via model wrapping and hook-based tool output compression.

Headroom integrates with Strands Agents through two patterns: wrap the model for full conversation compression, or hook into tool calls for targeted tool output compression.

Installation

pip install headroom-ai strands-agents

Quick start

from strands import Agent
from strands.models.bedrock import BedrockModel
from headroom.integrations.strands import HeadroomStrandsModel

model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
optimized = HeadroomStrandsModel(wrapped_model=model)

agent = Agent(model=optimized)
response = agent("Investigate the production incident")

print(f"Tokens saved: {optimized.total_tokens_saved}")

Model wrapping

Wraps the Strands Model interface. Every call to stream() compresses messages before they reach the provider:

from headroom import HeadroomConfig
from headroom.integrations.strands import HeadroomStrandsModel

optimized = HeadroomStrandsModel(
    wrapped_model=model,
    config=HeadroomConfig(),
)

agent = Agent(model=optimized)
response = agent("Analyze these logs")

Hook provider (tool output compression)

Compresses tool call results via Strands' hook system. Uses SmartCrusher on JSON arrays returned by tools:

from strands import Agent
from strands.models.bedrock import BedrockModel
from headroom.integrations.strands import HeadroomHookProvider

model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
hooks = HeadroomHookProvider(
    compress_tool_outputs=True,
    min_tokens_to_compress=200,
    preserve_errors=True,
)

agent = Agent(model=model, hooks=[hooks])
response = agent("Search the database for recent failures")

print(f"Tokens saved by hooks: {hooks.total_tokens_saved}")

The hook preserves error items, anomalous values (statistical outliers), items matching the query context, and boundary items (first/last).
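The selection rules above can be sketched as a small filter. This is an illustrative stand-in for SmartCrusher's behavior, not Headroom's actual implementation; the field names (`status`, `value`) and thresholds are assumptions:

```python
import statistics

def crush_items(items, query_terms=(), keep_edges=1):
    """Illustrative filter: keep error items, statistical outliers,
    query-relevant items, and boundary (first/last) items; drop the rest."""
    if len(items) <= 2 * keep_edges:
        return list(items)

    values = [it.get("value") for it in items
              if isinstance(it.get("value"), (int, float))]
    mean = statistics.mean(values) if values else 0.0
    stdev = statistics.pstdev(values) if len(values) > 1 else 0.0

    kept = []
    for i, it in enumerate(items):
        is_boundary = i < keep_edges or i >= len(items) - keep_edges
        is_error = str(it.get("status", "")).lower() in {"error", "failed"}
        v = it.get("value")
        is_outlier = (isinstance(v, (int, float)) and stdev > 0
                      and abs(v - mean) > 2 * stdev)
        matches_query = any(t.lower() in str(it).lower() for t in query_terms)
        if is_boundary or is_error or is_outlier or matches_query:
            kept.append(it)
    return kept
```

On a list of mostly uniform items, this keeps only the edges plus anything anomalous, which is where most of the token savings on large tool results comes from.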

Both together

Model wrapping compresses conversation history. Hooks compress individual tool results. Use both for maximum savings:

from headroom.integrations.strands import HeadroomStrandsModel, HeadroomHookProvider

optimized = HeadroomStrandsModel(wrapped_model=model)
hooks = HeadroomHookProvider(compress_tool_outputs=True)

agent = Agent(model=optimized, hooks=[hooks])

How it works

Agent decides to call tool
    |
    v
Tool executes, returns result
    |
    v
HeadroomHookProvider (optional)
    compresses tool result JSON
    |
    v
Agent builds next API request
    |
    v
HeadroomStrandsModel.stream()
    compresses full message list
    |
    v
Provider API (Bedrock, etc.)

The model wrapper uses the full Headroom pipeline (CacheAligner, ContentRouter, IntelligentContext). The hook provider uses SmartCrusher directly for fast JSON compression.
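Conceptually, the wrapper's pipeline is an ordered list of message-list transforms. The sketch below illustrates that shape; the stage bodies are hypothetical placeholders named after the components above, not Headroom's actual implementations:

```python
from typing import Callable, Dict, List

Message = Dict[str, object]
Transform = Callable[[List[Message]], List[Message]]

def run_pipeline(messages: List[Message],
                 transforms: List[Transform]) -> List[Message]:
    # Apply each stage in order; every stage maps a message list
    # to a (possibly smaller) message list.
    for stage in transforms:
        messages = stage(messages)
    return messages

def cache_align(msgs: List[Message]) -> List[Message]:
    # Placeholder: leave a stable prefix untouched so provider
    # prompt caches still hit.
    return msgs

def content_router(msgs: List[Message]) -> List[Message]:
    # Placeholder: decide per message whether compression is worthwhile.
    return msgs

def intelligent_context(msgs: List[Message]) -> List[Message]:
    # Placeholder: keep the system message plus only the most recent turns.
    keep_last = 6
    head = msgs[:1] if msgs and msgs[0].get("role") == "system" else []
    tail = msgs[len(head):]
    return head + tail[-keep_last:]

history = [{"role": "system", "content": "You are helpful."}]
history += [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compressed = run_pipeline(history,
                          [cache_align, content_router, intelligent_context])
```

Ordering matters in a design like this: cache alignment must run before anything that rewrites history, or the stable prefix the provider cache depends on would change on every call.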

Structured output

structured_output() is forwarded through the wrapper; pass messages in the standard Strands format:

from pydantic import BaseModel

class Analysis(BaseModel):
    severity: str
    root_cause: str
    recommendation: str

messages = [{"role": "user", "content": [{"text": "Summarize the incident"}]}]
result = optimized.structured_output(Analysis, messages)

Metrics

The model wrapper records per-call compression metrics:

for m in optimized.metrics_history:
    print(f"  {m.tokens_before} -> {m.tokens_after} ({m.tokens_saved} saved)")

print(f"Total saved: {optimized.total_tokens_saved}")

Supported providers

Strands Model    Provider Detected
BedrockModel     Anthropic (via Bedrock)
OllamaModel      OpenAI-compatible
Custom Model     Falls back to estimation
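For custom models, "estimation" means counting tokens without a provider tokenizer. A common heuristic for English text (an assumption here, not necessarily Headroom's exact formula) is roughly four characters per token:

```python
def estimate_tokens(text: str) -> int:
    # Rough fallback when no model tokenizer is available:
    # English text averages about 4 characters per token.
    return max(1, len(text) // 4)
```

Savings reported for custom models should therefore be read as approximate rather than exact token counts.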
