# Strands
Context compression for Strands Agents via model wrapping and hook-based tool output compression.
Headroom integrates with Strands Agents through two patterns: wrap the model for full conversation compression, or hook into tool calls for targeted tool output compression.
## Installation
```bash
pip install headroom-ai strands-agents
```

## Quick start
```python
from strands import Agent
from strands.models.bedrock import BedrockModel

from headroom.integrations.strands import HeadroomStrandsModel

model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
optimized = HeadroomStrandsModel(wrapped_model=model)
agent = Agent(model=optimized)

response = agent("Investigate the production incident")
print(f"Tokens saved: {optimized.total_tokens_saved}")
```

## Model wrapping
`HeadroomStrandsModel` wraps the Strands `Model` interface. Every call to `stream()` compresses the message list before it reaches the provider:
```python
from headroom import HeadroomConfig
from headroom.integrations.strands import HeadroomStrandsModel

optimized = HeadroomStrandsModel(
    wrapped_model=model,
    config=HeadroomConfig(),
)
agent = Agent(model=optimized)
response = agent("Analyze these logs")
```

## Hook provider (tool output compression)
Compresses tool call results via Strands' hook system, running `SmartCrusher` on JSON arrays returned by tools:
```python
from strands import Agent
from strands.models.bedrock import BedrockModel

from headroom.integrations.strands import HeadroomHookProvider

model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
hooks = HeadroomHookProvider(
    compress_tool_outputs=True,
    min_tokens_to_compress=200,
    preserve_errors=True,
)
agent = Agent(model=model, hooks=[hooks])

response = agent("Search the database for recent failures")
print(f"Tokens saved by hooks: {hooks.total_tokens_saved}")
```

The hook preserves error items, anomalous values (statistical outliers), items matching the query context, and boundary items (first/last).
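The selection logic can be illustrated with a simplified sketch. This is a hypothetical reduction, not Headroom's actual `SmartCrusher` implementation, and it omits query-context matching: keep errors, keep first/last boundary items, keep statistical outliers, drop everything else.

```python
from statistics import mean, stdev

def crush(items, value_key="latency_ms", z_threshold=3.0):
    """Simplified sketch of SmartCrusher-style selection: keep error items,
    boundary items (first/last), and statistical outliers; drop the rest."""
    if len(items) < 3:
        return list(items)
    values = [item.get(value_key, 0) for item in items]
    mu, sigma = mean(values), stdev(values)
    kept = []
    for i, item in enumerate(items):
        is_boundary = i == 0 or i == len(items) - 1
        is_error = item.get("status") == "error"
        is_outlier = sigma > 0 and abs(item.get(value_key, 0) - mu) > z_threshold * sigma
        if is_boundary or is_error or is_outlier:
            kept.append(item)
    return kept

# 50 near-identical rows compress down to the 4 that carry signal.
rows = [{"id": i, "status": "ok", "latency_ms": 100} for i in range(50)]
rows[7]["status"] = "error"       # preserved: error item
rows[20]["latency_ms"] = 9000     # preserved: statistical outlier
compressed = crush(rows)          # keeps ids 0, 7, 20, 49
```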
## Both together
Model wrapping compresses conversation history. Hooks compress individual tool results. Use both for maximum savings:
```python
from headroom.integrations.strands import HeadroomStrandsModel, HeadroomHookProvider

optimized = HeadroomStrandsModel(wrapped_model=model)
hooks = HeadroomHookProvider(compress_tool_outputs=True)
agent = Agent(model=optimized, hooks=[hooks])
```

## How it works
```
Agent decides to call tool
        |
        v
Tool executes, returns result
        |
        v
HeadroomHookProvider (optional)
  compresses tool result JSON
        |
        v
Agent builds next API request
        |
        v
HeadroomStrandsModel.stream()
  compresses full message list
        |
        v
Provider API (Bedrock, etc.)
```

The model wrapper uses the full Headroom pipeline (CacheAligner, ContentRouter, IntelligentContext). The hook provider uses `SmartCrusher` directly for fast JSON compression.
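The wrapping pattern itself is simple: compress the message list, then delegate to the inner model unchanged. A minimal sketch of that shape, using hypothetical names and a toy compressor (Headroom's real wrapper implements the full Strands `Model` interface and pipeline):

```python
class CompressingModelWrapper:
    """Sketch of the wrap-and-delegate pattern: compress messages,
    then forward the call to the wrapped model."""

    def __init__(self, wrapped_model, compress):
        self.wrapped_model = wrapped_model
        self.compress = compress          # callable: messages -> messages
        self.total_tokens_saved = 0

    def stream(self, messages, **kwargs):
        before = sum(len(m["content"]) for m in messages)  # crude size proxy
        compressed = self.compress(messages)
        after = sum(len(m["content"]) for m in compressed)
        self.total_tokens_saved += max(0, before - after)
        return self.wrapped_model.stream(compressed, **kwargs)


# Toy inner model and compressor to show the call flow.
class EchoModel:
    def stream(self, messages, **kwargs):
        yield f"saw {len(messages)} messages"

def truncate_old_messages(messages, keep_chars=20):
    # Keep the latest message intact; truncate earlier ones.
    return [
        m if i == len(messages) - 1
        else {**m, "content": m["content"][:keep_chars]}
        for i, m in enumerate(messages)
    ]

model = CompressingModelWrapper(EchoModel(), truncate_old_messages)
chunks = list(model.stream([
    {"role": "user", "content": "x" * 500},
    {"role": "user", "content": "latest question"},
]))
```

The wrapped model never knows compression happened, which is why the pattern composes cleanly with any Strands provider.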
## Structured output
```python
from pydantic import BaseModel

class Analysis(BaseModel):
    severity: str
    root_cause: str
    recommendation: str

result = optimized.structured_output(Analysis, messages)
```

## Metrics
```python
for m in optimized.metrics_history:
    print(f"  {m.tokens_before} -> {m.tokens_after} ({m.tokens_saved} saved)")
print(f"Total saved: {optimized.total_tokens_saved}")
```

## Supported providers
| Strands Model | Provider Detected |
|---|---|
| `BedrockModel` | Anthropic (via Bedrock) |
| `OllamaModel` | OpenAI-compatible |
| Custom `Model` | Falls back to estimation |
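When no provider can be detected, token counts must be estimated rather than computed with a provider tokenizer. A common heuristic for English text, shown here as an assumption rather than Headroom's actual fallback, is roughly four characters per token:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.
    A typical fallback when no provider-specific tokenizer is available."""
    return max(1, len(text) // 4)

# A 400-character message estimates to about 100 tokens.
count = estimate_tokens("x" * 400)
```

Estimates like this are good enough for savings metrics, but exact counts require the provider's own tokenizer.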