# SharedContext

Compressed inter-agent context sharing. Reduce token usage by ~80% when agents hand off to each other.

When agents hand off to each other, context gets replayed in full. SharedContext compresses what moves between agents using Headroom's compression pipeline, typically saving ~80% of tokens on agent handoffs.
## Quick Start

```typescript
import { SharedContext } from "headroom";

const ctx = new SharedContext();

// Agent A stores large output
const entry = await ctx.put("research", bigResearchOutput, {
  agent: "researcher",
});

// Agent B gets compressed version (~80% smaller)
const summary = ctx.get("research");

// Agent B needs full details on demand
const full = ctx.get("research", { full: true });
```

```python
from headroom import SharedContext

ctx = SharedContext()

# Agent A stores large output
ctx.put("research", big_research_output, agent="researcher")

# Agent B gets compressed version (~80% smaller)
summary = ctx.get("research")

# Agent B needs full details on demand
full = ctx.get("research", full=True)
```

## API
### put(key, content, agent?)

Store content under a key. Compresses automatically using Headroom's full pipeline (SmartCrusher for JSON, CodeCompressor for code, Kompress for text).

```typescript
const entry = await ctx.put("findings", bigJsonOutput, {
  agent: "researcher",
});

entry.originalTokens; // 20000
entry.compressedTokens; // 4000
entry.savingsPercent; // 80.0
entry.transforms; // ["router:json:0.20"]
```

```python
entry = ctx.put("findings", big_json_output, agent="researcher")

entry.original_tokens # 20000
entry.compressed_tokens # 4000
entry.savings_percent # 80.0
entry.transforms # ["router:json:0.20"]
```

### get(key, full?)
Retrieve content. Returns the compressed version by default, or the original with full=True.
```typescript
const compressed = ctx.get("findings"); // 4K tokens
const original = ctx.get("findings", { full: true }); // 20K tokens
const missing = ctx.get("nonexistent"); // null
```

```python
compressed = ctx.get("findings") # 4K tokens
original = ctx.get("findings", full=True) # 20K tokens
missing = ctx.get("nonexistent") # None
```

### stats()
Aggregated statistics across all entries.
```typescript
const stats = ctx.stats();

stats.entries; // 3
stats.totalOriginalTokens; // 60000
stats.totalCompressedTokens; // 12000
stats.totalTokensSaved; // 48000
stats.savingsPercent; // 80.0
```

```python
stats = ctx.stats()

stats.entries # 3
stats.total_original_tokens # 60000
stats.total_compressed_tokens # 12000
stats.total_tokens_saved # 48000
stats.savings_percent # 80.0
```
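The savings figures above all follow the same arithmetic: savings as a percentage of the original token count. A quick illustrative check (`savings_percent` here is a hypothetical helper, not part of the Headroom API):

```python
def savings_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Token savings as a percentage of the original size."""
    return (original_tokens - compressed_tokens) / original_tokens * 100

# The stats() example above: 60000 original tokens -> 12000 compressed
savings_percent(60000, 12000)  # 80.0
```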
### keys() and clear()

`keys()` lists all non-expired keys. `clear()` removes all entries.
## Configuration

```typescript
const ctx = new SharedContext({
  model: "claude-sonnet-4-5-20250929", // For token counting
  ttl: 3600, // 1 hour (default)
  maxEntries: 100, // Evicts oldest when full
});
```

```python
ctx = SharedContext(
    model="claude-sonnet-4-5-20250929", # For token counting
    ttl=3600, # 1 hour (default)
    max_entries=100, # Evicts oldest when full
)
```

Entries expire after `ttl` seconds. When `maxEntries` is reached, the oldest entry is evicted.
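The expiry and eviction semantics can be sketched with a minimal self-contained cache. This is an illustration of the behavior described above, not Headroom's actual implementation (the `TtlCache` class is hypothetical):

```python
import time
from collections import OrderedDict


class TtlCache:
    """Sketch of SharedContext-style expiry/eviction: entries expire
    after `ttl` seconds; when `max_entries` is reached, the oldest
    entry is evicted. Illustrative only."""

    def __init__(self, ttl=3600, max_entries=100):
        self.ttl = ttl
        self.max_entries = max_entries
        self._store = OrderedDict()  # key -> (stored_at, value)

    def put(self, key, value):
        if key in self._store:
            del self._store[key]  # re-insert refreshes the entry's age
        elif len(self._store) >= self.max_entries:
            self._store.popitem(last=False)  # evict oldest insertion
        self._store[key] = (time.monotonic(), value)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        stored_at, value = item
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # lazily drop expired entries
            return None
        return value
```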
## Framework Examples

SharedContext is framework-agnostic. It works anywhere context moves between agents.

### CrewAI

```python
from headroom import SharedContext

ctx = SharedContext()

# After researcher task completes
ctx.put("findings", researcher_task.output.raw)

# Coder task gets compressed context
coder_context = ctx.get("findings")
```

### LangGraph

```python
from headroom import SharedContext

ctx = SharedContext()

def researcher_node(state):
    result = do_research()
    ctx.put("research", result)
    return {"research_summary": ctx.get("research")}

def coder_node(state):
    # Compressed summary in state, full details on demand
    full = ctx.get("research", full=True)
    return {"code": write_code(full)}
```

### OpenAI Agents SDK

```python
from headroom import SharedContext

ctx = SharedContext()

def compress_handoff(messages):
    for msg in messages:
        if len(msg.content) > 1000:
            ctx.put(msg.id, msg.content)
            msg.content = ctx.get(msg.id)
    return messages

handoff(agent=coder, input_filter=compress_handoff)
```

## How It Works
Under the hood, `put()` calls `headroom.compress()` (the same pipeline used by the Headroom proxy) and stores the original in memory. `get()` returns the compressed version; `get(full=True)` returns the original.

The compression pipeline routes content to the best compressor:

- **JSON arrays**: SmartCrusher (70-95% compression)
- **Code**: CodeCompressor (AST-aware)
- **Text**: Kompress (ModernBERT-based) or passthrough
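The routing step can be pictured as a simple content-type dispatch. This sketch guesses at the idea with crude heuristics; it is not Headroom's actual router, and the `route` function is hypothetical:

```python
import json


def route(content: str) -> str:
    """Pick a compressor family per payload (illustrative heuristics)."""
    stripped = content.strip()

    # Valid JSON -> structure-aware compressor (SmartCrusher)
    if stripped[:1] in ("[", "{"):
        try:
            json.loads(stripped)
            return "smartcrusher"
        except ValueError:
            pass

    # Crude code detection -> AST-aware compressor (CodeCompressor)
    code_markers = ("def ", "class ", "function ", "import ", "=>")
    if any(marker in stripped for marker in code_markers):
        return "codecompressor"

    # Everything else -> text compressor (Kompress) or passthrough
    return "kompress"
```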
## Persistent Memory

Hierarchical, temporal memory for LLM applications. Enable your AI to remember across conversations with intelligent scoping and versioning.

## Failure Learning

Offline failure analysis for coding agents. Analyzes past sessions, finds what went wrong, correlates with what fixed it, and writes project-level learnings.