Headroom

SharedContext

Compressed inter-agent context sharing. Reduce token usage by ~80% when agents hand off to each other.

When agents hand off to each other, context gets replayed in full. SharedContext compresses what moves between agents using Headroom's compression pipeline, typically saving ~80% of tokens on agent handoffs.

Quick Start

import { SharedContext } from "headroom";

const ctx = new SharedContext();

// Agent A stores large output
const entry = await ctx.put("research", bigResearchOutput, {
  agent: "researcher",
});

// Agent B gets compressed version (~80% smaller)
const summary = ctx.get("research");

// Agent B needs full details on demand
const full = ctx.get("research", { full: true });

from headroom import SharedContext

ctx = SharedContext()

# Agent A stores large output
ctx.put("research", big_research_output, agent="researcher")

# Agent B gets compressed version (~80% smaller)
summary = ctx.get("research")

# Agent B needs full details on demand
full = ctx.get("research", full=True)

API

put(key, content, agent?)

Store content under a key. Compresses automatically using Headroom's full pipeline (SmartCrusher for JSON, CodeCompressor for code, Kompress for text).

const entry = await ctx.put("findings", bigJsonOutput, {
  agent: "researcher",
});

entry.originalTokens;    // 20000
entry.compressedTokens;  // 4000
entry.savingsPercent;    // 80.0
entry.transforms;        // ["router:json:0.20"]

entry = ctx.put("findings", big_json_output, agent="researcher")

entry.original_tokens     # 20000
entry.compressed_tokens   # 4000
entry.savings_percent     # 80.0
entry.transforms          # ["router:json:0.20"]

get(key, full?)

Retrieve content. Returns the compressed version by default, the original when the full option is set, or null/None if the key is missing or expired.

const compressed = ctx.get("findings");                // 4K tokens
const original = ctx.get("findings", { full: true });  // 20K tokens
const missing = ctx.get("nonexistent");                // null

compressed = ctx.get("findings")              # 4K tokens
original = ctx.get("findings", full=True)     # 20K tokens
missing = ctx.get("nonexistent")              # None

stats()

Aggregated statistics across all entries.

const stats = ctx.stats();
stats.entries;                // 3
stats.totalOriginalTokens;    // 60000
stats.totalCompressedTokens;  // 12000
stats.totalTokensSaved;       // 48000
stats.savingsPercent;         // 80.0

stats = ctx.stats()
stats.entries                  # 3
stats.total_original_tokens    # 60000
stats.total_compressed_tokens  # 12000
stats.total_tokens_saved       # 48000
stats.savings_percent          # 80.0

keys() and clear()

keys() lists all non-expired keys. clear() removes all entries.

Configuration

const ctx = new SharedContext({
  model: "claude-sonnet-4-5-20250929",  // For token counting
  ttl: 3600,                            // 1 hour (default)
  maxEntries: 100,                      // Evicts oldest when full
});

ctx = SharedContext(
    model="claude-sonnet-4-5-20250929",  # For token counting
    ttl=3600,                             # 1 hour (default)
    max_entries=100,                       # Evicts oldest when full
)

Entries expire after ttl seconds. When maxEntries is reached, the oldest entry is evicted.
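The expiry and eviction behavior can be sketched with a small stdlib-only stand-in. This is illustrative only: TinyStore and its internals are hypothetical and not Headroom's actual implementation.

```python
import time
from collections import OrderedDict


class TinyStore:
    """Illustrative stand-in for the TTL + oldest-entry eviction policy."""

    def __init__(self, ttl=3600, max_entries=100):
        self.ttl = ttl
        self.max_entries = max_entries
        self._data = OrderedDict()  # key -> (value, stored_at), oldest first

    def put(self, key, value):
        # Evict the oldest entry only when adding a brand-new key at capacity.
        if key not in self._data and len(self._data) >= self.max_entries:
            self._data.popitem(last=False)
        self._data[key] = (value, time.time())

    def get(self, key):
        item = self._data.get(key)
        if item is None or time.time() - item[1] > self.ttl:
            return None  # missing or expired
        return item[0]

    def keys(self):
        # Only non-expired keys, mirroring keys() above.
        now = time.time()
        return [k for k, (_, t) in self._data.items() if now - t <= self.ttl]

    def clear(self):
        self._data.clear()
```

With max_entries=2, storing a third key evicts the oldest one, and clear() empties the store.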

Framework Examples

SharedContext is framework-agnostic. It works anywhere context moves between agents.

CrewAI

from headroom import SharedContext

ctx = SharedContext()

# After researcher task completes
ctx.put("findings", researcher_task.output.raw)

# Coder task gets compressed context
coder_context = ctx.get("findings")

LangGraph

from headroom import SharedContext

ctx = SharedContext()

def researcher_node(state):
    result = do_research()
    ctx.put("research", result)
    return {"research_summary": ctx.get("research")}

def coder_node(state):
    # Compressed summary in state, full details on demand
    full = ctx.get("research", full=True)
    return {"code": write_code(full)}

OpenAI Agents SDK

from headroom import SharedContext

ctx = SharedContext()

def compress_handoff(messages):
    for msg in messages:
        if len(msg.content) > 1000:
            ctx.put(msg.id, msg.content)
            msg.content = ctx.get(msg.id)
    return messages

handoff(agent=coder, input_filter=compress_handoff)

How It Works

Under the hood, put() calls headroom.compress() -- the same pipeline used by the Headroom proxy -- and stores the original in memory. get() returns the compressed version. get(full=True) returns the original.
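The storage model described above can be sketched in a few lines. This is a stand-in, not Headroom's code: compress_fn takes the place of headroom.compress(), which is not reimplemented here.

```python
class ContextSketch:
    """Illustrative sketch: store both the original and its compressed form."""

    def __init__(self, compress_fn):
        self._compress = compress_fn
        self._entries = {}  # key -> (original, compressed)

    def put(self, key, content):
        # Compress once at store time, keep the original alongside it.
        compressed = self._compress(content)
        self._entries[key] = (content, compressed)
        return compressed

    def get(self, key, full=False):
        entry = self._entries.get(key)
        if entry is None:
            return None
        original, compressed = entry
        return original if full else compressed
```

Because the original is kept in memory, get(full=True) is a plain lookup with no decompression step.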

The compression pipeline routes content to the best compressor:

  • JSON arrays -- SmartCrusher (70-95% compression)
  • Code -- CodeCompressor (AST-aware)
  • Text -- Kompress (ModernBERT-based) or passthrough
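The routing step above can be approximated by inspecting the content's shape. A minimal sketch, assuming routing is by content type; the route() function and its heuristics are hypothetical, not Headroom's API.

```python
import json


def route(content: str) -> str:
    """Pick a compressor name based on content shape (illustrative only)."""
    # JSON arrays go to SmartCrusher.
    try:
        if isinstance(json.loads(content), list):
            return "SmartCrusher"
    except ValueError:
        pass
    # Code-like content goes to the AST-aware CodeCompressor.
    if any(tok in content for tok in ("def ", "class ", "function ", "import ")):
        return "CodeCompressor"
    # Everything else is treated as text.
    return "Kompress"
```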
