Headroom

API Reference

Complete API reference for the Headroom Python and TypeScript SDKs. Core client, configuration types, result types, errors, and utilities.

Core

HeadroomClient

The main entry point for the Headroom SDK.

import { HeadroomClient } from 'headroom-ai';

// Option names below are illustrative reconstructions; the values match the defaults shown in this doc
const client = new HeadroomClient({
  baseUrl: 'http://localhost:8787',
  apiKey: 'your-api-key',
  timeout: 30_000,
  fallback: true,
  retries: 2,
});

Constructor Parameters

from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),
    default_mode="optimize",
)

chat.completions.create()

Create a chat completion with optional optimization.

The TypeScript SDK uses compress() to optimize messages before sending them to your LLM client:

import { compress } from 'headroom-ai';

// Option names are illustrative reconstructions
const result = await compress(messages, {
  model: 'gpt-4o',
  maxTokens: 100_000,
});

// Then pass result.messages to your LLM client

Accepts all standard OpenAI/Anthropic parameters plus Headroom-specific overrides:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    headroom_mode="optimize",
    headroom_keep_turns=5,
    headroom_tool_profiles={
        "important_tool": {"skip_compression": True},
    },
)

chat.completions.simulate()

Preview optimization without making an API call.

plan = client.chat.completions.simulate(
    model="gpt-4o",
    messages=[...],
)

print(f"Tokens: {plan.tokens_before} -> {plan.tokens_after}")
print(f"Savings: {plan.savings_percent:.1f}%")
print(f"Transforms: {plan.transforms_applied}")

Returns: SimulationResult
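The savings figures on a plan follow directly from the token counts; as a minimal sketch (using the field names from the example above):

```python
def savings_percent(tokens_before: int, tokens_after: int) -> float:
    """Percent of tokens removed by the planned transforms."""
    if tokens_before == 0:
        return 0.0
    return (tokens_before - tokens_after) / tokens_before * 100

# A plan that shrinks 10,000 tokens to 6,500 saves 35%
print(f"Savings: {savings_percent(10_000, 6_500):.1f}%")
```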

compress() (TypeScript)

Top-level function to compress messages via the Headroom proxy.

import { compress } from 'headroom-ai';

// Option names are illustrative reconstructions; values match the defaults shown in this doc
const result = await compress(messages, {
  model: 'gpt-4o',
  baseUrl: 'http://localhost:8787',
  timeout: 15_000,
  fallback: true,
  retries: 2,
  maxTokens: 100_000,
});

get_stats()

Quick stats for the current session (no database query).

stats = client.get_stats()
# Returns dict with "session", "config", and "transforms" keys

get_metrics()

Query stored metrics from the database.

from datetime import datetime, timedelta

metrics = client.get_metrics(
    start_time=datetime.utcnow() - timedelta(hours=1),
    limit=100,
)

get_summary()

Aggregate statistics across all stored metrics.

summary = client.get_summary()
# Returns dict with total_requests, total_tokens_saved,
# avg_compression_ratio, total_cost_saved_usd

validate_setup()

Validate that the client is configured correctly.

result = client.validate_setup()
if not result["valid"]:
    for issue in result["issues"]:
        print(f"  - {issue}")

Configuration

SmartCrusherConfig

from headroom import SmartCrusherConfig

config = SmartCrusherConfig(
    min_tokens_to_crush=200,
    max_items_after_crush=50,
    keep_first=3,
    keep_last=2,
    relevance_threshold=0.3,
    anomaly_std_threshold=2.0,
    preserve_errors=True,
)
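To see how keep_first and keep_last interact with the item cap, here is an illustrative sketch of the selection they imply (not the actual crusher logic, which also applies relevance and anomaly scoring):

```python
def crush_items(items, keep_first=3, keep_last=2, max_items=50):
    """Keep the head and tail of a long list, trimming the middle
    until the result fits under max_items (illustrative sketch)."""
    if len(items) <= max_items:
        return items
    middle_budget = max_items - keep_first - keep_last
    head = items[:keep_first]
    tail = items[-keep_last:] if keep_last else []
    middle = items[keep_first:len(items) - keep_last][:middle_budget]
    return head + middle + tail

# The first 3 and last 2 items always survive
crushed = crush_items(list(range(100)), keep_first=3, keep_last=2, max_items=50)
```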

CacheAlignerConfig

from headroom import CacheAlignerConfig

config = CacheAlignerConfig(
    enabled=True,
    extract_dates=True,
    normalize_whitespace=True,
    stable_prefix_min_tokens=100,
)

RollingWindowConfig

from headroom import RollingWindowConfig

config = RollingWindowConfig(
    max_tokens=100000,
    preserve_system=True,
    preserve_recent_turns=5,
    drop_oldest_first=True,
)
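The window's behavior can be sketched in a few lines (token counts are faked with a word count here; the real implementation uses the provider tokenizer):

```python
def rolling_window(messages, max_tokens, preserve_system=True, preserve_recent_turns=5):
    """Drop the oldest non-system messages until the conversation fits
    (illustrative sketch; tokens approximated by word count)."""
    count = lambda m: len(m["content"].split())
    if preserve_system:
        system = [m for m in messages if m["role"] == "system"]
        rest = [m for m in messages if m["role"] != "system"]
    else:
        system, rest = [], list(messages)
    # Never drop the most recent turns
    protected = rest[-preserve_recent_turns:] if preserve_recent_turns else []
    droppable = rest[:-preserve_recent_turns] if preserve_recent_turns else list(rest)
    while droppable and sum(map(count, system + droppable + protected)) > max_tokens:
        droppable.pop(0)  # drop_oldest_first
    return system + droppable + protected
```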

IntelligentContextConfig

from headroom.config import IntelligentContextConfig, ScoringWeights

config = IntelligentContextConfig(
    enabled=True,
    keep_system=True,
    keep_last_turns=2,
    output_buffer_tokens=4000,
    use_importance_scoring=True,
    scoring_weights=ScoringWeights(),
    toin_integration=True,
)

ScoringWeights

Weights are automatically normalized to sum to 1.0.
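The normalization is equivalent to dividing each weight by the total; conceptually (the weight names below are hypothetical, for illustration only):

```python
def normalize_weights(weights: dict) -> dict:
    """Scale weights so they sum to 1.0, as ScoringWeights does internally."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical weight names, for illustration only
weights = normalize_weights({"recency": 2.0, "relevance": 1.0, "role": 1.0})
# weights now sum to 1.0 (0.5, 0.25, 0.25)
```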

HeadroomConfig

The top-level config object that contains all sub-configurations:

from headroom import HeadroomConfig

config = HeadroomConfig()
config.smart_crusher.min_tokens_to_crush = 100
config.cache_aligner.enabled = True
config.rolling_window.preserve_recent_turns = 3

RelevanceScorerConfig


Results

CompressResult (TypeScript)

Includes the optimized messages (result.messages, as shown in the compress() example above) along with compression metadata.
SimulationResult (Python)

Fields include tokens_before, tokens_after, savings_percent, and transforms_applied (see the simulate() example above).
WasteSignals (Python)

RequestMetrics (Python)


Providers

OpenAIProvider

from headroom import OpenAIProvider

provider = OpenAIProvider(
    enable_prefix_caching=True,
)

counter = provider.get_token_counter("gpt-4o")
tokens = counter.count_text("Hello, world!")
limit = provider.get_context_limit("gpt-4o")  # 128000
cost = provider.estimate_cost(input_tokens=1000, output_tokens=500, model="gpt-4o")
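estimate_cost boils down to per-token pricing; a minimal sketch with placeholder rates (the real rates come from the provider's pricing table, and the numbers below are not authoritative):

```python
# Illustrative per-million-token prices; NOT the real gpt-4o rates
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}

def estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    """Cost in USD: tokens times the per-million-token rate (sketch)."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```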

AnthropicProvider

from headroom import AnthropicProvider
from anthropic import Anthropic

provider = AnthropicProvider(
    client=Anthropic(),
    enable_cache_control=True,
)

counter = provider.get_token_counter("claude-3-5-sonnet-latest")
tokens = counter.count_messages(messages)  # Accurate count via API

GoogleProvider

from headroom import GoogleProvider

provider = GoogleProvider(
    enable_context_caching=True,
)

Relevance Scoring

create_scorer()

Factory function to create scorers:

from headroom import create_scorer

# Auto-select best available scorer
scorer = create_scorer()

# Explicitly choose type
scorer = create_scorer(scorer_type="hybrid", alpha=0.7)

BM25Scorer

Fast keyword-based scoring (zero dependencies):

from headroom import BM25Scorer

scorer = BM25Scorer()
scores = scorer.score_items(items=["item 1", "item 2"], query="search query")

EmbeddingScorer

Semantic similarity scoring (requires headroom-ai[relevance]):

from headroom import EmbeddingScorer, embedding_available

if embedding_available():
    scorer = EmbeddingScorer(model="all-MiniLM-L6-v2")
    scores = scorer.score_items(items, query)

HybridScorer

Combines BM25 and embeddings:

from headroom import HybridScorer

scorer = HybridScorer(alpha=0.5)  # 50% BM25, 50% embedding
scores = scorer.score_items(items, query)
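The alpha parameter is a linear blend of the two score lists; conceptually:

```python
def hybrid_scores(bm25, embedding, alpha=0.5):
    """Blend keyword and semantic scores: alpha weights BM25,
    (1 - alpha) weights the embedding similarity."""
    return [alpha * b + (1 - alpha) * e for b, e in zip(bm25, embedding)]

scores = hybrid_scores([0.9, 0.1], [0.2, 0.8], alpha=0.5)
```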

Transforms (Direct Use)

SmartCrusher

from headroom import SmartCrusher

crusher = SmartCrusher()
result = crusher.crush(data={"results": [...]}, query="user query")

CacheAligner

from headroom import CacheAligner

aligner = CacheAligner()
result = aligner.align(messages)

RollingWindow

from headroom import RollingWindow

window = RollingWindow(config)
result = window.apply(messages, max_tokens=100000)

IntelligentContextManager

from headroom.transforms import IntelligentContextManager
from headroom.config import IntelligentContextConfig

config = IntelligentContextConfig(
    keep_system=True,
    keep_last_turns=2,
    use_importance_scoring=True,
)

manager = IntelligentContextManager(config, toin=toin)
result = manager.apply(messages, tokenizer, model_limit=128000)

TransformPipeline

from headroom import TransformPipeline

pipeline = TransformPipeline([
    SmartCrusher(),
    CacheAligner(),
    RollingWindow(),
])

result = pipeline.transform(messages)

Errors

TypeScript SDK errors:

| Exception | Meaning |
| --- | --- |
| HeadroomError | Base class for all errors |
| HeadroomConnectionError | Cannot reach proxy |
| HeadroomAuthError | 401 from proxy |
| HeadroomCompressError | Compression failed (includes statusCode, errorType) |
| ConfigurationError | Invalid configuration |
| ProviderError | Provider issues |
| StorageError | Storage failures |
| TokenizationError | Token counting failed |
| CacheError | Cache operations failed |
| ValidationError | Validation failures |
| TransformError | Transform execution failed |

Use mapProxyError(status, type, message) to convert proxy error responses to the correct class.
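The mapping itself is a simple dispatch on the proxy's response; a Python sketch of the idea (the real TypeScript helper also attaches statusCode and errorType to the error):

```python
# Minimal stand-ins for the error classes listed above (sketch only)
class HeadroomError(Exception): pass
class HeadroomAuthError(HeadroomError): pass
class HeadroomCompressError(HeadroomError): pass

def map_proxy_error(status: int, message: str) -> HeadroomError:
    """Pick the exception class from the proxy's HTTP status (sketch)."""
    if status == 401:
        return HeadroomAuthError(message)
    return HeadroomCompressError(message)
```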

Python SDK errors:

| Exception | Meaning |
| --- | --- |
| HeadroomError | Base class for all Headroom errors |
| ConfigurationError | Invalid config values |
| ProviderError | Provider issue (unknown model, etc.) |
| StorageError | Database issue |
| CompressionError | Compression failed (rare) |
| ValidationError | Setup validation failed |

All exceptions include a details dict with additional context.


Utilities

Tokenizer

from headroom import Tokenizer, count_tokens_text, count_tokens_messages

# Quick counting
tokens = count_tokens_text("Hello, world!", model="gpt-4o")

# With tokenizer instance
tokenizer = Tokenizer(model="gpt-4o")
tokens = tokenizer.count_text("Hello")
tokens = tokenizer.count_messages(messages)

generate_report()

Generate HTML/Markdown reports from stored metrics:

from headroom import generate_report

report = generate_report(
    store_url="sqlite:///headroom.db",
    format="html",
    period="day",
)

TypeScript Message Types

The TypeScript SDK uses the standard OpenAI message format with SystemMessage, UserMessage, AssistantMessage, and ToolMessage variants.
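In plain data terms, these variants mirror the standard OpenAI chat format (Python dicts shown here for illustration):

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},           # SystemMessage
    {"role": "user", "content": "Summarize this conversation."},             # UserMessage
    {"role": "assistant", "content": "Here is a summary..."},                # AssistantMessage
    {"role": "tool", "content": '{"ok": true}', "tool_call_id": "call_1"},   # ToolMessage
]
```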
