Configuration

All configuration options for the Headroom Python and TypeScript SDKs and the proxy server, plus the per-request overrides that adjust behavior for individual calls.

Headroom can be configured via the SDK constructor, proxy command line, environment variables, or per-request overrides.

Modes

| Mode | Behavior | Use Case |
| --- | --- | --- |
| audit | Observes and logs, no modifications | Production monitoring, baseline measurement |
| optimize | Applies safe, deterministic transforms | Production optimization |
| simulate | Returns plan without API call | Testing, cost estimation |
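
The mode is set once on the client and can be overridden per request. For example, a client that optimizes by default can audit a single call (per-request overrides are covered below):

from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),
    default_mode="optimize",
)

# Observe this one request without modifying it
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    headroom_mode="audit",
)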

SDK Configuration

import { Headroom } from 'headroom-ai';

// Reads from HEADROOM_BASE_URL and HEADROOM_API_KEY automatically
const headroom = new Headroom();

// Or configure explicitly
const headroom = new Headroom({
  baseUrl: 'http://localhost:8787',
  apiKey: 'your-api-key',
  timeout: 30_000,
  debug: true,
  maxRetries: 2,
});

from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),

    # Mode: "audit" (observe only) or "optimize" (apply transforms)
    default_mode="optimize",

    # Enable provider-specific cache optimization
    enable_cache_optimizer=True,

    # Enable query-level semantic caching
    enable_semantic_cache=False,

    # Override default context limits per model
    model_context_limits={
        "gpt-4o": 128000,
        "gpt-4o-mini": 128000,
    },

    # Database location (defaults to temp directory)
    # store_url="sqlite:////absolute/path/to/headroom.db",
)

Per-Request Overrides

Override configuration for individual requests:

import { optimize } from 'headroom-ai';

const plan = await optimize(messages, {
  model: 'gpt-4o',
  contextLimit: 100_000,
  outputBufferTokens: 15_000,
});

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],

    # Override mode for this request
    headroom_mode="audit",

    # Reserve more tokens for output
    headroom_output_buffer_tokens=8000,

    # Keep last N turns (don't compress)
    headroom_keep_turns=5,

    # Skip compression for specific tools
    headroom_tool_profiles={
        "important_tool": {"skip_compression": True}
    },
)

SmartCrusher Configuration

Fine-tune JSON compression behavior:

from headroom.transforms import SmartCrusherConfig

config = SmartCrusherConfig(
    # Maximum items to keep after compression
    max_items_after_crush=15,

    # Minimum tokens before applying compression
    min_tokens_to_crush=200,

    # Relevance scoring tier: "bm25" (fast) or "embedding" (accurate)
    relevance_tier="bm25",

    # Always keep items with these field values
    preserve_fields=["error", "warning", "failure"],
)
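
As a rough illustration of what these settings mean (not the library's implementation): with the config above, a large tool result first keeps every item carrying one of the preserve_fields, then fills the rest of the item budget by relevance score:

items = [{"id": i, "status": "ok"} for i in range(98)] + [
    {"id": 98, "error": "timeout"},
    {"id": 99, "error": "rate_limited"},
]

# Items matching preserve_fields (here, anything with an "error",
# "warning", or "failure" field) are always retained...
preserved = [it for it in items if any(f in it for f in ("error", "warning", "failure"))]

# ...and the remaining budget (max_items_after_crush = 15, minus the 2
# preserved items) is filled with the most relevant other items,
# scored by BM25 against the conversation.
budget_left = 15 - len(preserved)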

CacheAligner Configuration

Control prefix stabilization for provider cache hit rates:

from headroom.transforms import CacheAlignerConfig

config = CacheAlignerConfig(
    # Enable/disable cache alignment
    enabled=True,

    # Patterns to extract from system prompt
    dynamic_patterns=[
        r"Today is \w+ \d+, \d{4}",
        r"Current time: .*",
    ],
)
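
To see why this matters: the patterns above match spans of the system prompt that change between requests, and provider caches key on byte-identical prefixes. A quick check of the first pattern:

import re

system_prompt = "You are a helpful assistant. Today is June 1, 2025."

match = re.search(r"Today is \w+ \d+, \d{4}", system_prompt)
print(match.group(0))  # "Today is June 1, 2025" -- changes daily, so
                       # CacheAligner extracts it to keep the cached
                       # prefix stable across requests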

RollingWindow Configuration

Control context window management when messages exceed model limits:

from headroom.transforms import RollingWindowConfig

config = RollingWindowConfig(
    # Minimum turns to always keep
    min_keep_turns=3,

    # Reserve tokens for output
    output_buffer_tokens=4000,

    # Drop oldest tool outputs first
    prefer_drop_tool_outputs=True,
)
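
A quick sketch of the resulting budget, assuming the input budget is simply the model's context limit minus the output buffer:

context_limit = 128_000          # gpt-4o, from model_context_limits above
output_buffer_tokens = 4_000

input_budget = context_limit - output_buffer_tokens
print(input_budget)  # 124000 tokens available for messages; when history
                     # exceeds this, the oldest tool outputs are dropped first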

IntelligentContext Configuration

Semantic-aware context management with importance scoring:

from headroom.config import IntelligentContextConfig, ScoringWeights

# Customize scoring weights (automatically normalized to sum to 1.0)
weights = ScoringWeights(
    recency=0.20,              # Newer messages score higher
    semantic_similarity=0.20,  # Similarity to recent context
    toin_importance=0.25,      # TOIN-learned retrieval patterns
    error_indicator=0.15,      # TOIN-learned error field types
    forward_reference=0.15,    # Messages referenced by later messages
    token_density=0.05,        # Information density
)

config = IntelligentContextConfig(
    enabled=True,
    keep_system=True,           # Never drop system messages
    keep_last_turns=2,          # Protect last N user turns
    output_buffer_tokens=4000,  # Reserve for model output
    use_importance_scoring=True,
    scoring_weights=weights,
    toin_integration=True,      # Use TOIN patterns if available
    recency_decay_rate=0.1,     # Exponential decay lambda
    compress_threshold=0.1,     # Try compression first if <10% over budget
)
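
For intuition on recency_decay_rate, a sketch assuming the recency term decays exponentially with message age (score proportional to exp(-decay_rate * age); the exact formula is internal to the library):

import math

decay_rate = 0.1
for age_in_turns in (0, 5, 10, 20):
    print(age_in_turns, round(math.exp(-decay_rate * age_in_turns), 2))
# 0 -> 1.0, 5 -> 0.61, 10 -> 0.37, 20 -> 0.14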

Scoring Weights

Weights are automatically normalized to sum to 1.0:

weights = ScoringWeights(recency=1.0, toin_importance=1.0)
normalized = weights.normalized()
# recency=0.5, toin_importance=0.5, others=0.0

Proxy Configuration

Command Line Options

# --port: port to listen on
# --host: host to bind to
# --budget: daily budget limit in USD
# --log-file: log file path
headroom proxy \
  --port 8787 \
  --host 0.0.0.0 \
  --budget 10.00 \
  --log-file headroom.jsonl

Feature Flags

# Disable optimization (passthrough mode)
headroom proxy --no-optimize

# Disable semantic caching
headroom proxy --no-cache

# Enable LLMLingua ML compression
headroom proxy --llmlingua
headroom proxy --llmlingua --llmlingua-device cuda --llmlingua-rate 0.4

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| HEADROOM_LOG_LEVEL | Logging level | INFO |
| HEADROOM_STORE_URL | Database URL | temp directory |
| HEADROOM_DEFAULT_MODE | Default mode | optimize |
| HEADROOM_MODEL_LIMITS | Custom model config (JSON string or file path) | -- |
| HEADROOM_BASE_URL | Base URL of the Headroom proxy (TypeScript SDK) | http://localhost:8787 |
| HEADROOM_API_KEY | API key for Headroom Cloud authentication | -- |
| HEADROOM_SAVINGS_PATH | Override persistent savings file location | ~/.headroom/proxy_savings.json |
| HEADROOM_TELEMETRY | Set to off to disable anonymous telemetry | on |

Custom Model Configuration

Configure context limits and pricing for new or custom models:

{
  "anthropic": {
    "context_limits": {
      "claude-4-opus-20250301": 200000,
      "claude-custom-finetune": 128000
    },
    "pricing": {
      "claude-4-opus-20250301": {
        "input": 15.00,
        "output": 75.00,
        "cached_input": 1.50
      }
    }
  },
  "openai": {
    "context_limits": {
      "gpt-5": 256000,
      "ft:gpt-4o:my-org": 128000
    }
  }
}

Save as ~/.headroom/models.json, or set HEADROOM_MODEL_LIMITS to a JSON string or file path.

Settings are resolved in this order (later overrides earlier):

  1. Built-in defaults
  2. ~/.headroom/models.json config file
  3. HEADROOM_MODEL_LIMITS environment variable
  4. SDK constructor arguments

Pattern-Based Inference

Unknown models are automatically inferred from naming patterns:

| Pattern | Inferred Settings |
| --- | --- |
| *opus* | 200K context, Opus-tier pricing |
| *sonnet* | 200K context, Sonnet-tier pricing |
| *haiku* | 200K context, Haiku-tier pricing |
| gpt-4o* | 128K context, GPT-4o pricing |
| o1*, o3* | 200K context, reasoning model pricing |
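
The patterns behave like glob matches on the model name. For example (illustrative only; the library's matcher may differ in details):

from fnmatch import fnmatch

assert fnmatch("claude-4-opus-20250301", "*opus*")  # 200K context, Opus-tier pricing
assert fnmatch("my-org-opus-finetune", "*opus*")    # custom names match too
assert fnmatch("gpt-4o-2024-08-06", "gpt-4o*")      # 128K context, GPT-4o pricing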

Provider-Specific Settings

from headroom import OpenAIProvider

provider = OpenAIProvider(
    enable_prefix_caching=True,
)

from headroom import AnthropicProvider

provider = AnthropicProvider(
    enable_cache_control=True,
)

from headroom import GoogleProvider

provider = GoogleProvider(
    enable_context_caching=True,
)
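
A configured provider is passed to the client exactly as in the SDK example above:

from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(enable_prefix_caching=True),
)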

Tool Profiles

Skip or customize compression for specific tools:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    headroom_tool_profiles={
        "important_tool": {"skip_compression": True},
        "search_tool": {"max_items_after_crush": 25},
    },
)

Configuration Precedence

Settings are applied in this order (later overrides earlier):

  1. Default values
  2. Environment variables
  3. SDK constructor arguments
  4. Per-request overrides

Validation

Validate your configuration at startup:

result = client.validate_setup()

if not result["valid"]:
    print("Configuration issues:")
    for issue in result["issues"]:
        print(f"  - {issue}")
