API Reference
Complete API reference for the Headroom Python and TypeScript SDKs. Core client, configuration types, result types, errors, and utilities.
Core
HeadroomClient
The main entry point for the Headroom SDK.
import { HeadroomClient } from 'headroom-ai';

// NOTE: the option names below are illustrative reconstructions;
// check your SDK version's types for the exact keys.
const client = new HeadroomClient({
  baseUrl: 'http://localhost:8787',
  apiKey: 'your-api-key',
  timeoutMs: 30_000,
  failOpen: true,
  retries: 2,
});
Constructor Parameters
from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),
    default_mode="optimize",
)
chat.completions.create()
Create a chat completion with optional optimization.
The TypeScript SDK uses compress() to optimize messages before sending them to your LLM client:
import { compress } from 'headroom-ai';

const result = await compress(messages, {
  model: 'gpt-4o',
  maxTokens: 100_000, // illustrative option name for the token budget
});
// Then pass result.messages to your LLM client
Accepts all standard OpenAI/Anthropic parameters plus Headroom-specific overrides:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    headroom_mode="optimize",
    headroom_keep_turns=5,
    headroom_tool_profiles={
        "important_tool": {"skip_compression": True},
    },
)
chat.completions.simulate()
Preview optimization without making an API call.
plan = client.chat.completions.simulate(
    model="gpt-4o",
    messages=[...],
)

print(f"Tokens: {plan.tokens_before} -> {plan.tokens_after}")
print(f"Savings: {plan.savings_percent:.1f}%")
print(f"Transforms: {plan.transforms_applied}")
Returns: SimulationResult
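The savings figure follows directly from the before/after counts; as a quick worked check (plain Python, independent of the SDK, with made-up values):

```python
# How savings_percent relates to tokens_before / tokens_after.
# The values are made up for illustration.
tokens_before = 12_000
tokens_after = 4_500

savings_percent = (tokens_before - tokens_after) / tokens_before * 100
print(f"Savings: {savings_percent:.1f}%")  # Savings: 62.5%
```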
compress() (TypeScript)
Top-level function to compress messages via the Headroom proxy.
import { compress } from 'headroom-ai';

// NOTE: the option names below are illustrative reconstructions;
// check your SDK version's types for the exact keys.
const result = await compress(messages, {
  model: 'gpt-4o',
  baseUrl: 'http://localhost:8787',
  timeoutMs: 15_000,
  failOpen: true,
  retries: 2,
  maxTokens: 100_000,
});
get_stats()
Quick stats for the current session (no database query).
stats = client.get_stats()
# Returns dict with "session", "config", and "transforms" keys
get_metrics()
Query stored metrics from the database.
from datetime import datetime, timedelta
metrics = client.get_metrics(
    start_time=datetime.utcnow() - timedelta(hours=1),
    limit=100,
)
get_summary()
Aggregate statistics across all stored metrics.
summary = client.get_summary()
# Returns dict with total_requests, total_tokens_saved,
# avg_compression_ratio, total_cost_saved_usd
validate_setup()
Validate that the client is configured correctly.
result = client.validate_setup()
if not result["valid"]:
    for issue in result["issues"]:
        print(f" - {issue}")
Configuration
SmartCrusherConfig
from headroom import SmartCrusherConfig
config = SmartCrusherConfig(
    min_tokens_to_crush=200,
    max_items_after_crush=50,
    keep_first=3,
    keep_last=2,
    relevance_threshold=0.3,
    anomaly_std_threshold=2.0,
    preserve_errors=True,
)
CacheAlignerConfig
from headroom import CacheAlignerConfig
config = CacheAlignerConfig(
    enabled=True,
    extract_dates=True,
    normalize_whitespace=True,
    stable_prefix_min_tokens=100,
)
RollingWindowConfig
from headroom import RollingWindowConfig
config = RollingWindowConfig(
    max_tokens=100000,
    preserve_system=True,
    preserve_recent_turns=5,
    drop_oldest_first=True,
)
IntelligentContextConfig
from headroom.config import IntelligentContextConfig, ScoringWeights
config = IntelligentContextConfig(
    enabled=True,
    keep_system=True,
    keep_last_turns=2,
    output_buffer_tokens=4000,
    use_importance_scoring=True,
    scoring_weights=ScoringWeights(),
    toin_integration=True,
)
ScoringWeights
Weights are automatically normalized to sum to 1.0.
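For illustration, that normalization amounts to dividing each weight by the total (plain Python, not SDK code; the weight names here are hypothetical):

```python
# Illustrative only: ScoringWeights rescales whatever raw weights you
# provide so they sum to 1.0. The weight names are hypothetical.
raw = {"recency": 2.0, "relevance": 1.0, "error": 1.0}

total = sum(raw.values())
normalized = {name: w / total for name, w in raw.items()}

print(normalized)  # {'recency': 0.5, 'relevance': 0.25, 'error': 0.25}
```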
HeadroomConfig
The top-level config object that contains all sub-configurations:
from headroom import HeadroomConfig
config = HeadroomConfig()
config.smart_crusher.min_tokens_to_crush = 100
config.cache_aligner.enabled = True
config.rolling_window.preserve_recent_turns = 3
RelevanceScorerConfig
Results
CompressResult (TypeScript)
SimulationResult (Python)
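As a minimal sketch of the fields the simulate() example in this reference reads, SimulationResult can be modeled like this (illustrative only; the SDK's actual class may define more fields):

```python
from dataclasses import dataclass, field


# Illustrative sketch of SimulationResult, covering only the fields
# used elsewhere in this reference; the real class may differ.
@dataclass
class SimulationResult:
    tokens_before: int
    tokens_after: int
    transforms_applied: list[str] = field(default_factory=list)

    @property
    def savings_percent(self) -> float:
        if self.tokens_before == 0:
            return 0.0
        return (self.tokens_before - self.tokens_after) / self.tokens_before * 100


plan = SimulationResult(tokens_before=10_000, tokens_after=6_000,
                        transforms_applied=["smart_crusher"])
print(f"Savings: {plan.savings_percent:.1f}%")  # Savings: 40.0%
```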
WasteSignals (Python)
RequestMetrics (Python)
Providers
OpenAIProvider
from headroom import OpenAIProvider
provider = OpenAIProvider(
    enable_prefix_caching=True,
)

counter = provider.get_token_counter("gpt-4o")
tokens = counter.count_text("Hello, world!")
limit = provider.get_context_limit("gpt-4o")  # 128000
cost = provider.estimate_cost(input_tokens=1000, output_tokens=500, model="gpt-4o")
AnthropicProvider
from headroom import AnthropicProvider
from anthropic import Anthropic
provider = AnthropicProvider(
    client=Anthropic(),
    enable_cache_control=True,
)

counter = provider.get_token_counter("claude-3-5-sonnet-latest")
tokens = counter.count_messages(messages)  # Accurate count via API
GoogleProvider
from headroom import GoogleProvider
provider = GoogleProvider(
    enable_context_caching=True,
)
Relevance Scoring
create_scorer()
Factory function to create scorers:
from headroom import create_scorer
# Auto-select best available scorer
scorer = create_scorer()
# Explicitly choose type
scorer = create_scorer(scorer_type="hybrid", alpha=0.7)
BM25Scorer
Fast keyword-based scoring (zero dependencies):
from headroom import BM25Scorer
scorer = BM25Scorer()
scores = scorer.score_items(items=["item 1", "item 2"], query="search query")
EmbeddingScorer
Semantic similarity scoring (requires headroom-ai[relevance]):
from headroom import EmbeddingScorer, embedding_available
if embedding_available():
    scorer = EmbeddingScorer(model="all-MiniLM-L6-v2")
    scores = scorer.score_items(items, query)
HybridScorer
Combines BM25 and embeddings:
from headroom import HybridScorer
scorer = HybridScorer(alpha=0.5)  # 50% BM25, 50% embedding
scores = scorer.score_items(items, query)
Transforms (Direct Use)
SmartCrusher
from headroom import SmartCrusher
crusher = SmartCrusher()
result = crusher.crush(data={"results": [...]}, query="user query")
CacheAligner
from headroom import CacheAligner
aligner = CacheAligner()
result = aligner.align(messages)
RollingWindow
from headroom import RollingWindow
window = RollingWindow(config)
result = window.apply(messages, max_tokens=100000)
IntelligentContextManager
from headroom.transforms import IntelligentContextManager
from headroom.config import IntelligentContextConfig
config = IntelligentContextConfig(
    keep_system=True,
    keep_last_turns=2,
    use_importance_scoring=True,
)

manager = IntelligentContextManager(config, toin=toin)
result = manager.apply(messages, tokenizer, model_limit=128000)
TransformPipeline
from headroom import TransformPipeline
pipeline = TransformPipeline([
    SmartCrusher(),
    CacheAligner(),
    RollingWindow(),
])
result = pipeline.transform(messages)
Errors
TypeScript SDK errors:

| Exception | Meaning |
|---|---|
| HeadroomError | Base class for all errors |
| HeadroomConnectionError | Cannot reach proxy |
| HeadroomAuthError | 401 from proxy |
| HeadroomCompressError | Compression failed (includes statusCode, errorType) |
| ConfigurationError | Invalid configuration |
| ProviderError | Provider issues |
| StorageError | Storage failures |
| TokenizationError | Token counting failed |
| CacheError | Cache operations failed |
| ValidationError | Validation failures |
| TransformError | Transform execution failed |
Use mapProxyError(status, type, message) to convert proxy error responses to the correct class.
Python SDK errors:

| Exception | Meaning |
|---|---|
| HeadroomError | Base class for all Headroom errors |
| ConfigurationError | Invalid config values |
| ProviderError | Provider issue (unknown model, etc.) |
| StorageError | Database issue |
| CompressionError | Compression failed (rare) |
| ValidationError | Setup validation failed |
All exceptions include a details dict with additional context.
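The pattern looks like this (the class definitions below are illustrative stand-ins, not SDK source; the real exceptions are importable from the headroom package):

```python
# Illustrative stand-ins for the SDK's exception hierarchy; the real
# classes are importable from the headroom package.
class HeadroomError(Exception):
    """Base class; carries a details dict with extra context."""

    def __init__(self, message, details=None):
        super().__init__(message)
        self.details = details or {}


class ProviderError(HeadroomError):
    """Provider issue (unknown model, etc.)."""


try:
    raise ProviderError("unknown model", details={"model": "gpt-99"})
except HeadroomError as exc:  # catching the base class covers every subclass
    print(exc, exc.details)  # unknown model {'model': 'gpt-99'}
```

Catching HeadroomError at a boundary and logging exc.details gives structured context without parsing error messages.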
Utilities
Tokenizer
from headroom import Tokenizer, count_tokens_text, count_tokens_messages
# Quick counting
tokens = count_tokens_text("Hello, world!", model="gpt-4o")
# With tokenizer instance
tokenizer = Tokenizer(model="gpt-4o")
tokens = tokenizer.count_text("Hello")
tokens = tokenizer.count_messages(messages)
generate_report()
Generate HTML/Markdown reports from stored metrics:
from headroom import generate_report
report = generate_report(
    store_url="sqlite:///headroom.db",
    format="html",
    period="day",
)
TypeScript Message Types
The TypeScript SDK uses the standard OpenAI message format with SystemMessage, UserMessage, AssistantMessage, and ToolMessage variants.
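These variants mirror the standard OpenAI chat message shapes. For illustration, the same shapes expressed as Python dicts (the form the Python SDK's messages parameter accepts):

```python
# One message of each variant, in the standard OpenAI chat format.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'tool']
```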