# Docs

- **Getting Started**
  - [Introduction](/docs): Headroom is the context optimization layer for LLM applications. Compress tool outputs, DB results, file reads, and RAG results before they reach the model. Same answers, fraction of the tokens.
  - [Quickstart](/docs/quickstart): Get Headroom running in 5 minutes. Install, compress, and send to your LLM with fewer tokens.
  - [Installation](/docs/installation): Install Headroom via pip, npm, or Docker. Covers all Python extras, TypeScript setup, Docker image tags, and environment variables.
  - [Community Savings](/docs/community-savings): Aggregate savings from Headroom instances across the community. Anonymous telemetry data — no prompts, no content, no PII.
- **Compression**
  - [How Compression Works](/docs/how-compression-works): Understand Headroom's three-stage compression pipeline, automatic content routing, and how different content types are compressed.
  - [SmartCrusher](/docs/smart-crusher): Statistical JSON and array compression that keeps important items and drops the rest, achieving 70-90% token reduction.
  - [Code Compression](/docs/code-compression): AST-aware compression that preserves imports, signatures, and types while compressing function bodies. Powered by tree-sitter.
  - [Image Compression](/docs/image-compression): ML-powered image compression that reduces vision model token usage by 40-90% while maintaining answer accuracy.
  - [Text & Log Compression](/docs/text-and-logs): Specialized compressors for search results, build logs, diffs, and general text. Each preserves what matters for its content type.
- **Reversible Compression**
  - [Reversible Compression (CCR)](/docs/ccr): Compress-Cache-Retrieve architecture that makes compression lossless — the LLM can always get the original data back.
- **Cache & Context**
  - [Cache Optimization](/docs/cache-optimization): Stabilize message prefixes for provider KV cache hits and configure provider-specific caching strategies.
  - [Context Management](/docs/context-management): Intelligent importance-based context management that scores messages using learned patterns, with rolling window fallback and output buffer reservation.
- **Memory**
  - [Persistent Memory](/docs/memory): Hierarchical, temporal memory for LLM applications. Enable your AI to remember across conversations with intelligent scoping and versioning.
  - [SharedContext](/docs/shared-context): Compressed inter-agent context sharing. Reduce token usage by ~80% when agents hand off to each other.
  - [Failure Learning](/docs/failure-learning): Offline failure analysis for coding agents. Analyzes past sessions, finds what went wrong, correlates it with what fixed it, and writes project-level learnings.
- **Proxy Server**
  - [Proxy Server](/docs/proxy): Run the Headroom proxy to compress LLM traffic for any client — Claude Code, Cursor, OpenAI SDK, or custom apps.
- **Integrations**
  - [Vercel AI SDK](/docs/vercel-ai-sdk): Compress LLM context with the Vercel AI SDK using middleware, withHeadroom(), or standalone compression.
  - [OpenAI SDK](/docs/openai-sdk): Auto-compress messages in the OpenAI Node.js SDK with a single withHeadroom() wrapper (see the sketch after this index).
  - [Anthropic SDK](/docs/anthropic-sdk): Auto-compress messages in the Anthropic TypeScript SDK with a single withHeadroom() wrapper.
  - [LangChain](/docs/langchain): Automatic context compression for LangChain chat models, memory, retrievers, and agents.
  - [Agno](/docs/agno): Automatic context compression for Agno AI agents with model wrapping and observability hooks.
  - [Strands](/docs/strands): Context compression for Strands Agents via model wrapping and hook-based tool output compression.
  - [LiteLLM](/docs/litellm): Add Headroom compression to LiteLLM with a single callback. Works with all 100+ supported providers.
  - [MCP Tools](/docs/mcp): Compression, retrieval, and stats as MCP tools for Claude Code, Cursor, and any MCP-compatible host.
- **Configuration**
  - [Configuration](/docs/configuration): All configuration options for the Headroom Python and TypeScript SDKs, proxy server, and per-request overrides.
- **Observability**
  - [Metrics & Monitoring](/docs/metrics): Monitor compression performance, cost savings, and system health with Headroom's built-in metrics, Prometheus endpoint, and SDK APIs.
  - [Simulation](/docs/simulation): Preview compression results without making an LLM call. Use simulation for cost estimation, debugging, and understanding waste signals.
- **API Reference**
  - [API Reference](/docs/api-reference): Complete API reference for the Headroom Python and TypeScript SDKs. Core client, configuration types, result types, errors, and utilities.
- **Architecture**
  - [Architecture](/docs/architecture): How Headroom's three-stage compression pipeline works, from message parsing through transform execution to provider cache optimization.
  - [Benchmarks](/docs/benchmarks): Compression performance, accuracy preservation, latency overhead, and real-world production telemetry from 250+ Headroom proxy instances.
  - [Limitations](/docs/limitations): When Headroom helps, when it does not, and what to watch out for. Honest documentation of compression constraints and safety gates.
- **Help**
  - [Error Handling](/docs/errors): How to catch and handle Headroom errors in Python and TypeScript. Error hierarchy, proxy error mapping, and safety guarantees.
  - [Troubleshooting](/docs/troubleshooting): Solutions for common Headroom issues including proxy startup, connection errors, no token savings, high latency, and installation problems.
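
For orientation, a minimal sketch of the withHeadroom() wrapper pattern referenced in the OpenAI SDK entry above. Only the withHeadroom() name appears in these docs; the package import path, call shape, and model name are assumptions, so consult the integration page for the actual API.

```ts
// Sketch only: the "@headroom/openai" import path is an assumption; only
// withHeadroom() itself is named in the docs index above.
import OpenAI from "openai";
import { withHeadroom } from "@headroom/openai";

// Wrap the client once; subsequent chat.completions calls have their
// messages compressed before the request reaches the provider.
const client = withHeadroom(new OpenAI());

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize this tool output: ..." }],
});

console.log(completion.choices[0].message.content);
```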