# Docs

- **Getting Started**
  - [Introduction](/docs): Headroom is the context optimization layer for LLM applications. Compress tool outputs, DB results, file reads, and RAG results before they reach the model. Same answers, fraction of the tokens.
  - [Quickstart](/docs/quickstart): Get Headroom running in 5 minutes. Install, compress, and send to your LLM with fewer tokens.
  - [Installation](/docs/installation): Install Headroom via pip, npm, or Docker. Covers all Python extras, TypeScript setup, Docker image tags, and environment variables.
  - [Community Savings](/docs/community-savings): Aggregate savings from Headroom instances across the community. Anonymous telemetry data — no prompts, no content, no PII.
- **Compression**
  - [How Compression Works](/docs/how-compression-works): Understand Headroom's three-stage compression pipeline, automatic content routing, and how different content types are compressed.
  - [SmartCrusher](/docs/smart-crusher): Statistical JSON and array compression that keeps important items and drops the rest, achieving 70-90% token reduction.
  - [Code Compression](/docs/code-compression): AST-aware compression that preserves imports, signatures, and types while compressing function bodies. Powered by tree-sitter.
  - [Image Compression](/docs/image-compression): ML-powered image compression that reduces vision model token usage by 40-90% while maintaining answer accuracy.
  - [Text & Log Compression](/docs/text-and-logs): Specialized compressors for search results, build logs, diffs, and general text. Each preserves what matters for its content type.
- **Reversible Compression**
  - [Reversible Compression (CCR)](/docs/ccr): Compress-Cache-Retrieve architecture that makes compression lossless — the LLM can always get the original data back.
- **Cache & Context**
  - [Cache Optimization](/docs/cache-optimization): Stabilize message prefixes for provider KV cache hits and configure provider-specific caching strategies.
  - [Context Management](/docs/context-management): Intelligent importance-based context management that scores messages using learned patterns, with rolling window fallback and output buffer reservation.
- **Memory**
  - [Persistent Memory](/docs/memory): Hierarchical, temporal memory for LLM applications. Enable your AI to remember across conversations with intelligent scoping and versioning.
  - [SharedContext](/docs/shared-context): Compressed inter-agent context sharing. Reduce token usage by ~80% when agents hand off to each other.
  - [Failure Learning](/docs/failure-learning): Offline failure analysis for coding agents. Analyzes past sessions, finds what went wrong, correlates it with what fixed it, and writes project-level learnings.
- **Proxy Server**
  - [Proxy Server](/docs/proxy): Run the Headroom proxy to compress LLM traffic for any client — Claude Code, Cursor, OpenAI SDK, or custom apps.
- **Integrations**
  - [Vercel AI SDK](/docs/vercel-ai-sdk): Compress LLM context with the Vercel AI SDK using middleware, withHeadroom(), or standalone compression.
  - [OpenAI SDK](/docs/openai-sdk): Auto-compress messages in the OpenAI Node.js SDK with a single withHeadroom() wrapper (see the sketch after this index).
  - [Anthropic SDK](/docs/anthropic-sdk): Auto-compress messages in the Anthropic TypeScript SDK with a single withHeadroom() wrapper.
  - [LangChain](/docs/langchain): Automatic context compression for LangChain chat models, memory, retrievers, and agents.
  - [Agno](/docs/agno): Automatic context compression for Agno AI agents with model wrapping and observability hooks.
  - [Strands](/docs/strands): Context compression for Strands Agents via model wrapping and hook-based tool output compression.
  - [LiteLLM](/docs/litellm): Add Headroom compression to LiteLLM with a single callback. Works with all 100+ supported providers.
  - [MCP Tools](/docs/mcp): Compression, retrieval, and stats as MCP tools for Claude Code, Cursor, and any MCP-compatible host.
- **Configuration**
  - [Configuration](/docs/configuration): All configuration options for the Headroom Python and TypeScript SDKs, proxy server, and per-request overrides.
- **Observability**
  - [Metrics & Monitoring](/docs/metrics): Monitor compression performance, cost savings, and system health with Headroom's built-in metrics, Prometheus endpoint, and SDK APIs.
  - [Simulation](/docs/simulation): Preview compression results without making an LLM call. Use simulation for cost estimation, debugging, and understanding waste signals.
- **API Reference**
  - [API Reference](/docs/api-reference): Complete API reference for the Headroom Python and TypeScript SDKs. Core client, configuration types, result types, errors, and utilities.
- **Architecture**
  - [Architecture](/docs/architecture): How Headroom's three-stage compression pipeline works, from message parsing through transform execution to provider cache optimization.
  - [Benchmarks](/docs/benchmarks): Compression performance, accuracy preservation, latency overhead, and real-world production telemetry from 250+ Headroom proxy instances.
  - [Limitations](/docs/limitations): When Headroom helps, when it does not, and what to watch out for. Honest documentation of compression constraints and safety gates.
- **Help**
  - [Error Handling](/docs/errors): How to catch and handle Headroom errors in Python and TypeScript. Error hierarchy, proxy error mapping, and safety guarantees.
  - [Troubleshooting](/docs/troubleshooting): Solutions for common Headroom issues including proxy startup, connection errors, no token savings, high latency, and installation problems.
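
For orientation, a minimal sketch of the withHeadroom() wrapper pattern referenced in the OpenAI SDK entry above. Only the withHeadroom() name appears in these docs; the package import path, call shape, and model name are assumptions, so consult the integration page for the actual API.

```ts
// Sketch only: the "@headroom/openai" import path is an assumption; only
// withHeadroom() itself is named in the docs index above.
import OpenAI from "openai";
import { withHeadroom } from "@headroom/openai";

// Wrap the client once; subsequent chat.completions calls have their
// messages compressed before the request reaches the provider.
const client = withHeadroom(new OpenAI());

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize this tool output: ..." }],
});

console.log(completion.choices[0].message.content);
```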