Vercel AI SDK
Compress LLM context with the Vercel AI SDK using middleware, withHeadroom(), or standalone compression.
Headroom integrates with the Vercel AI SDK through three patterns: a one-liner wrapper, composable middleware, and standalone message compression.
Installation
```bash
npm install headroom-ai ai @ai-sdk/openai
```

Proxy required
The TypeScript SDK sends messages to a local Headroom proxy for compression. Start the proxy before using the SDK:
```bash
pip install "headroom-ai[proxy]"
headroom proxy
```

withHeadroom() one-liner
The simplest integration. Wraps any Vercel AI SDK language model with automatic compression:
```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const { text } = await generateText({
  model,
  messages: [
    { role: 'user', content: 'Summarize these results...' },
  ],
});
```

withHeadroom() calls wrapLanguageModel + headroomMiddleware() under the hood. It works with any provider (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, etc.).
headroomMiddleware() for composition
Use the middleware directly when you need to compose it with other middleware:
```typescript
import { headroomMiddleware } from 'headroom-ai/vercel-ai';
import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: headroomMiddleware(),
});
```

Pass options to control compression behavior:
```typescript
import { headroomMiddleware } from 'headroom-ai/vercel-ai';

// Option names below are illustrative.
const middleware = headroomMiddleware({
  model: 'gpt-4o',
  proxyUrl: 'http://localhost:8787',
});
```

compressVercelMessages() standalone
Compress Vercel-format messages directly without wrapping a model. Useful for custom pipelines:
```typescript
import { compressVercelMessages } from 'headroom-ai/vercel-ai';

const result = await compressVercelMessages(messages, {
  model: 'gpt-4o',
});

console.log(`Saved ${result.tokensSaved} tokens`);
// result.messages is in Vercel format, ready for the AI SDK
```

Streaming with streamText
Compression happens before the request. Streaming responses are unaffected:
```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const result = streamText({
  model,
  messages: longConversation,
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

generateObject with compressed context
Works with structured output:
```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const model = withHeadroom(openai('gpt-4o'));

const { experimental_output } = await generateText({
  model,
  experimental_output: Output.object({
    // Example schema; field names are illustrative.
    schema: z.object({
      summary: z.string(),
      severity: z.enum(['low', 'medium', 'high']),
    }),
  }),
  messages: largeConversationHistory,
});
```

How it works
- Messages are converted from Vercel format to OpenAI format
- Headroom compresses them via the proxy's /v1/compress endpoint
- Compressed messages are converted back to Vercel format
- The original model receives the smaller prompt
All other model behavior (tool calling, structured output, streaming) is unchanged.
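The round trip above can be sketched roughly as follows. This is a simplified illustration, not Headroom's actual internals: the converter names, the message shapes, and the /v1/compress request/response format are all assumptions.

```typescript
// Simplified message shapes (real Vercel and OpenAI messages carry more fields).
type VercelMessage = { role: 'system' | 'user' | 'assistant'; content: string };
type OpenAIMessage = { role: string; content: string };

// Step 1: convert Vercel-format messages to OpenAI format (hypothetical converter).
function toOpenAIMessages(messages: VercelMessage[]): OpenAIMessage[] {
  return messages.map(({ role, content }) => ({ role, content }));
}

// Step 3: convert compressed OpenAI-format messages back to Vercel format.
function fromOpenAIMessages(messages: OpenAIMessage[]): VercelMessage[] {
  return messages.map(({ role, content }) => ({
    role: role as VercelMessage['role'],
    content,
  }));
}

// Step 2: POST the OpenAI-format messages to the proxy's /v1/compress endpoint.
// (The request/response body shape here is assumed for illustration.)
async function compressViaProxy(
  messages: OpenAIMessage[],
  proxyUrl = 'http://localhost:8787',
): Promise<OpenAIMessage[]> {
  const res = await fetch(`${proxyUrl}/v1/compress`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
  });
  const body = await res.json();
  return body.messages;
}
```

Because the compression is a pure message-to-message transform before the request is sent, the wrapped model's other behavior (tool calling, structured output, streaming) passes through untouched.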