Vercel AI SDK

Compress LLM context with the Vercel AI SDK using middleware, withHeadroom(), or standalone compression.

Headroom integrates with the Vercel AI SDK through three patterns: a one-liner wrapper, composable middleware, and standalone message compression.

Installation

npm install headroom-ai ai @ai-sdk/openai

Proxy required

The TypeScript SDK sends messages to a local Headroom proxy for compression. Start the proxy before using the SDK:

pip install "headroom-ai[proxy]"
headroom proxy

withHeadroom() one-liner

The simplest integration. Wraps any Vercel AI SDK language model with automatic compression:

import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const { text } = await generateText({
  model,
  messages: [
    { role: 'user', content: 'Summarize these results...' },
  ],
});

withHeadroom() calls wrapLanguageModel + headroomMiddleware() under the hood. It works with any provider (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, etc.).

headroomMiddleware() for composition

Use the middleware directly when you need to compose it with other middleware:

import { headroomMiddleware } from 'headroom-ai/vercel-ai';
import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: headroomMiddleware(),
});

Pass options to control compression behavior:

import { headroomMiddleware } from 'headroom-ai/vercel-ai';

// Option names shown here are illustrative; the model name and
// proxy URL values come from the original example.
const middleware = headroomMiddleware({
  model: 'gpt-4o',
  proxyUrl: 'http://localhost:8787',
});

compressVercelMessages() standalone

Compress Vercel-format messages directly without wrapping a model. Useful for custom pipelines:

import { compressVercelMessages } from 'headroom-ai/vercel-ai';

const result = await compressVercelMessages(messages, {
  model: 'gpt-4o',
});

console.log(`Saved ${result.tokensSaved} tokens`);
// result.messages is in Vercel format, ready for the AI SDK

Streaming with streamText

Compression happens before the request. Streaming responses are unaffected:

import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const result = streamText({
  model,
  messages: longConversation,
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

generateObject with compressed context

Works with structured output:

import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const model = withHeadroom(openai('gpt-4o'));

const { experimental_output } = await generateText({
  model,
  experimental_output: Output.object({
    schema: z.object({
      // Field names are illustrative; the enum values are from the original.
      summary: z.string(),
      priority: z.enum(['low', 'medium', 'high']),
    }),
  }),
  messages: largeConversationHistory,
});

How it works

  1. Messages are converted from Vercel format to OpenAI format
  2. Headroom compresses them via the proxy's /v1/compress endpoint
  3. Compressed messages are converted back to Vercel format
  4. The original model receives the smaller prompt
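Step 1 above can be sketched roughly as follows. The real conversion lives inside the middleware; the helper name toOpenAIFormat and the part types here are simplified assumptions, not Headroom's actual internals:

```typescript
// Hypothetical sketch of step 1: flattening Vercel AI SDK messages
// (whose content may be an array of typed parts) into OpenAI-style
// { role, content } messages before compression.
type VercelPart = { type: 'text'; text: string };
type VercelMessage = {
  role: 'system' | 'user' | 'assistant';
  content: string | VercelPart[];
};
type OpenAIMessage = { role: string; content: string };

function toOpenAIFormat(messages: VercelMessage[]): OpenAIMessage[] {
  return messages.map((m) => ({
    role: m.role,
    // String content passes through; text parts are joined into one string.
    content:
      typeof m.content === 'string'
        ? m.content
        : m.content.filter((p) => p.type === 'text').map((p) => p.text).join(''),
  }));
}
```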

All other model behavior (tool calling, structured output, streaming) is unchanged.
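The proxy's /v1/compress endpoint can also be called over HTTP without the SDK wrapper. The request and response field names below (model, messages, tokensSaved) are assumptions inferred from compressVercelMessages() above, not a documented wire contract:

```typescript
// Hypothetical direct call to the local Headroom proxy's /v1/compress
// endpoint, using only the endpoint path and default port from this page.
async function compressViaProxy(
  messages: { role: string; content: string }[],
  proxyUrl = 'http://localhost:8787',
): Promise<{ messages: { role: string; content: string }[]; tokensSaved: number }> {
  const res = await fetch(`${proxyUrl}/v1/compress`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // Field names assumed from the standalone compression API above.
    body: JSON.stringify({ model: 'gpt-4o', messages }),
  });
  if (!res.ok) throw new Error(`Compression failed: ${res.status}`);
  return res.json();
}
```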
