Vercel AI SDK

Compress LLM context with the Vercel AI SDK using middleware, withHeadroom(), or standalone compression.

Headroom integrates with the Vercel AI SDK through three patterns: a one-liner wrapper, composable middleware, and standalone message compression.

Installation

npm install headroom-ai ai @ai-sdk/openai

Proxy required

The TypeScript SDK sends messages to a local Headroom proxy for compression. Start the proxy before using the SDK:

pip install "headroom-ai[proxy]"
headroom proxy

withHeadroom() one-liner

The simplest integration. Wraps any Vercel AI SDK language model with automatic compression:

import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const { text } = await generateText({
  model,
  messages: [
    { role: 'user', content: 'Summarize these results...' },
  ],
});

withHeadroom() calls wrapLanguageModel + headroomMiddleware() under the hood. It works with any provider (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, etc.).

headroomMiddleware() for composition

Use the middleware directly when you need to compose it with other middleware:

import { headroomMiddleware } from 'headroom-ai/vercel-ai';
import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: headroomMiddleware(),
});

Pass options to control compression behavior:

import { headroomMiddleware } from 'headroom-ai/vercel-ai';

// Option names shown here are illustrative; the model name and
// proxy URL values come from the original example.
const middleware = headroomMiddleware({
  model: 'gpt-4o',
  proxyUrl: 'http://localhost:8787',
});

compressVercelMessages() standalone

Compress Vercel-format messages directly without wrapping a model. Useful for custom pipelines:

import { compressVercelMessages } from 'headroom-ai/vercel-ai';

const result = await compressVercelMessages(messages, {
  model: 'gpt-4o',
});

console.log(`Saved ${result.tokensSaved} tokens`);
// result.messages is in Vercel format, ready for the AI SDK

Streaming with streamText

Compression happens before the request. Streaming responses are unaffected:

import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const result = streamText({
  model,
  messages: longConversation,
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

generateObject with compressed context

Works with structured output:

import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const model = withHeadroom(openai('gpt-4o'));

const { experimental_output } = await generateText({
  model,
  experimental_output: Output.object({
    schema: z.object({
      // Field names are illustrative; the enum values are from the original.
      summary: z.string(),
      priority: z.enum(['low', 'medium', 'high']),
    }),
  }),
  messages: largeConversationHistory,
});

How it works

  1. Messages are converted from Vercel format to OpenAI format
  2. Headroom compresses them via the proxy's /v1/compress endpoint
  3. Compressed messages are converted back to Vercel format
  4. The original model receives the smaller prompt
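Step 1 above can be sketched roughly as follows. The real conversion lives inside the middleware; the helper name toOpenAIFormat and the part types here are simplified assumptions, not Headroom's actual internals:

```typescript
// Hypothetical sketch of step 1: flattening Vercel AI SDK messages
// (whose content may be an array of typed parts) into OpenAI-style
// { role, content } messages before compression.
type VercelPart = { type: 'text'; text: string };
type VercelMessage = {
  role: 'system' | 'user' | 'assistant';
  content: string | VercelPart[];
};
type OpenAIMessage = { role: string; content: string };

function toOpenAIFormat(messages: VercelMessage[]): OpenAIMessage[] {
  return messages.map((m) => ({
    role: m.role,
    // String content passes through; text parts are joined into one string.
    content:
      typeof m.content === 'string'
        ? m.content
        : m.content.filter((p) => p.type === 'text').map((p) => p.text).join(''),
  }));
}
```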

All other model behavior (tool calling, structured output, streaming) is unchanged.
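The proxy's /v1/compress endpoint can also be called over HTTP without the SDK wrapper. The request and response field names below (model, messages, tokensSaved) are assumptions inferred from compressVercelMessages() above, not a documented wire contract:

```typescript
// Hypothetical direct call to the local Headroom proxy's /v1/compress
// endpoint, using only the endpoint path and default port from this page.
async function compressViaProxy(
  messages: { role: string; content: string }[],
  proxyUrl = 'http://localhost:8787',
): Promise<{ messages: { role: string; content: string }[]; tokensSaved: number }> {
  const res = await fetch(`${proxyUrl}/v1/compress`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // Field names assumed from the standalone compression API above.
    body: JSON.stringify({ model: 'gpt-4o', messages }),
  });
  if (!res.ok) throw new Error(`Compression failed: ${res.status}`);
  return res.json();
}
```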
