Headroom

OpenAI SDK

Auto-compress messages in the OpenAI Node.js SDK with a single withHeadroom() wrapper.

Headroom wraps the OpenAI Node.js SDK to automatically compress messages before every chat.completions.create() call. All other methods (embeddings, images, audio) pass through unchanged.

Installation

npm install headroom-ai openai

Proxy required

The TypeScript SDK sends messages to a local Headroom proxy for compression. Start the proxy before using the SDK:

pip install "headroom-ai[proxy]"
headroom proxy

Quick start

import { withHeadroom } from 'headroom-ai/openai';
import OpenAI from 'openai';

const client = withHeadroom(new OpenAI());

// Messages are compressed automatically before sending
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: longConversation,
});

That's it. Every call to client.chat.completions.create() compresses the messages first. The response format is identical to the unwrapped client.

How it works

withHeadroom() returns a proxy around your OpenAI client that intercepts chat.completions.create():

  1. Extracts messages from the request params
  2. Sends them to the Headroom proxy's /v1/compress endpoint
  3. Replaces the original messages with the compressed result
  4. Forwards the request to OpenAI as normal
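The steps above can be sketched in a few lines. This is a minimal illustration, not the package's actual implementation: `sketchWithHeadroom` is a hypothetical name, and `compress` stands in for the HTTP POST to the proxy's /v1/compress endpoint.

```typescript
type Message = { role: string; content: string | null };

// Hypothetical sketch: wrap a client so chat.completions.create()
// compresses params.messages first; every other method passes through.
function sketchWithHeadroom<C extends { chat: { completions: { create: Function } } }>(
  client: C,
  compress: (messages: Message[]) => Promise<Message[]>,
): C {
  const realCreate = client.chat.completions.create.bind(client.chat.completions);
  const create = async (params: { messages: Message[] } & Record<string, unknown>) => {
    const messages = await compress(params.messages); // steps 1-3: extract, compress, replace
    return realCreate({ ...params, messages });       // step 4: forward to OpenAI
  };
  return {
    ...client,
    chat: { ...client.chat, completions: { ...client.chat.completions, create } },
  } as C;
}
```

Only `create` is replaced; the spread copies leave sibling methods such as `embeddings.create` untouched, which is why the rest of the client behaves exactly as before.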

All other client methods are untouched:

import { withHeadroom } from 'headroom-ai/openai';
import OpenAI from 'openai';

const client = withHeadroom(new OpenAI());

// These pass through unchanged
const embedding = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Hello world',
});

Options

Pass compression options as the second argument:

import { withHeadroom } from 'headroom-ai/openai';
import OpenAI from 'openai';

const client = withHeadroom(new OpenAI(), {
  model: 'gpt-4o',                   // model used when budgeting compression
  proxyUrl: 'http://localhost:8787', // address of the local Headroom proxy
});

Streaming

Streaming works normally. Compression happens before the request is sent:

import { withHeadroom } from 'headroom-ai/openai';
import OpenAI from 'openai';

const client = withHeadroom(new OpenAI());

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: longConversation,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

Tool calling

Tool call messages and tool results are compressed like any other message content. Large tool outputs (JSON arrays, logs) see the biggest savings:

import { withHeadroom } from 'headroom-ai/openai';
import OpenAI from 'openai';

const client = withHeadroom(new OpenAI());

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Search for recent errors' },
    {
      role: 'assistant',
      content: null,
      tool_calls: [{ id: 'call_1', type: 'function', function: { name: 'search', arguments: '{"q":"errors"}' } }],
    },
    {
      role: 'tool',
      tool_call_id: 'call_1',
      content: hugeJsonResult, // Compressed automatically
    },
  ],
  tools: [{ type: 'function', function: { name: 'search', parameters: {} } }],
});
