
OpenAI compatibility

Hoonify is a drop-in replacement for the OpenAI `chat.completions`, `embeddings`, and `models` endpoints. If you can call `openai.chat.completions.create()`, you can call Hoonify: change two lines and ship.

The two-line migration

```diff
# Python
+import os
 from openai import OpenAI

-client = OpenAI()
+client = OpenAI(
+    base_url="https://api.hoonify.dev/v1",
+    api_key=os.environ["HOONIFY_API_KEY"],
+)

# TypeScript
 import OpenAI from "openai";

-const client = new OpenAI();
+const client = new OpenAI({
+  baseURL: "https://api.hoonify.dev/v1",
+  apiKey: process.env.HOONIFY_API_KEY!,
+});
```
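The SDK change only redirects where requests go and which key is sent. For callers that don't use an SDK, here is a minimal stdlib sketch of the equivalent raw request. It assumes two things: Hoonify accepts the same `Authorization: Bearer` header that the OpenAI SDKs send, and `llama-3.3-70b` is a valid model ID. `build_chat_request` is a hypothetical helper name.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "https://api.hoonify.dev/v1"


def build_chat_request(
    messages: list[dict],
    model: str = "llama-3.3-70b",  # assumed model ID
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (without sending) a POST /v1/chat/completions request.

    Hypothetical helper: the path and JSON body follow the OpenAI
    chat.completions shape; Bearer auth is assumed to match the OpenAI SDKs.
    """
    # Fall back to the same environment variable the SDK examples use.
    key = api_key if api_key is not None else os.environ["HOONIFY_API_KEY"]
    body = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send the request with `urllib.request.urlopen(req)`. The JSON reply carries the usual `choices[0].message.content`.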

What's compatible

| Surface | Notes |
|---|---|
| `POST /v1/chat/completions` | Full request and response shape, including streaming, tools, structured outputs, and stop sequences. |
| `POST /v1/embeddings` | Full request and response shape. Supports `float` and `base64` encoding. |
| `GET /v1/models` | Same envelope; Hoonify adds `family`, `pools`, `quantizations`, and `context_window` fields. |
| `GET /v1/models/{id}` | Same envelope. Adds a `deprecation` field. |
| Server-Sent Events streaming | Identical chunk envelope. SDKs work without changes. |
| Tool / function calling | Same JSON Schema `tools` array. Same `tool_calls` response shape. |
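Because the chunk envelope is identical, any existing OpenAI stream consumer should work unchanged. For clients that read the raw SSE stream without an SDK, below is a minimal stdlib sketch that accumulates the streamed text. It assumes the standard OpenAI chunk format: `data:` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`. `collect_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join delta.content from OpenAI-style chat.completion.chunk SSE lines.

    Only choice index 0 is collected; role-only and empty deltas are skipped.
    """
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        # Each event payload arrives on a "data:" line; ignore blank separators
        # and any other SSE fields (comments, event names, ids).
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The stream ends with a literal [DONE] sentinel instead of JSON.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```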

Hoonify-only extensions

These don't exist on OpenAI. They're additive, so clients that don't set them can safely ignore them. They cover routing, idempotency, quantization, and sampling, plus two response fields.

| Field / header | Description |
|---|---|
| `X-Hoonify-Pool` | Header. Pins the request to a pool: `na`, `eu`, or `apac`. |
| `X-Hoonify-Pool-Fallback` | Header. `strict` (default) or `nearest`. Controls whether to fall back to another pool when the pinned pool has no available capacity. |
| `X-Hoonify-Idempotency-Key` | Header. A UUID. Retries with the same key within 5 minutes are deduplicated. |
| `quantization` | Body field. `fp16`, `fp8`, or `int4`. Defaults to the cheapest variant the model exposes. |
| `top_k` | Body field. A Hoonify addition to the OpenAI sampling controls. |
| `pool` (response) | Response field. The pool that served the request. |
| `system_fingerprint` | Response field. Stays the same for a given combination of model, quantization, and runtime. |
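The table's extensions can be kept in one place with a small helper that emits only the headers and body fields you set. The sketch below is a minimal version. `hoonify_extras` and `new_idempotency_key` are hypothetical names. The client-side validation sets are copied from the table, and models may expose only some quantizations, so the server remains the final authority.

```python
import uuid
from typing import Any, Optional

POOLS = {"na", "eu", "apac"}
FALLBACKS = {"strict", "nearest"}
QUANTIZATIONS = {"fp16", "fp8", "int4"}


def new_idempotency_key() -> str:
    """Generate once per logical request and reuse it on every retry."""
    return str(uuid.uuid4())


def hoonify_extras(
    pool: Optional[str] = None,
    fallback: Optional[str] = None,
    quantization: Optional[str] = None,
    top_k: Optional[int] = None,
    idempotency_key: Optional[str] = None,
) -> tuple[dict[str, str], dict[str, Any]]:
    """Return (extra_headers, extra_body) containing only the extensions that are set."""
    headers: dict[str, str] = {}
    body: dict[str, Any] = {}
    if pool is not None:
        if pool not in POOLS:
            raise ValueError(f"unknown pool {pool!r}; expected one of {sorted(POOLS)}")
        headers["X-Hoonify-Pool"] = pool
    if fallback is not None:
        if fallback not in FALLBACKS:
            raise ValueError(f"unknown fallback {fallback!r}")
        headers["X-Hoonify-Pool-Fallback"] = fallback
    if idempotency_key is not None:
        headers["X-Hoonify-Idempotency-Key"] = idempotency_key
    if quantization is not None:
        if quantization not in QUANTIZATIONS:
            raise ValueError(f"unknown quantization {quantization!r}")
        body["quantization"] = quantization
    if top_k is not None:
        body["top_k"] = top_k
    return headers, body
```

With the official OpenAI Python SDK, you can pass the two dicts per call as `extra_headers=` and `extra_body=` on `client.chat.completions.create(...)`. Generate the idempotency key before the first attempt, not inside the retry loop. Otherwise each retry gets a fresh key and nothing is deduplicated.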

What's not supported (yet)

| Surface | Notes |
|---|---|
| Vision / image inputs | On the roadmap. Today the API rejects content blocks whose `type` is `image_url`. |
| Audio (STT / TTS) | Not on the platform yet. Use a dedicated provider. |
| Realtime API | Not supported. SSE streaming is the mechanism for receiving tokens live. |
| Assistants / threads | Not supported; Hoonify is stateless. Bring your own state and vector store. |
| Batch API | On the roadmap. For now, send parallel chat completions requests within your normal rate limits. |
| File upload / fine-tuning | Not supported. Hoonify hosts open-weight models, so fine-tune on a compute instance and upload the artifact yourself. |
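Until a Batch API exists, the suggested approach is bounded parallelism over ordinary chat completions calls. The sketch below is a minimal stdlib version. `run_batch` is a hypothetical helper. `send` is a callable you supply that performs one request, for example one `client.chat.completions.create` call. Tune `max_concurrency` to your account's rate limits; the sketch neither reads nor enforces them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_batch(
    items: Sequence[T],
    send: Callable[[T], R],
    max_concurrency: int = 8,
) -> list[R]:
    """Call send(item) for every item with at most max_concurrency calls in flight.

    Results are returned in input order. If any call raises, the first
    exception (in input order) propagates when its result is collected.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    # The pool size caps how many send() calls run at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
        # Executor.map yields results in input order, not completion order.
        return list(executor.map(send, items))
```

Give each item its own `X-Hoonify-Idempotency-Key`. That way, if `send` retries a failed item within the 5-minute window, the retries can be deduplicated.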

Model IDs

OpenAI model IDs (`gpt-4o`, `o3-mini`, etc.) are not aliased on Hoonify, because they are proprietary models we don't serve. Switch to one of the open-weight families: `llama-3.3-70b`, `qwen-2.5-72b`, `deepseek-v3`, or `mixtral-8x22b`. See Models for the full list.
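`GET /v1/models` keeps the OpenAI list envelope (`{"object": "list", "data": [...]}` with an `id` per entry). That means a startup check can catch leftover OpenAI IDs in config before the first request fails. The sketch below is a minimal version. `require_served_model` is a hypothetical helper. The prefix list used for the friendlier error is a hypothetical heuristic, not an official rule.

```python
from typing import Any

# Hypothetical heuristic for a friendlier error message; not an official list.
OPENAI_ID_PREFIXES = ("gpt-", "o1", "o3", "o4", "text-embedding-")


def require_served_model(model_id: str, models_response: dict[str, Any]) -> dict[str, Any]:
    """Return the entry for model_id from a parsed GET /v1/models response, or raise ValueError."""
    for entry in models_response.get("data", []):
        if entry.get("id") == model_id:
            return entry
    hint = ""
    if model_id.startswith(OPENAI_ID_PREFIXES):
        hint = "; OpenAI model IDs are not aliased, switch to an open-weight model"
    raise ValueError(f"model {model_id!r} is not served{hint}")
```

The returned entry also carries the Hoonify-specific fields, such as `context_window`, for any further checks.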

Pricing implications

Per-token pricing is set by the open-weight model you call, not by the OpenAI-compatible request shape. See the rate card for current numbers.
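Responses keep the OpenAI `usage` object, so you can estimate spend per request from `prompt_tokens` and `completion_tokens`. The sketch below assumes separate input and output rates quoted per million tokens. That rate structure is an assumption, and the real numbers must come from the rate card. `estimate_cost` is a hypothetical helper, and `Decimal` avoids float rounding on small amounts.

```python
from decimal import Decimal
from typing import Mapping


def estimate_cost(
    usage: Mapping[str, int],
    input_per_million: Decimal,
    output_per_million: Decimal,
) -> Decimal:
    """Estimate one request's cost from an OpenAI-style usage object.

    Rates are per 1M tokens and must come from the rate card; separate
    input/output rates are an assumption of this sketch.
    """
    prompt = Decimal(usage["prompt_tokens"])
    completion = Decimal(usage["completion_tokens"])
    return (prompt * input_per_million + completion * output_per_million) / Decimal(1_000_000)
```

Consider placeholder rates of 0.50 and 1.50 per million tokens (not real prices). At those rates, 1,200 prompt tokens and 300 completion tokens come to (600 + 450) / 1,000,000 = 0.00105.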

Related: Python SDK · TypeScript SDK