SDKS · OPENAI COMPATIBILITY

OpenAI compatibility

Hoonify is a drop-in replacement for the OpenAI chat.completions, embeddings, and models endpoints. If you can call openai.chat.completions.create(), you can call Hoonify — change two lines and ship.

The two-line migration

diff

# Python
- client = OpenAI()
+ client = OpenAI(
+     base_url="https://api.hoonify.dev/v1",
+     api_key=os.environ["HOONIFY_API_KEY"],
+ )

# TypeScript
- const client = new OpenAI();
+ const client = new OpenAI({
+   baseURL: "https://api.hoonify.dev/v1",
+   apiKey: process.env.HOONIFY_API_KEY!,
+ });

What's compatible

Surface	Notes
POST /v1/chat/completions	Full request and response shape, including streaming, tools, structured outputs, stop sequences.
POST /v1/embeddings	Full request and response shape. Supports float and base64 encoding.
GET /v1/models	Same envelope; Hoonify adds family / pools / quantizations / context_window fields.
GET /v1/models/{id}	Same envelope. Adds deprecation field.
Server-Sent Events streaming	Identical chunk envelope. SDKs work without changes.
Tool / function calling	Same JSON-Schema tools array. Same tool_calls response shape.

Hoonify-only extensions

These don't exist on OpenAI — they're additive and safe to ignore on clients that don't set them. They cover routing, idempotency, and quantization.

Field / header	Description
X-Hoonify-Pool	Header. Pin a pool. na · eu · apac.
X-Hoonify-Pool-Fallback	Header. strict (default) or nearest. Whether to fall back when the pinned pool is dry.
X-Hoonify-Idempotency-Key	Header. UUID. Dedupes retries within 5 minutes.
quantization	Body field. fp16 · fp8 · int4. Defaults to the cheapest variant the model exposes.
top_k	Body field. Hoonify extension on top of OpenAI sampling controls.
pool (response)	Response field. The pool that served the request.
system_fingerprint	Response field. Stable across model + quantization + runtime.

What's not supported (yet)

Surface	Notes
Vision / image inputs	Roadmap. Today the API rejects content blocks with type image_url.
Audio (STT / TTS)	Not on the platform yet. Use a dedicated provider.
Realtime API	Not supported. Streaming over SSE is the live-token mechanism.
Assistants / threads	Hoonify is stateless. Bring your own state / vector store.
Batch API	Roadmap. For now, parallelize on chat completions with normal rate limits.
File upload / fine-tuning	Hoonify hosts open-weight models — fine-tune on a compute instance and upload the artifact yourself.

Model IDs

OpenAI model IDs (gpt-4o, o3-mini, etc.) are not aliased on Hoonify — those are proprietary models we don't serve. Switch to one of the open-weight families: llama-3.3-70b, qwen-2.5-72b, deepseek-v3, mixtral-8x22b. See the full list in Models.

Pricing implications

Per-token pricing is set by the open-weight model, not the API call shape. See the rate card for current numbers.

Related: Python SDK · TypeScript SDK