SDKS · OPENAI COMPATIBILITY
OpenAI compatibility
Hoonify is a drop-in replacement for the OpenAI chat.completions, embeddings, and models endpoints. If you can call openai.chat.completions.create(), you can call Hoonify — change two lines and ship.
The two-line migration
diff
# Python
- client = OpenAI()
+ client = OpenAI(
+ base_url="https://api.hoonify.dev/v1",
+ api_key=os.environ["HOONIFY_API_KEY"],
+ )
# TypeScript
- const client = new OpenAI();
+ const client = new OpenAI({
+ baseURL: "https://api.hoonify.dev/v1",
+ apiKey: process.env.HOONIFY_API_KEY!,
+ });What's compatible
| Surface | Notes |
|---|---|
| POST /v1/chat/completions | Full request and response shape, including streaming, tools, structured outputs, stop sequences. |
| POST /v1/embeddings | Full request and response shape. Supports float and base64 encoding. |
| GET /v1/models | Same envelope; Hoonify adds family / pools / quantizations / context_window fields. |
| GET /v1/models/{id} | Same envelope. Adds deprecation field. |
| Server-Sent Events streaming | Identical chunk envelope. SDKs work without changes. |
| Tool / function calling | Same JSON-Schema tools array. Same tool_calls response shape. |
Hoonify-only extensions
These don't exist on OpenAI — they're additive and safe to ignore on clients that don't set them. They cover routing, idempotency, and quantization.
| Field / header | Description |
|---|---|
| X-Hoonify-Pool | Header. Pin a pool. na · eu · apac. |
| X-Hoonify-Pool-Fallback | Header. strict (default) or nearest. Whether to fall back when the pinned pool is dry. |
| X-Hoonify-Idempotency-Key | Header. UUID. Dedupes retries within 5 minutes. |
| quantization | Body field. fp16 · fp8 · int4. Defaults to the cheapest variant the model exposes. |
| top_k | Body field. Hoonify extension on top of OpenAI sampling controls. |
| pool (response) | Response field. The pool that served the request. |
| system_fingerprint | Response field. Stable across model + quantization + runtime. |
What's not supported (yet)
| Surface | Notes |
|---|---|
| Vision / image inputs | Roadmap. Today the API rejects content blocks with type image_url. |
| Audio (STT / TTS) | Not on the platform yet. Use a dedicated provider. |
| Realtime API | Not supported. Streaming over SSE is the live-token mechanism. |
| Assistants / threads | Hoonify is stateless. Bring your own state / vector store. |
| Batch API | Roadmap. For now, parallelize on chat completions with normal rate limits. |
| File upload / fine-tuning | Hoonify hosts open-weight models — fine-tune on a compute instance and upload the artifact yourself. |
Model IDs
OpenAI model IDs (
gpt-4o, o3-mini, etc.) are not aliased on Hoonify — those are proprietary models we don't serve. Switch to one of the open-weight families: llama-3.3-70b, qwen-2.5-72b, deepseek-v3, mixtral-8x22b. See the full list in Models.Pricing implications
Per-token pricing is set by the open-weight model, not the API call shape. See the rate card for current numbers.
Related: Python SDK · TypeScript SDK