Chat completions
OpenAI-compatible endpoint for chat-style conversations. Use this for almost everything text-shaped: assistants, RAG, code generation, summarization, agents.
https://api.hoonify.dev/v1/chat/completions
Authentication
Send your API key as a bearer token. See the API keys page for creating and rotating keys.
Authorization: Bearer hoon_sk_live_…
Request body
| Field | Type | Description |
|---|---|---|
| model | string · required | Model identifier — e.g. llama-3.3-70b. See catalog. |
| messages | array · required | Conversation turns. Each item is {role, content}. Roles: system, user, assistant, tool. |
| temperature | number · 0–2 | Sampling temperature. Default 0.7. Lower = more deterministic. |
| max_tokens | integer | Hard cap on completion tokens. Defaults to model max if omitted. |
| top_p | number · 0–1 | Nucleus sampling. Default 1. Use this or temperature, not both. |
| top_k | integer | Hoonify extension. Sample from the top-k logits. Default 0 (off). |
| stream | boolean | If true, returns Server-Sent Events instead of one JSON body. |
| quantization | string | Hoonify extension. fp16 / fp8 / int4. Defaults to the cheapest variant the model exposes. |
| stop | string · array | One or more stop sequences. Up to 4 strings. |
| tools | array | Function-calling. Same shape as the OpenAI tools array. |
Headers
| Header | Value | Use |
|---|---|---|
| X-Hoonify-Pool | na · eu · apac | Pin a specific pool. Required for data-residency workloads. |
| X-Hoonify-Idempotency-Key | uuid | Dedupes retries within a 5-minute window. Recommended for non-streaming requests. |
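The authentication scheme and headers above can be combined into one request object. The following is a minimal sketch, not an official client: it assumes the endpoint accepts a JSON body via POST with a Content-Type of application/json (standard for OpenAI-compatible APIs, but not stated on this page), and the helper names `build_headers` and `build_request` and the `YOUR_API_KEY` placeholder are illustrative. It only builds the request and does not send it.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.hoonify.dev/v1/chat/completions"


def build_headers(api_key, pool=None, idempotency_key=None):
    """Return request headers: bearer auth plus the optional Hoonify headers."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumption: JSON request body
    }
    if pool is not None:
        headers["X-Hoonify-Pool"] = pool  # na, eu, or apac
    if idempotency_key is not None:
        headers["X-Hoonify-Idempotency-Key"] = idempotency_key
    return headers


def build_request(api_key, payload, pool=None, idempotency_key=None):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps(payload).encode("utf-8")
    headers = build_headers(api_key, pool, idempotency_key)
    # assumption: the endpoint is called with POST
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


payload = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}],
}
request = build_request(
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    payload=payload,
    pool="eu",
    idempotency_key=str(uuid.uuid4()),
)
# To send it: urllib.request.urlopen(request) returns the HTTP response.
```

Generating the idempotency key once, outside the send call, makes it easy to reuse the same value if the request has to be retried.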
Example request
{
  "model": "llama-3.3-70b",
  "messages": [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Summarize quantum tunneling in one paragraph."}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "top_p": 0.95,
  "stream": false,
  "quantization": "fp8",
  "stop": ["</done>"]
}
Response
Non-streaming responses return a single JSON body. Streaming responses are an SSE stream of chat.completion.chunk objects terminated by [DONE].
{
  "id": "chatcmpl-FRcX2Fe1k4vR",
  "object": "chat.completion",
  "created": 1745784012,
  "model": "llama-3.3-70b",
  "pool": "na",
  "system_fingerprint": "hoonify-fp8-r12",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum tunneling is the phenomenon where a particle…"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 142,
    "total_tokens": 170
  }
}
Hoonify extensions on the response
| Field | Description |
|---|---|
| pool | The pool that served this request — na, eu, or apac. |
| system_fingerprint | Hash identifying the model + quantization + runtime that produced the response. Stable while the operator setup is unchanged. |
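For streaming responses, the SSE stream described above can be reduced to plain text with a small parser. This is a sketch under assumptions: it presumes each event arrives as a `data: <json>` line and that each chat.completion.chunk carries its text in `choices[].delta.content`, as in the OpenAI streaming format. This page does not spell out the chunk layout, so verify it against real output. The function name `iter_stream_text` and the sample lines are illustrative.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # assumption: OpenAI-style delta objects on each chunk
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines, not captured from the live API.
sample_lines = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Quantum "}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "tunneling"}}]}',
    "",
    "data: [DONE]",
]
streamed_text = "".join(iter_stream_text(sample_lines))
```

Because the parser takes any iterable of strings, it can be fed from a file, a test fixture, or the decoded lines of a live HTTP response.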
Errors
Errors return the OpenAI-style error envelope: {"error": {"type": ..., "message": ...}}.
| Status | Type | Cause |
|---|---|---|
| 400 | invalid_request | Malformed JSON, unsupported field, or invalid value. |
| 401 | unauthorized | Missing, malformed, or revoked API key. |
| 403 | forbidden_scope | Key lacks the required scope (e.g. inference:write). |
| 404 | model_not_found | Model ID does not exist or is not in your pool. |
| 409 | no_capacity | No replica available in the requested pool. |
| 429 | rate_limited | Per-key RPM exceeded. Back off and retry with jitter. |
| 503 | pool_degraded | Pool is temporarily routing-degraded. Retry with X-Hoonify-Pool set to a fallback pool. |
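The retry guidance in the table (back off with jitter on 429, fall back to another pool on 503) can be sketched as a small loop. This is one possible approach rather than a prescribed policy: the `send` callback, the pool order, the attempt count, and the exponential "full jitter" delays are all assumptions. The loop reuses one X-Hoonify-Idempotency-Key across attempts so that retries can be deduplicated.

```python
import random
import time
import uuid

RETRYABLE = {429, 503}  # rate_limited and pool_degraded, per the table


def call_with_retries(send, pools=("na", "eu"), max_attempts=4,
                      base_delay=0.5, sleep=time.sleep):
    """Call send(pool, idempotency_key) -> (status, body) with retries.

    `send` is a hypothetical transport callback supplied by the caller.
    """
    idempotency_key = str(uuid.uuid4())  # same key for every attempt
    pool_index = 0
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send(pools[pool_index], idempotency_key)
        if status not in RETRYABLE:
            return status, body
        if status == 503 and pool_index + 1 < len(pools):
            # pool_degraded: move to the next pool. Skip this step for
            # data-residency workloads that must stay in one pool.
            pool_index += 1
        if attempt + 1 < max_attempts:
            # exponential backoff with full jitter
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return status, body


# Demo with a fake transport: first call is degraded, second succeeds.
demo_calls = []


def fake_send(pool, key):
    demo_calls.append((pool, key))
    return (503, {}) if len(demo_calls) == 1 else (200, {"ok": True})


demo_status, demo_body = call_with_retries(fake_send, sleep=lambda s: None)
```

Other statuses, including 409 no_capacity, are returned to the caller unchanged here, since the table gives no retry guidance for them.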
Idempotency
Send the same X-Hoonify-Idempotency-Key value on retries. Hoonify deduplicates within a 5-minute window, so you won't pay twice for a request that succeeded server-side but failed mid-flight back to your client.
Tool calling
Pass tools with JSON-Schema function definitions. The model returns a tool_calls array on the assistant message; you execute the call and post the result back as a tool-role message in the next turn. The shape matches the OpenAI tools API, so existing OpenAI tool-calling code is drop-in compatible.
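The round trip can be sketched as follows. Because this page defers to the OpenAI tools format, the field names (`type`, `function`, `arguments` as a JSON string, `tool_call_id`) follow that format. The `get_weather` tool, the `call_1` ID, and the sample assistant message are hypothetical and hand-written, not real API output.

```python
import json


def get_weather(city):
    """Hypothetical local tool; a real one would call a weather service."""
    return {"city": city, "forecast": "sunny"}


# Tool definition to pass in the request's `tools` field.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

LOCAL_FUNCTIONS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested call and return tool-role messages for the next turn."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = LOCAL_FUNCTIONS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results


# Hand-written example of an assistant message that requests a tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Oslo\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message)
```

For the next turn, append `assistant_message` and then `tool_messages` to the `messages` array and send the request again with the same `tools`.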
Rate limits
Defaults per API key: 1,200 RPM on Tier 2 (the default), 12,000 RPM on Tier 3. Token-per-minute caps follow the same scaling. Bumps are zero-touch — email support@hoonify.dev with your use case.
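To stay under a per-key RPM cap on the client side, requests can be spaced evenly. The sketch below is one simple approach and rests on assumptions: the `RequestPacer` class is illustrative, it ignores token-per-minute caps, and it does not model any burst allowance the service may have. At 1,200 RPM the spacing is 60 / 1200 = 0.05 seconds between requests.

```python
import time


class RequestPacer:
    """Space requests evenly so they stay under a requests-per-minute budget."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm  # seconds between requests
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()  # earliest time the next request may go out

    def wait(self):
        """Block until the next request slot and return the delay applied."""
        now = self.clock()
        delay = max(0.0, self.next_slot - now)
        if delay > 0:
            self.sleep(delay)
        self.next_slot = max(now, self.next_slot) + self.interval
        return delay


pacer = RequestPacer(rpm=1200)  # Tier 2 default from the text above
first_delay = pacer.wait()      # the first request goes out immediately
```

Call `pacer.wait()` before each request. A 429 can still occur if several processes share one key, so pacing complements retry handling and does not replace it.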