Chat completions

OpenAI-compatible endpoint for chat-style conversations. Use this for almost everything text-shaped: assistants, RAG, code generation, summarization, agents.

POST https://api.hoonify.dev/v1/chat/completions

Authentication

Send your API key as a bearer token. See the API keys page for creating and rotating keys.

shell

Authorization: Bearer hoon_sk_live_…

Request body

Field	Type	Description
model	string · required	Model identifier — e.g. `llama-3.3-70b`. See catalog.
messages	array · required	Conversation turns. Each item is `{role, content}`. Roles: `system`, `user`, `assistant`, `tool`.
temperature	number · 0–2	Sampling temperature. Default `0.7`. Lower = more deterministic.
max_tokens	integer	Hard cap on completion tokens. Defaults to model max if omitted.
top_p	number · 0–1	Nucleus sampling. Default `1`. Generally tune either this or `temperature`, not both.
top_k	integer	Hoonify extension. Sample from the top-k logits. Default `0` (off).
stream	boolean	If `true`, returns Server-Sent Events instead of one JSON body.
quantization	string	Hoonify extension. `fp16` / `fp8` / `int4`. Defaults to the cheapest variant the model exposes.
stop	string · array	One or more stop sequences. Up to 4 strings.
tools	array	Function-calling. Same shape as the OpenAI `tools` array.

Headers

Header	Value	Use
X-Hoonify-Pool	`na` · `eu` · `apac`	Pin a specific pool. Required for data-residency workloads.
X-Hoonify-Idempotency-Key	uuid	Dedupes retries within a 5-minute window. Recommended for non-streaming requests.

Example request

json

{
  "model": "llama-3.3-70b",
  "messages": [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user",   "content": "Summarize quantum tunneling in one paragraph."}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "top_p": 0.95,
  "stream": false,
  "quantization": "fp8",
  "stop": ["</done>"]
}
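
As a minimal sketch of sending the request above, the Python below builds the POST using only the standard library. The endpoint URL, the bearer header, and the two `X-Hoonify-*` headers come from this page; the `Content-Type` header, the helper names `build_request` and `send`, and the timeout value are assumptions made for illustration.

```python
import json
import urllib.request

API_URL = "https://api.hoonify.dev/v1/chat/completions"  # endpoint from this page


def build_request(api_key, body, pool=None, idempotency_key=None):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumed; not stated on this page
    }
    if pool is not None:
        headers["X-Hoonify-Pool"] = pool  # "na", "eu", or "apac"
    if idempotency_key is not None:
        headers["X-Hoonify-Idempotency-Key"] = idempotency_key
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, headers=headers, method="POST")


def send(request, timeout=60):
    """Send the request and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_request(api_key, body, pool="eu"))` would perform the network call. Keeping construction separate from sending makes the headers easy to inspect before anything leaves the machine.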

Response

Non-streaming responses return a single JSON body. Streaming responses are an SSE stream of chat.completion.chunk objects terminated by [DONE].

json

{
  "id": "chatcmpl-FRcX2Fe1k4vR",
  "object": "chat.completion",
  "created": 1745784012,
  "model": "llama-3.3-70b",
  "pool": "na",
  "system_fingerprint": "hoonify-fp8-r12",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum tunneling is the phenomenon where a particle…"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 142,
    "total_tokens": 170
  }
}
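
For the streaming case, a client has to read `data:` lines until the `[DONE]` sentinel. The sketch below shows one way to do that over any iterable of lines. It assumes each chunk carries its text in `choices[0].delta.content`, as in the OpenAI streaming format; this page does not show the chunk body, so treat that field path and the helper names as assumptions.

```python
import json


def iter_chunks(lines):
    """Yield parsed chunk objects from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(chunks):
    """Concatenate the content deltas of the first choice in each chunk.

    Assumes OpenAI-style chunks: choices[0].delta.content.
    """
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

An HTTP response object opened with `stream` set to `true` can be passed directly as `lines`, since iterating it yields one line of bytes at a time.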

Hoonify extensions on the response

Field	Description
pool	The pool that served this request — `na`, `eu`, or `apac`.
system_fingerprint	Hash identifying the model + quantization + runtime that produced the response. Stable while the operator setup is unchanged.

Errors

Errors return the OpenAI-style error envelope: {"error": {"type": ..., "message": ...}}.

Status	Type	Cause
400	invalid_request	Malformed JSON, unsupported field, or invalid value.
401	unauthorized	Missing, malformed, or revoked API key.
403	forbidden_scope	Key lacks the required scope (e.g. inference:write).
404	model_not_found	Model ID does not exist or is not in your pool.
409	no_capacity	No replica available in the requested pool.
429	rate_limited	Per-key RPM exceeded. Back off and retry with jitter.
503	pool_degraded	Pool is temporarily routing-degraded. Retry with X-Hoonify-Pool fallback.
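
The table suggests two retry behaviours: backoff with jitter for `rate_limited`, and a pool fallback for `pool_degraded`. The sketch below turns that into a small decision function. The backoff constants, the choice of "full jitter", and the decision not to retry any other error type are assumptions, not documented behaviour. Pass an empty `fallback_pools` for data-residency workloads that must stay in one pool.

```python
import random


def error_type(body):
    """Extract the `type` field from the error envelope."""
    return body["error"]["type"]


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Full-jitter exponential backoff: rng() * min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def plan_retry(err_type, attempt, current_pool, fallback_pools=()):
    """Return (retry, delay_seconds, pool) for an error type from the table."""
    if err_type == "rate_limited":
        return True, backoff_delay(attempt), current_pool
    if err_type == "pool_degraded":
        others = [p for p in fallback_pools if p != current_pool]
        # Fall back to another pool if one is allowed; otherwise retry in place.
        pool = others[0] if others else current_pool
        return True, backoff_delay(attempt), pool
    # Everything else is treated as non-retryable in this sketch (an assumption).
    return False, 0.0, current_pool
```

A caller would sleep for `delay_seconds`, set `X-Hoonify-Pool` to the returned pool, and resend while `retry` is true and its own attempt budget is not exhausted.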

Idempotency

For long completions, send an X-Hoonify-Idempotency-Key with the original request and reuse the same key on every retry. Hoonify deduplicates requests that share a key within a 5-minute window, so you won't pay twice for a request that succeeded server-side but whose response was lost on its way back to your client.
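
One way to apply this is to generate the key once, before the first attempt, and attach it to every attempt of the same logical request. In the sketch below, `send` is a hypothetical callable `(body, headers) -> response` supplied by the caller, and treating `ConnectionError` as the only retryable failure is an assumption. Retrying stops once the 5-minute window has passed, since the key would no longer dedupe.

```python
import time
import uuid

WINDOW_SECONDS = 5 * 60  # dedupe window stated on this page


def call_with_idempotency(send, body, max_attempts=3, clock=time.monotonic):
    """Call send(body, headers) up to max_attempts times with one shared key."""
    key = str(uuid.uuid4())  # generated once, reused for every attempt
    headers = {"X-Hoonify-Idempotency-Key": key}
    started = clock()
    last_error = None
    for attempt in range(max_attempts):
        if attempt > 0 and clock() - started > WINDOW_SECONDS:
            break  # outside the window a retry could be billed again
        try:
            return send(body, headers)
        except ConnectionError as exc:  # e.g. response lost in flight
            last_error = exc
    if last_error is None:
        raise ValueError("max_attempts must be at least 1")
    raise last_error
```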

Tool calling

Pass tools with JSON-Schema function definitions. The model returns a tool_calls array on the assistant message; you execute each call and post the result back as a tool-role message in the next turn. The shape is the same as the OpenAI tools API, so it is drop-in compatible.
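
The round trip can be sketched as follows, assuming the OpenAI message shapes that this page says are mirrored: each tool call has an `id` and a `function` with a `name` and JSON-encoded `arguments`, and each result goes back with a matching `tool_call_id`. The `get_weather` tool and the `registry` mapping are hypothetical and exist only for illustration.

```python
import json

# Hypothetical tool definition in the OpenAI `tools` shape.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def run_tool_calls(assistant_message, registry):
    """Execute each tool call and return the tool-role messages to append."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")  # arguments are a JSON string
        output = registry[fn["name"]](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to its call
            "content": json.dumps(output),
        })
    return results
```

For the next turn, append the assistant message and then the returned tool messages to `messages`, and send the request again with the same `tools` array.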

Rate limits

Defaults per API key: 1,200 RPM on Tier 2 (the default), 12,000 RPM on Tier 3. Token-per-minute caps follow the same scaling. To request a bump, email support@hoonify.dev with your use case.
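
To stay under the per-key RPM cap instead of reacting to 429 responses, a client can throttle itself. The sliding-window limiter below is one possible sketch. The class name and the 60-second window bookkeeping are assumptions about how a client might count requests, not a description of how Hoonify enforces the limit.

```python
import collections
import time


class RpmLimiter:
    """Sliding-window limiter: at most `rpm` calls in any 60-second window."""

    def __init__(self, rpm, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock
        self.sent = collections.deque()  # timestamps of recent calls

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()  # drop calls that left the window
        if len(self.sent) < self.rpm:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def record(self):
        """Note that a call is being sent now."""
        self.sent.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, call `record()`, and then send the request. For the Tier 2 default this would be constructed as `RpmLimiter(1200)`.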