Make your first request
This walkthrough takes about two minutes. By the end you'll have a working chat completion against an open-source model running on a Hoonify pool.
Prerequisites
1. Set your API key
Export the key as an environment variable so you can re-use it across examples:
export HOONIFY_API_KEY="hoon_sk_live_…"
2. Send a request
The API is OpenAI-compatible: it uses the same request shape and the same response shape. Using curl:
curl https://api.hoonify.dev/v1/chat/completions \
  -H "Authorization: Bearer $HOONIFY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "user", "content": "Explain LoRA fine-tuning in one paragraph."}
    ]
  }'
3. Stream the response
Setting stream: true returns chunks as the model generates them. In chat UIs, this means users wait only for the first token rather than for the full response.
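import os
from openai import OpenAI

# Assumption: the OpenAI Python SDK, pointed at Hoonify's OpenAI-compatible endpoint.
client = OpenAI(base_url="https://api.hoonify.dev/v1", api_key=os.environ["HOONIFY_API_KEY"])

stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Tell me a fun fact about octopi."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # skip chunks that carry no text
        print(delta, end="", flush=True)
4. Pin a pool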
By default, Hoonify routes to the closest pool with capacity. Pin a specific pool — useful for data residency — by setting the X-Hoonify-Pool header to na, eu, or apac:
curl https://api.hoonify.dev/v1/chat/completions \
  -H "Authorization: Bearer $HOONIFY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Hoonify-Pool: eu" \
  -d '{ "model": "llama-3.3-70b", "messages": [...] }'
5. Pick a different model
See the inference catalog for the full list. Common identifiers:
| Model ID | Description |
|---|---|
| llama-3.3-70b | Strong general-purpose. 128K context. FP8 + FP16. |
| llama-3.1-8b | Lower-cost, faster. Good for high-volume routing. |
| qwen-2.5-72b | Multilingual, strong on code and math. |
| deepseek-v3 | MoE, 671B params. Best quality at our top tier. |
| mixtral-8x22b | Mixture-of-experts. Strong throughput economics. |
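
Switching models only changes the model field of the request body, and pinning a pool only adds one header. The sketch below illustrates both using Python's standard library instead of an SDK. The build_request helper and the POOLS set are hypothetical names introduced for this example. The endpoint, header name, pool values, and model ID come from the steps above. The request is built but not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.hoonify.dev/v1/chat/completions"
POOLS = {"na", "eu", "apac"}  # pool names listed in step 4


def build_request(model, prompt, pool=None, api_key=None):
    """Build (but do not send) a chat completion request. Hypothetical helper."""
    # Fall back to the environment variable exported in step 1.
    api_key = api_key or os.environ.get("HOONIFY_API_KEY", "")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if pool is not None:
        if pool not in POOLS:
            raise ValueError(f"unknown pool: {pool!r}")
        headers["X-Hoonify-Pool"] = pool  # omit to keep default routing
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Same request shape as step 2, with a smaller model and the eu pool pinned.
req = build_request("llama-3.1-8b", "Summarize LoRA in one sentence.", pool="eu")
```

To send the request, pass req to urllib.request.urlopen and decode the JSON body of the response. Validating the pool name locally is a design choice of this sketch, not documented API behavior.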
What's next
- Full chat completions API reference — every request and response field documented.
- Pools concept — how routing works and when to override it.
- Workbench — try the same models without writing code first.