Make your first request

This walkthrough takes about two minutes. By the end you'll have a working chat completion against an open-source model running on a Hoonify pool.

Prerequisites

An API key. If you don't have one, create one on the API keys page. Keys are revealed only once, at creation, so copy yours before you close the dialog.

1. Set your API key

Export the key as an environment variable so you can reuse it across the examples:

shell

export HOONIFY_API_KEY="hoon_sk_live_…"
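If you plan to call the API from Python, a script can read the same variable at startup and fail early when it is missing. The following is a minimal sketch; the helper name load_api_key is hypothetical and not part of any Hoonify SDK.

```python
import os


def load_api_key(env_var: str = "HOONIFY_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running the examples")
    return key
```

Failing at startup keeps a missing key from surfacing later as an authentication error on the first request.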

2. Send a request

The API is OpenAI-compatible: it uses the same request shape and the same response shape. The example below uses curl:

shell

curl https://api.hoonify.dev/v1/chat/completions \
  -H "Authorization: Bearer $HOONIFY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "user", "content": "Explain LoRA fine-tuning in one paragraph."}
    ]
  }'
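The same request can be sketched in Python with only the standard library. This is an illustration, not an official client: the function names build_chat_request and send_chat_request are hypothetical, and reading the reply from choices[0].message.content assumes the OpenAI-style response shape described above.

```python
import json
import urllib.request

API_URL = "https://api.hoonify.dev/v1/chat/completions"  # endpoint from the curl example


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text.

    Assumes the OpenAI-style response shape: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

Building and sending are kept separate so the request can be inspected without touching the network. To make the call, pass the result of build_chat_request (with your key, "llama-3.3-70b", and a prompt) to send_chat_request.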

3. Stream the response

Setting stream: true returns chunks as the model generates them. Use this in chat UIs so that users wait only for the first token, not for the full response.

python

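import os
from openai import OpenAI

# Assumes the OpenAI Python SDK (pip install openai) pointed at Hoonify's endpoint.
client = OpenAI(base_url="https://api.hoonify.dev/v1", api_key=os.environ["HOONIFY_API_KEY"])

stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Tell me a fun fact about octopi."}],
    stream=True,
)

for chunk in stream:
    # Some chunks carry no content (for example, an initial role-only chunk).
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)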
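If you read the stream without an SDK, it helps to know what arrives on the wire. OpenAI-compatible APIs typically send server-sent events: lines of the form `data: {json}`, ending with `data: [DONE]`. The sketch below assumes Hoonify follows that format, which this guide does not state explicitly. The function name iter_deltas and the sample lines are illustrative only.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # sentinel that ends the stream
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


# Hand-written sample events, not captured API output.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "Hello, world"
```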

4. Pin a pool

By default, Hoonify routes to the closest pool with capacity. Pin a specific pool — useful for data residency — by setting the X-Hoonify-Pool header to na, eu, or apac:

shell

curl https://api.hoonify.dev/v1/chat/completions \
  -H "Authorization: Bearer $HOONIFY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Hoonify-Pool: eu" \
  -d '{ "model": "llama-3.3-70b", "messages": [...] }'
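In application code, it can be worth validating the pool name before it reaches the header, so a typo fails locally instead of in the request. The following is a minimal sketch: the helper build_headers is hypothetical, and the set of valid pools is taken only from the three values listed above.

```python
from typing import Dict, Optional

VALID_POOLS = {"na", "eu", "apac"}  # the values listed in this guide


def build_headers(api_key: str, pool: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding X-Hoonify-Pool only when a pool is pinned."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if pool is not None:
        if pool not in VALID_POOLS:
            raise ValueError(f"unknown pool {pool!r}; expected one of {sorted(VALID_POOLS)}")
        headers["X-Hoonify-Pool"] = pool
    return headers
```

Leaving pool as None omits the header entirely, which keeps the default routing behavior.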

5. Pick a different model

See the inference catalog for the full list. Common identifiers:

Model ID	Description
llama-3.3-70b	Strong general-purpose. 128K context. FP8 + FP16.
llama-3.1-8b	Lower-cost, faster. Good for high-volume routing.
qwen-2.5-72b	Multilingual, strong on code and math.
deepseek-v3	MoE, 671B params. Best quality at our top tier.
mixtral-8x22b	Mixture-of-experts. Strong throughput economics.

What's next

Full chat completions API reference — every request and response field documented.
Pools concept — how routing works and when to override it.
Workbench — try the same models without writing code first.
