Python SDK

Use the official openai Python package: Hoonify is a drop-in replacement for the OpenAI API. Point base_url at https://api.hoonify.dev/v1 and pass your Hoonify API key as api_key.

No separate package

We deliberately don't ship a hoonify Python SDK. The OpenAI SDK already covers chat, embeddings, and the streaming protocol. Hoonify-only extensions are passed via extra_headers / extra_body.

Install

shell
pip install openai           # 1.x or later
# or, with uv:
uv add openai

Sync client

python
# pip install openai
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.dev/v1",
    api_key=os.environ["HOONIFY_API_KEY"],
)
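
The sync client above raises a bare KeyError when HOONIFY_API_KEY is unset. A minimal sketch of a friendlier setup follows; hoonify_client_kwargs is a hypothetical helper for illustration, not part of the OpenAI SDK or of Hoonify.

```python
import os

HOONIFY_BASE_URL = "https://api.hoonify.dev/v1"


def hoonify_client_kwargs(env=None):
    """Build keyword arguments for OpenAI(...) or AsyncOpenAI(...).

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    api_key = env.get("HOONIFY_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set HOONIFY_API_KEY before creating the client")
    return {"base_url": HOONIFY_BASE_URL, "api_key": api_key}
```

Usage would then be OpenAI(**hoonify_client_kwargs()); the same dict should work for AsyncOpenAI.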

Async client

python
# pip install openai
import os, asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.hoonify.dev/v1",
    api_key=os.environ["HOONIFY_API_KEY"],
)

async def main():
    resp = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
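
The async client is most useful when several requests run concurrently. The sketch below shows one way to cap in-flight requests with asyncio.Semaphore. bounded_gather and the limit values are hypothetical, and fake_completion stands in for a real client.chat.completions.create call so the snippet runs without network access.

```python
import asyncio


async def bounded_gather(factories, limit=4):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Results come back in input order, like asyncio.gather.
    """
    sem = asyncio.Semaphore(limit)

    async def run(factory):
        async with sem:
            return await factory()

    return await asyncio.gather(*(run(f) for f in factories))


async def fake_completion(prompt):
    # Stand-in for: await client.chat.completions.create(...)
    await asyncio.sleep(0)
    return prompt.upper()


async def demo():
    prompts = ["hello", "world"]
    # Bind each prompt as a default argument; a bare lambda would capture
    # the loop variable late and reuse the last prompt.
    factories = [lambda p=p: fake_completion(p) for p in prompts]
    return await bounded_gather(factories, limit=1)
```

Run it with asyncio.run(demo()). Passing factories rather than coroutine objects means no request is created until a semaphore slot is free.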

Streaming

The 1.x context-managed streaming API works as-is. Hoonify emits the same chat.completion.chunk envelope as the OpenAI API.

python
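import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.dev/v1",
    # Without api_key the SDK falls back to the OPENAI_API_KEY variable.
    api_key=os.environ["HOONIFY_API_KEY"],
)

# On older 1.x releases this helper may live under client.beta.chat.completions.stream.
with client.chat.completions.stream(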
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Tell me a fun fact about octopi."}],
) as stream:
    for event in stream:
        if event.type == "content.delta":
            print(event.delta, end="", flush=True)
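
When the full text is needed after streaming, the deltas can be accumulated while they are printed. A minimal sketch, assuming only the event shape used above (a type attribute and, for content.delta events, a delta string). collect_text is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real stream events.

```python
from types import SimpleNamespace


def collect_text(events, echo=False):
    """Join the text of all content.delta events, ignoring other event types."""
    parts = []
    for event in events:
        if getattr(event, "type", None) == "content.delta":
            parts.append(event.delta)
            if echo:
                print(event.delta, end="", flush=True)
    return "".join(parts)


# Stand-in events; with a real stream, pass the `stream` object instead.
fake_events = [
    SimpleNamespace(type="content.delta", delta="Octopuses have "),
    SimpleNamespace(type="chunk"),
    SimpleNamespace(type="content.delta", delta="three hearts."),
]
text = collect_text(fake_events)
```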

Timeouts and retries

python
import os
from openai import OpenAI
from httpx import Timeout

client = OpenAI(
    base_url="https://api.hoonify.dev/v1",
    api_key=os.environ["HOONIFY_API_KEY"],
    timeout=Timeout(60.0, connect=5.0),  # 60 s for read/write/pool, 5 s to connect
    max_retries=4,        # default 2; openai SDK respects Retry-After
    default_headers={
        "X-Hoonify-Pool": "eu",
    },
)

The SDK honors Retry-After on 429 automatically. Bump max_retries for long-running batch jobs that can absorb the latency.
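
To judge how much latency a larger max_retries can add, it helps to total the backoff waits. The sketch below assumes an exponential schedule that starts at 0.5 s, doubles on each attempt, and is capped at 8 s. Those numbers are assumptions, not values stated on this page; jitter is ignored, and a server-sent Retry-After would replace the computed wait. worst_case_backoff is a hypothetical helper.

```python
def worst_case_backoff(max_retries, initial=0.5, cap=8.0):
    """Sum of backoff waits in seconds, ignoring jitter and Retry-After.

    initial and cap are assumed values, not documented guarantees.
    """
    return sum(min(initial * 2 ** attempt, cap) for attempt in range(max_retries))


for retries in (2, 4, 8):
    print(retries, worst_case_backoff(retries))
```

Each attempt can also spend up to the configured timeout before it fails, so the total wall-clock budget is the backoff sum plus the per-attempt timeouts.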

Hoonify extensions

Pin a pool per-request

python
resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[...],
    extra_headers={"X-Hoonify-Pool": "eu"},
)

Pass a quantization or top_k

python
resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[...],
    extra_body={
        "quantization": "fp8",
        "top_k": 40,
    },
)

See the chat completions reference for every supported field.
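
Both extension mechanisms can be combined on one request. A minimal sketch, assuming a hypothetical helper named hoonify_options that builds the keyword arguments; the header and field names are the ones shown above.

```python
def hoonify_options(pool=None, **body_fields):
    """Return extra_headers/extra_body kwargs for client.chat.completions.create.

    Hypothetical helper; fields set to None are dropped.
    """
    options = {}
    if pool is not None:
        options["extra_headers"] = {"X-Hoonify-Pool": pool}
    body = {k: v for k, v in body_fields.items() if v is not None}
    if body:
        options["extra_body"] = body
    return options


opts = hoonify_options(pool="eu", quantization="fp8", top_k=40)
# client.chat.completions.create(model="llama-3.3-70b", messages=messages, **opts)
```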

Catching Hoonify-specific errors

python
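from openai import APIStatusError, RateLimitError

try:
    resp = client.chat.completions.create(...)
except RateLimitError as e:
    # 429, raised only after the SDK's own retries are exhausted.
    # Must come first: RateLimitError is a subclass of APIStatusError.
    ...
except APIStatusError as e:
    # The SDK may already unwrap the top-level "error" object into e.body,
    # so accept both the wrapped and the unwrapped shape.
    body = e.body if isinstance(e.body, dict) else {}
    err = body["error"] if isinstance(body.get("error"), dict) else body
    if e.status_code == 409 and err.get("type") == "no_capacity":
        # retry with X-Hoonify-Pool fallback
        ...
    else:
        raise  # not a capacity error: propagate unchanged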
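
The pool fallback mentioned in the comment could be structured as a loop over candidate pools. This is a sketch under assumptions: with_pool_fallback and is_no_capacity are hypothetical helpers, the pool name "us" is invented for illustration (only "eu" appears on this page), and FakeStatusError stands in for openai.APIStatusError so the snippet runs offline.

```python
def is_no_capacity(exc):
    """True for a 409 whose error type is no_capacity (wrapped or unwrapped body)."""
    body = getattr(exc, "body", None)
    body = body if isinstance(body, dict) else {}
    err = body["error"] if isinstance(body.get("error"), dict) else body
    return getattr(exc, "status_code", None) == 409 and err.get("type") == "no_capacity"


def with_pool_fallback(call, pools):
    """Call call(pool) for each pool in order until one does not hit no_capacity."""
    last_exc = None
    for pool in pools:
        try:
            return call(pool)
        except Exception as exc:  # with the SDK, catch openai.APIStatusError instead
            if not is_no_capacity(exc):
                raise
            last_exc = exc
    if last_exc is None:
        raise ValueError("pools must not be empty")
    raise last_exc


class FakeStatusError(Exception):
    """Offline stand-in for openai.APIStatusError."""

    def __init__(self, status_code, body):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
        self.body = body


def fake_call(pool):
    # Stand-in for client.chat.completions.create(..., extra_headers={"X-Hoonify-Pool": pool})
    if pool == "eu":
        raise FakeStatusError(409, {"error": {"type": "no_capacity"}})
    return f"served from {pool}"


result = with_pool_fallback(fake_call, ["eu", "us"])
```

Errors other than no_capacity propagate immediately, and if every pool is full the last capacity error is re-raised.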

What about compute / instances?

Instance lifecycle is not part of the OpenAI API shape, so the OpenAI SDK has no methods for it. For now, call the endpoint directly with an HTTP client such as httpx:

python
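import asyncio
import os

import httpx

base = "https://api.hoonify.dev/v1"
auth = {"Authorization": f"Bearer {os.environ['HOONIFY_API_KEY']}"}

async def main():
    # "async with" is only valid inside a coroutine, so wrap the call in main().
    async with httpx.AsyncClient(base_url=base, headers=auth) as h:
        inst = await h.post("/instances", json={
            "sku": "h200-141", "gpus": 8, "duration_hours": 4,
        })
        inst.raise_for_status()  # raises httpx.HTTPStatusError on 4xx/5xx

asyncio.run(main())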

Related: TypeScript SDK · OpenAI compatibility