Python SDK
Use the official openai Python package: Hoonify's API is a drop-in replacement for OpenAI's. Point base_url at https://api.hoonify.dev/v1 and pass your Hoonify API key.
No separate package
We deliberately don't ship a hoonify Python SDK. The OpenAI SDK already covers chat, embeddings, and the streaming protocol. Hoonify-only extensions are passed via extra_headers / extra_body.
Install
shell
pip install openai # 1.x or later
# or, with uv:
uv add openai
Sync client
python
# pip install openai
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.hoonify.dev/v1",
    api_key=os.environ["HOONIFY_API_KEY"],
)
Async client
python
# pip install openai
import asyncio
import os
from openai import AsyncOpenAI
client = AsyncOpenAI(
    base_url="https://api.hoonify.dev/v1",
    api_key=os.environ["HOONIFY_API_KEY"],
)
async def main():
    resp = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)
asyncio.run(main())
Streaming
The 1.x context-managed streaming API works as-is. Hoonify emits the same chat.completion.chunk envelope as the OpenAI API.
python
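import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.hoonify.dev/v1",
    api_key=os.environ["HOONIFY_API_KEY"],
)
# Older 1.x releases may expose this helper as client.beta.chat.completions.stream.
with client.chat.completions.stream(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Tell me a fun fact about octopi."}],
) as stream:
    for event in stream:
        # Print each text fragment as soon as it arrives
        if event.type == "content.delta":
            print(event.delta, end="", flush=True)
Timeouts and retries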
python
import os
from openai import OpenAI
from httpx import Timeout
client = OpenAI(
    base_url="https://api.hoonify.dev/v1",
    api_key=os.environ["HOONIFY_API_KEY"],
    timeout=Timeout(60.0, connect=5.0),  # 60 s default, 5 s to connect
    max_retries=4,  # default 2; openai SDK respects Retry-After
    default_headers={
        "X-Hoonify-Pool": "eu",  # sent with every request from this client
    },
)
The SDK honors Retry-After on 429 automatically. Raise max_retries for long-running batch jobs that can absorb the latency.
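Whether a job can absorb the latency depends on how long a fully retried request might take. The following is a rough back-of-the-envelope sketch, not the SDK's actual scheduling logic: `estimate_worst_case_seconds` and its `max_wait_s` parameter are hypothetical names, and the 8-second wait is an assumed placeholder for whatever backoff or Retry-After delay applies. It also treats the timeout as a per-attempt cap, which is a simplification because httpx timeouts apply to individual operations such as connect and read.

```python
def estimate_worst_case_seconds(timeout_s: float, max_retries: int, max_wait_s: float) -> float:
    """Rough estimate of the time one call may take if every attempt times out.

    Assumptions (not taken from the SDK): each attempt lasts at most
    `timeout_s`, and the pause before each retry is at most `max_wait_s`.
    """
    attempts = max_retries + 1  # the first try plus every retry
    return attempts * timeout_s + max_retries * max_wait_s


# 60 s timeout with an assumed 8 s maximum pause between attempts
print(estimate_worst_case_seconds(60.0, 2, 8.0))  # SDK default of 2 retries
print(estimate_worst_case_seconds(60.0, 4, 8.0))  # max_retries=4 as configured above
```

If the larger figure is more than a single job step can tolerate, lowering the timeout may be a better lever than lowering max_retries.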
Hoonify extensions
Pin a pool per-request
python
resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[...],
    extra_headers={"X-Hoonify-Pool": "eu"},
)
Pass a quantization or top_k
python
resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[...],
    extra_body={
        "quantization": "fp8",
        "top_k": 40,
    },
)
See the chat completions reference for every supported field.
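Both extension mechanisms can be combined in a single call. As a minimal sketch, the helper below collects a pool and any body fields into the keyword arguments shown above; `hoonify_options` is a hypothetical name used for illustration, not part of the openai package or of Hoonify, and it does not validate field names against the chat completions reference.

```python
def hoonify_options(pool=None, **body_fields):
    """Build extra_headers / extra_body kwargs for a chat.completions.create call.

    Hypothetical convenience helper: it only assembles dictionaries.
    """
    kwargs = {}
    if pool is not None:
        kwargs["extra_headers"] = {"X-Hoonify-Pool": pool}
    # Skip fields left as None so they are not sent at all
    body = {name: value for name, value in body_fields.items() if value is not None}
    if body:
        kwargs["extra_body"] = body
    return kwargs


opts = hoonify_options(pool="eu", quantization="fp8", top_k=40)
print(opts)
```

The result can then be unpacked into a request, for example `client.chat.completions.create(model="llama-3.3-70b", messages=messages, **opts)`.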
Catching Hoonify-specific errors
python
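from openai import APIStatusError, RateLimitError
try:
    resp = client.chat.completions.create(...)
except RateLimitError as e:
    # 429, raised once the SDK's own retries are exhausted
    ...
except APIStatusError as e:
    # The SDK typically unwraps the top-level "error" object into e.body,
    # so accept both the wrapped and the unwrapped shape.
    body = e.body if isinstance(e.body, dict) else {}
    err = body.get("error", body)
    err_type = err.get("type") if isinstance(err, dict) else None
    if e.status_code == 409 and err_type == "no_capacity":
        # retry with X-Hoonify-Pool fallback
        ...
    raise
What about compute / instances?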
Instance lifecycle is not part of the OpenAI API surface, so the openai SDK has no methods for it. For now, call the endpoint directly:
python
import asyncio
import os
import httpx
base = "https://api.hoonify.dev/v1"
auth = {"Authorization": f"Bearer {os.environ['HOONIFY_API_KEY']}"}
async def main():
    # "async with" is only valid inside a coroutine, hence the wrapper
    async with httpx.AsyncClient(base_url=base, headers=auth) as h:
        inst = await h.post("/instances", json={
            "sku": "h200-141", "gpus": 8, "duration_hours": 4,
        })
        inst.raise_for_status()
asyncio.run(main())
Related: TypeScript SDK · OpenAI compatibility