Changelog
Notable platform and API changes. Webhook subscribers also receive billing.rate_change and model.deprecation_announced events for anything that affects pricing or availability.
- 2026-04-22 added
Qwen 3 72B is GA in NA and EU
Strong on multilingual code and tool calling. Available at fp16 and fp8. Live in NA and EU pools today; APAC follows in May.
- 2026-04-15 added
X-Hoonify-Pool-Fallback header
New optional header. Set it to nearest to fall back silently when a pinned pool is dry; the default stays strict (returns 409 no_capacity) for predictable latency. Recommended for customer-facing chat with global audiences.
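A minimal sketch of how a client might opt in to the fallback and recognize the strict-mode rejection. The header name, the nearest value, and the 409 no_capacity response come from this entry; the Bearer auth scheme, the error body shape (error.code), and the helper names are assumptions for illustration only.

```python
from typing import Dict

FALLBACK_HEADER = "X-Hoonify-Pool-Fallback"


def build_headers(api_key: str, allow_fallback: bool = False) -> Dict[str, str]:
    """Build request headers; leaving the fallback header out keeps strict behaviour."""
    # Assumption: the API uses a standard Bearer token header.
    headers = {"Authorization": f"Bearer {api_key}"}
    if allow_fallback:
        headers[FALLBACK_HEADER] = "nearest"
    return headers


def is_no_capacity(status: int, body: dict) -> bool:
    """True when a strict request was rejected because the pinned pool is dry."""
    # Assumption: the error code is exposed at body["error"]["code"].
    return status == 409 and body.get("error", {}).get("code") == "no_capacity"
```

A latency-sensitive batch job would likely keep the default and retry on no_capacity, while a chat frontend would pass allow_fallback=True.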
- 2026-04-08 changed
Default quantization for Llama-3.3-70B is now fp8
Calibrated evals show < 0.5% delta vs fp16 on MMLU-redux, GSM8K, and HumanEval-Plus. Throughput is up ~1.7×, per-token price unchanged. Pin fp16 explicitly if you need parity with published benchmarks.
- 2026-03-30 added
POST /v1/instances supports duration_hours up to 720
30-day reservations are now bookable through the API. The 720h price tier is ~45% under on-demand. Capacity holds are available on Rubin-220 and B200-180; email capacity@hoonify.dev to request one.
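As a rough sketch, a client could validate duration_hours before calling POST /v1/instances and estimate the reserved total from an on-demand rate. Only duration_hours, the 720 cap, and the ~45% figure come from this entry; the gpu_type field, the lower bound of 1 hour, the helper names, and the $4.00/hr rate in the usage note are hypothetical.

```python
MAX_DURATION_HOURS = 720  # 30 days * 24 hours


def reservation_payload(gpu_type: str, duration_hours: int) -> dict:
    """Build a request body for POST /v1/instances (field names partly assumed)."""
    # Assumption: the minimum bookable duration is 1 hour.
    if not 1 <= duration_hours <= MAX_DURATION_HOURS:
        raise ValueError(f"duration_hours must be between 1 and {MAX_DURATION_HOURS}")
    # "gpu_type" is a hypothetical field name; only duration_hours is documented.
    return {"gpu_type": gpu_type, "duration_hours": duration_hours}


def estimate_reserved_total(on_demand_hourly: float, hours: int,
                            discount: float = 0.45) -> float:
    """Approximate total price for one GPU at the ~45%-under-on-demand tier."""
    return round(on_demand_hourly * hours * (1 - discount), 2)
```

With a hypothetical on-demand rate of $4.00/hr, estimate_reserved_total(4.0, 720) works out to about $1,584 per GPU, against $2,880 on-demand for the same 720 hours.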
- 2026-03-22 changed
Webhook retries follow exponential backoff
The retry schedule moved from a flat 5-minute interval to attempts at these offsets from the first delivery: 0s, 30s, 2m, 10m, 1h, 3h, 6h, 12h, 24h. The total window is unchanged at ~24h. The Hoonify-Delivery header now increments per attempt for log correlation.
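The schedule can be turned into seconds to size a receiver's deduplication window. This sketch assumes the listed values are offsets measured from the first attempt, which is the reading consistent with the ~24h total window (summed as successive waits they would span roughly 46h). The helper names are illustrative.

```python
SCHEDULE = ["0s", "30s", "2m", "10m", "1h", "3h", "6h", "12h", "24h"]
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(token: str) -> int:
    """Convert a token such as '10m' into seconds."""
    return int(token[:-1]) * UNIT_SECONDS[token[-1]]


# Assumption: each entry is an offset from the first delivery attempt.
offsets = [to_seconds(t) for t in SCHEDULE]

# Wait between consecutive attempts under that reading.
gaps = [later - earlier for earlier, later in zip(offsets, offsets[1:])]

total_window_hours = offsets[-1] / 3600
```

Under this reading the gaps grow from 30 seconds to 12 hours, so a receiver that deduplicates on a delivery identifier should retain it for at least a day.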
- 2026-03-15 added
Embeddings: jina-embeddings-v3 added
Strong on retrieval benchmarks at 1024 dim and 8K context. $0.030 per 1M input tokens.
- 2026-03-04 added
MI300X on-demand in NA
AMD MI300X is available on-demand at $2.55/hr per GPU. Reservations open in two weeks.
- 2026-02-26 fixed
Stale Retry-After when behind some proxies
Fixed a regression where Retry-After was rounded to the nearest minute on some egress paths, causing clients to back off harder than necessary. SDKs now see fractional second values again.
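For clients that do not use an SDK, a hedged sketch of parsing a fractional Retry-After value follows. It assumes the header carries a plain number of seconds; the HTTP-date form that the HTTP specification also allows is not handled, and the 1-second default is an arbitrary choice.

```python
import math
from typing import Optional


def parse_retry_after(value: Optional[str], default: float = 1.0) -> float:
    """Return the delay in seconds, falling back to `default` on bad input."""
    if value is None:
        return default
    try:
        seconds = float(value.strip())
    except ValueError:
        # Not numeric (for example an HTTP-date); this sketch does not parse it.
        return default
    if not math.isfinite(seconds) or seconds < 0:
        return default
    return seconds
```

A caller would typically pass the result to time.sleep before retrying, so a value such as "0.35" is honoured as 350 ms rather than being rounded up.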
- 2026-02-12 deprecated
Llama-3.1-405B is being sunset
Hoonify is removing Llama-3.1-405B on 2026-05-15. Llama-3.3-70B at fp16 is the recommended replacement and is consistently within 1.5% on capability evals at a fraction of the throughput cost. The API returns a deprecation field, and the model.deprecation_announced webhook event has fired.
- 2026-02-04 security
API key prefixes
New key format: hoon_sk_live_, hoon_sk_test_, hoon_pk_live_. Existing keys stay valid until 2026-08-01. Rotate at your own pace via the API keys page.
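One way to audit stored keys ahead of the cutoff is to check each against the new prefixes. The prefixes and the 2026-08-01 date come from this entry; the helper names are hypothetical, and the sketch treats any key without a new prefix as a legacy key.

```python
from datetime import date
from typing import Optional

NEW_PREFIXES = ("hoon_sk_live_", "hoon_sk_test_", "hoon_pk_live_")
LEGACY_CUTOFF = date(2026, 8, 1)


def key_prefix(key: str) -> Optional[str]:
    """Return the matching new-format prefix, or None for a legacy key."""
    for prefix in NEW_PREFIXES:
        if key.startswith(prefix):
            return prefix
    return None


def needs_rotation(key: str) -> bool:
    """Assumption: every key without a new prefix is a legacy key to rotate."""
    return key_prefix(key) is None


def days_until_cutoff(today: date) -> int:
    """Days remaining before legacy keys stop being valid (negative once past)."""
    return (LEGACY_CUTOFF - today).days
```

Running needs_rotation over a secrets inventory gives a list of keys to replace before the cutoff.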
- 2026-01-20 added
system_fingerprint in response body
Every chat completion now returns a system_fingerprint. Stable while the operator setup, model, and quantization are unchanged — useful for caching and reproducibility.
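A minimal sketch of fingerprint-aware caching: cached completions are dropped whenever a response reports a different system_fingerprint. The class and method names are hypothetical, and how the fingerprint is read from the response body is left to the caller.

```python
from typing import Dict, Optional


class FingerprintCache:
    """In-memory cache that is flushed when system_fingerprint changes."""

    def __init__(self) -> None:
        self._fingerprint: Optional[str] = None
        self._entries: Dict[str, str] = {}

    def observe(self, fingerprint: str) -> bool:
        """Record the fingerprint from a response; return True if the cache was flushed."""
        changed = self._fingerprint is not None and fingerprint != self._fingerprint
        if changed:
            # The serving setup, model, or quantization changed: old answers are stale.
            self._entries.clear()
        self._fingerprint = fingerprint
        return changed

    def put(self, request_key: str, completion: str) -> None:
        self._entries[request_key] = completion

    def get(self, request_key: str) -> Optional[str]:
        return self._entries.get(request_key)
```

Call observe with the fingerprint from each response before storing the completion, so an entry is never kept under a fingerprint it was not produced with.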