Changelog

Notable platform and API changes. Webhook subscribers also receive billing.rate_change and model.deprecation_announced events for anything that affects pricing or availability.
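
A webhook consumer could route the two event names mentioned above with a small dispatch table. This is a minimal sketch, not the platform's documented payload format: it assumes the event name arrives in a top-level "type" field, and the handler names and return values are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def on_rate_change(event: dict) -> str:
    # Hypothetical reaction: flag the event for a pricing review.
    return "review-pricing"


def on_deprecation(event: dict) -> str:
    # Hypothetical reaction: open a migration task for the affected model.
    return "plan-migration"


# Event names are taken from the changelog; everything else is illustrative.
HANDLERS: Dict[str, Handler] = {
    "billing.rate_change": on_rate_change,
    "model.deprecation_announced": on_deprecation,
}


def dispatch(event: dict) -> str:
    # Assumption: the event name is carried in a top-level "type" field.
    handler = HANDLERS.get(event.get("type", ""))
    if handler is None:
        return "ignored"
    return handler(event)
```

Unknown event types fall through to "ignored", so events added later do not break the consumer.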

  1. 2026-04-22 added

    Qwen 3 72B is GA in NA and EU

    Strong on multilingual code and tool calling. Available at fp16 and fp8. Live in NA and EU pools today; APAC follows in May.

  2. 2026-04-15 added

    X-Hoonify-Pool-Fallback header

    New optional header. Set it to nearest to fall back silently to the nearest pool when a pinned pool has no capacity. The default stays strict (returns 409 no_capacity) for predictable latency. Setting nearest is recommended for customer-facing chat with global audiences.
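
As a rough illustration of opting in, the sketch below builds the extra request header and recognizes the strict-mode failure. The header name, the nearest value, and the 409 no_capacity response come from this entry; the function names are hypothetical, and how the error code is carried in the response is not specified here, so it is passed in as a plain string.

```python
from typing import Dict

FALLBACK_HEADER = "X-Hoonify-Pool-Fallback"


def pool_fallback_headers(allow_fallback: bool) -> Dict[str, str]:
    # Omitting the header keeps the default strict behavior.
    if allow_fallback:
        return {FALLBACK_HEADER: "nearest"}
    return {}


def is_no_capacity(status_code: int, error_code: str) -> bool:
    # In strict mode a dry pinned pool answers 409 with no_capacity.
    return status_code == 409 and error_code == "no_capacity"
```

Merge the returned dict into the headers of whatever HTTP client is in use. Latency-sensitive callers can leave the header off and handle is_no_capacity themselves.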

  3. 2026-04-08 changed

    Default quantization for Llama-3.3-70B is now fp8

    Calibrated evals show < 0.5% delta vs fp16 on MMLU-redux, GSM8K, and HumanEval-Plus. Throughput is up ~1.7×, per-token price unchanged. Pin fp16 explicitly if you need parity with published benchmarks.

  4. 2026-03-30 added

    POST /v1/instances supports duration_hours up to 720

    30-day reservations are now bookable through the API. The 720h price tier is ~45% below on-demand. Capacity holds are available on Rubin-220 and B200-180; email capacity@hoonify.dev.
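
A client might validate the new limit before calling POST /v1/instances. The sketch below is illustrative only: duration_hours and the 720 cap come from this entry, while the gpu_type field name and the lower bound of 1 hour are assumptions.

```python
import json

MAX_DURATION_HOURS = 720  # 720 h / 24 = 30 days


def reservation_payload(gpu_type: str, duration_hours: int) -> str:
    # Assumption: at least one hour must be requested.
    if not 1 <= duration_hours <= MAX_DURATION_HOURS:
        raise ValueError(
            f"duration_hours must be between 1 and {MAX_DURATION_HOURS}"
        )
    # "gpu_type" is a hypothetical field name used for illustration.
    return json.dumps({"gpu_type": gpu_type, "duration_hours": duration_hours})
```

Rejecting out-of-range values locally avoids a round trip for a request the API would presumably refuse.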

  5. 2026-03-22 changed

    Webhook retries follow exponential backoff

    Retry schedule moved from a flat 5-minute interval to: 0s, 30s, 2m, 10m, 1h, 3h, 6h, 12h, 24h. Total window is unchanged at ~24h. The Hoonify-Delivery header now increments per attempt for log correlation.
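
The schedule can be turned into concrete wait times with a few lines of code. This sketch assumes the listed values are offsets measured from the first delivery attempt, which is the reading consistent with the ~24h total window (read as delays between attempts, they would add up to roughly 46h). The helper names are hypothetical.

```python
import re
from typing import List

SCHEDULE = ["0s", "30s", "2m", "10m", "1h", "3h", "6h", "12h", "24h"]
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(token: str) -> int:
    # Parse tokens such as "30s", "2m", or "12h" into seconds.
    match = re.fullmatch(r"(\d+)([smh])", token)
    if match is None:
        raise ValueError(f"unrecognized duration: {token!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


# Assumption: each entry is an offset from the first attempt.
OFFSETS: List[int] = [to_seconds(t) for t in SCHEDULE]

# Wait between consecutive attempts under that assumption.
GAPS: List[int] = [later - earlier for earlier, later in zip(OFFSETS, OFFSETS[1:])]
```

A receiver that logs the Hoonify-Delivery value alongside arrival time can compare observed gaps against GAPS to confirm which attempt it is looking at.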

  6. 2026-03-15 added

    Embeddings: jina-embeddings-v3 added

    Strong on retrieval benchmarks at 1024 dim and 8K context. $0.030 per 1M input tokens.
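
For budgeting, the listed rate translates into a simple per-job estimate. A minimal sketch using the $0.030 per 1M input tokens figure from this entry; the function name is hypothetical, and Decimal is used to avoid floating-point rounding in currency math.

```python
from decimal import Decimal

PRICE_PER_MILLION_USD = Decimal("0.030")


def embedding_cost_usd(input_tokens: int) -> Decimal:
    # Cost scales linearly with input tokens at the listed rate.
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return PRICE_PER_MILLION_USD * input_tokens / Decimal(1_000_000)
```

For example, embedding a 250M-token corpus works out to 250 × $0.030 = $7.50 at this rate.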

  7. 2026-03-04 added

    MI300X on-demand in NA

    AMD MI300X is available on-demand at $2.55/hr per GPU. Reservations open in two weeks.

  8. 2026-02-26 fixed

    Stale Retry-After when behind some proxies

    Fixed a regression where Retry-After was rounded to the nearest minute on some egress paths, causing clients to back off longer than necessary. SDKs now see fractional-second values again.
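
Clients that parse Retry-After themselves rather than through an SDK need to accept fractional seconds. The sketch below is one way to do that, under the assumption that the header carries either a number of seconds (possibly fractional, as described in this entry) or a standard HTTP date. The function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> float:
    value = value.strip()
    try:
        # Numeric form: keep the fractional part instead of rounding.
        return max(0.0, float(value))
    except ValueError:
        pass
    # Date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

Passing now explicitly makes the date branch deterministic in tests. Values that are already in the past clamp to zero.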

  9. 2026-02-12 deprecated

    Llama-3.1-405B is being sunset

    Hoonify is removing Llama-3.1-405B on 2026-05-15. Llama-3.3-70B at fp16 is the recommended replacement; it is consistently within 1.5% on capability evals at a fraction of the cost. The API returns a deprecation field, and the model.deprecation_announced webhook event has fired.

  10. 2026-02-04 security

    API key prefixes

    New key format: hoon_sk_live_, hoon_sk_test_, hoon_pk_live_. Existing keys stay valid until 2026-08-01. Rotate at your own pace via the API keys page.
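
To find keys that still need rotating before 2026-08-01, a quick prefix check may be enough. This sketch relies only on the three prefixes listed in this entry; the function names are hypothetical, and a key with no known prefix is simply treated as old-format.

```python
from typing import Optional

KEY_PREFIXES = ("hoon_sk_live_", "hoon_sk_test_", "hoon_pk_live_")


def key_prefix(api_key: str) -> Optional[str]:
    # Return the matching new-format prefix, or None for an old-format key.
    for prefix in KEY_PREFIXES:
        if api_key.startswith(prefix):
            return prefix
    return None


def needs_rotation(api_key: str) -> bool:
    # Old-format keys stop working after the cutoff date in the changelog.
    return key_prefix(api_key) is None
```

Running needs_rotation over the keys in a secrets inventory gives a list to work through on the API keys page.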

  11. 2026-01-20 added

    system_fingerprint in response body

    Every chat completion now returns a system_fingerprint. It stays stable while the operator setup, model, and quantization are unchanged, which makes it useful for caching and reproducibility.
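
One way to use the fingerprint for caching is to drop cached completions whenever it changes. The class below is a minimal in-memory sketch with hypothetical names; it assumes the caller reads system_fingerprint from each response body and passes it to observe.

```python
from typing import Dict, Optional


class FingerprintCache:
    """Response cache that is cleared when system_fingerprint changes."""

    def __init__(self) -> None:
        self._fingerprint: Optional[str] = None
        self._entries: Dict[str, str] = {}

    def observe(self, system_fingerprint: str) -> bool:
        # Returns True when a change was detected and the cache was cleared.
        changed = (
            self._fingerprint is not None
            and system_fingerprint != self._fingerprint
        )
        if changed:
            self._entries.clear()
        self._fingerprint = system_fingerprint
        return changed

    def get(self, key: str) -> Optional[str]:
        return self._entries.get(key)

    def put(self, key: str, value: str) -> None:
        self._entries[key] = value
```

Clearing the whole cache is the bluntest policy. Storing the fingerprint alongside each entry would allow finer-grained invalidation at the cost of a little more bookkeeping.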

Related: Status · Webhooks