Rate card

Hoonify charges a flat per-token rate for inference and bills compute by the second at a flat hourly rate. Prices are uniform across pools and operators: nobody bids, and nobody surge-prices. The numbers below are illustrative; live values are published on the /rates page and via GET /v1/rate-cards.

Inference (per 1M tokens)

Model            Input    Output   Context   Quantizations
llama-3.3-70b    $0.42    $0.96    128K      fp8 / fp16
llama-3.1-8b     $0.06    $0.18    128K      fp8 / fp16
qwen-2.5-72b     $0.40    $0.92    128K      fp8 / fp16
deepseek-v3      $0.55    $1.40    64K       fp8
mixtral-8x22b    $0.34    $0.84    64K       fp8 / fp16
qwen-3-72b       $0.45    $1.05    128K      fp8 / fp16
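
As a rough illustration of how the per-1M-token rates translate into a per-request charge, here is a minimal Python sketch. The rates are the illustrative figures from the table above; the names INFERENCE_RATES and inference_cost are hypothetical helpers, not part of any Hoonify SDK, and rounding behavior on real invoices is not specified here.

```python
from decimal import Decimal

# Illustrative (input, output) rates in USD per 1M tokens, copied from the
# table above. Live values may differ; see /rates or GET /v1/rate-cards.
INFERENCE_RATES = {
    "llama-3.3-70b": (Decimal("0.42"), Decimal("0.96")),
    "llama-3.1-8b": (Decimal("0.06"), Decimal("0.18")),
    "qwen-2.5-72b": (Decimal("0.40"), Decimal("0.92")),
    "deepseek-v3": (Decimal("0.55"), Decimal("1.40")),
    "mixtral-8x22b": (Decimal("0.34"), Decimal("0.84")),
    "qwen-3-72b": (Decimal("0.45"), Decimal("1.05")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def inference_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the unrounded USD cost of one request (hypothetical helper)."""
    input_rate, output_rate = INFERENCE_RATES[model]
    total = input_tokens * input_rate + output_tokens * output_rate
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 12,000 prompt tokens and 800 completion tokens on llama-3.3-70b.
    print(inference_cost("llama-3.3-70b", 12_000, 800))
```

Decimal is used instead of float so that small per-token amounts do not pick up binary rounding error when summed over many requests.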

Embeddings (per 1M tokens)

Model                    Price
bge-large-en-v1.5        $0.025
bge-m3                   $0.030
nomic-embed-text-v1.5    $0.018
jina-embeddings-v3       $0.030

GPU compute (per GPU)

SKU          On-demand    Reserved (24h+)
h100-80      $2.85/hr     $1.95/hr
h100-94      $3.10/hr     $2.15/hr
h200-141     $3.24/hr     $2.40/hr
b200-180     $5.10/hr     $3.85/hr
rubin-220    $8.20/hr     not listed
mi300x       $2.55/hr     $1.80/hr
l40s-48      $1.10/hr     $0.78/hr

Multi-GPU instances multiply the per-GPU rate. An 8× H200-141 reservation is 8 × $2.40/hr = $19.20/hr. See Reservation vs on-demand for when each makes sense.
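
The multiplication above can be sketched in Python as follows. The rates are the illustrative per-GPU figures from the table; GPU_RATES and instance_hourly_rate are hypothetical names used only for this example. The rubin-220 entry carries only an on-demand rate because no reserved rate appears in the table.

```python
from decimal import Decimal

# Illustrative per-GPU hourly rates in USD, copied from the table above.
GPU_RATES = {
    "h100-80": {"on_demand": Decimal("2.85"), "reserved": Decimal("1.95")},
    "h100-94": {"on_demand": Decimal("3.10"), "reserved": Decimal("2.15")},
    "h200-141": {"on_demand": Decimal("3.24"), "reserved": Decimal("2.40")},
    "b200-180": {"on_demand": Decimal("5.10"), "reserved": Decimal("3.85")},
    "rubin-220": {"on_demand": Decimal("8.20")},  # no reserved rate listed
    "mi300x": {"on_demand": Decimal("2.55"), "reserved": Decimal("1.80")},
    "l40s-48": {"on_demand": Decimal("1.10"), "reserved": Decimal("0.78")},
}


def instance_hourly_rate(sku: str, gpu_count: int, plan: str) -> Decimal:
    """Hourly USD rate for an instance: per-GPU rate times GPU count.

    Raises KeyError if the SKU or plan has no rate in GPU_RATES.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return GPU_RATES[sku][plan] * gpu_count


if __name__ == "__main__":
    # The 8x H200-141 reservation from the text: 8 x $2.40/hr.
    print(instance_hourly_rate("h200-141", 8, "reserved"))
```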

Billing units

  • Inference: per token. Non-streamed requests are billed when the request closes; streamed responses are billed as tokens are emitted.
  • Embeddings: per input token. Output dimension does not affect price.
  • Compute: per second, with a 60-second minimum. Billing starts when the instance reaches the ready state.
  • Webhook deliveries, control-plane API calls, and dashboard use: not billed.
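
To make the compute rule above concrete, the following is a minimal Python sketch of per-second billing with a 60-second minimum. It assumes the minimum applies once per instance run, that the clock starts at ready, and that charges are not rounded; none of these details are confirmed beyond what the list states. compute_charge is a hypothetical helper name.

```python
from decimal import Decimal

MINIMUM_BILLED_SECONDS = 60
SECONDS_PER_HOUR = 3600


def compute_charge(hourly_rate: Decimal, seconds_since_ready: int) -> Decimal:
    """Unrounded USD charge for one instance run (hypothetical helper).

    Assumes the instance did reach ready; runs shorter than 60 seconds
    are billed as 60 seconds.
    """
    if seconds_since_ready < 0:
        raise ValueError("seconds_since_ready cannot be negative")
    billed_seconds = max(seconds_since_ready, MINIMUM_BILLED_SECONDS)
    return hourly_rate * billed_seconds / SECONDS_PER_HOUR


if __name__ == "__main__":
    rate = Decimal("2.85")  # illustrative h100-80 on-demand rate per hour
    print(compute_charge(rate, 45))  # billed as 60 seconds
    print(compute_charge(rate, 90))  # billed as 90 seconds
```

Under these assumptions a 45-second run and a 60-second run cost the same, and anything longer is charged in proportion to the seconds used.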

Discounts

Reservations longer than 7 days take roughly another 15% off the reserved rate. Volume rebates start at $25K/mo (3%), $100K/mo (7%), and $500K/mo (12%), and are applied as a credit on the next invoice. Contact sales@hoonify.dev for committed-usage agreements above $1M/yr.
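
A short Python sketch of the two discounts described above. It rests on assumptions the text does not settle: the rebate percentage is applied to the whole month's spend rather than only the portion above each threshold, thresholds are inclusive, and the "~15%" long-reservation discount is treated as exactly 15%. The names rebate_credit and long_reservation_rate are hypothetical.

```python
from decimal import Decimal

# (monthly spend threshold in USD, rebate rate), highest tier first.
REBATE_TIERS = [
    (Decimal("500000"), Decimal("0.12")),
    (Decimal("100000"), Decimal("0.07")),
    (Decimal("25000"), Decimal("0.03")),
]

# The text says "~15%"; exactly 15% is an assumption for illustration.
LONG_RESERVATION_DISCOUNT = Decimal("0.15")


def rebate_credit(monthly_spend: Decimal) -> Decimal:
    """Credit for the next invoice, assuming whole-spend (non-marginal) tiers."""
    for threshold, rate in REBATE_TIERS:
        if monthly_spend >= threshold:
            return monthly_spend * rate
    return Decimal("0")


def long_reservation_rate(reserved_hourly_rate: Decimal) -> Decimal:
    """Approximate hourly rate for a reservation longer than 7 days."""
    return reserved_hourly_rate * (1 - LONG_RESERVATION_DISCOUNT)


if __name__ == "__main__":
    print(rebate_credit(Decimal("120000")))        # 7% tier
    print(long_reservation_rate(Decimal("2.40")))  # illustrative h200-141 rate
```

If rebates are in fact marginal, the loop would need to accumulate the credit band by band instead of returning at the first matching tier.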

What changes prices?

New SKUs and models are introduced at launch prices, which stay flat for a quarter. Changes to existing prices are announced 30 days in advance through a changelog entry and a billing.rate_change webhook event. Rates have only ever moved in one direction so far.

Related: Reservation vs on-demand · /rates