
Rate card

Hoonify charges a flat per-token rate for inference and embeddings and a per-second rate for GPU compute. Prices are uniform across pools and operators: there is no bidding and no surge pricing. The numbers below are illustrative; live values are published on the /rates page and via GET /v1/rate-cards.

Inference (per 1M tokens)

Model	Input	Output	Context	Quantizations
llama-3.3-70b	$0.42	$0.96	128K	fp8 / fp16
llama-3.1-8b	$0.06	$0.18	128K	fp8 / fp16
qwen-2.5-72b	$0.40	$0.92	128K	fp8 / fp16
deepseek-v3	$0.55	$1.40	64K	fp8
mixtral-8x22b	$0.34	$0.84	64K	fp8 / fp16
qwen-3-72b	$0.45	$1.05	128K	fp8 / fp16
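Because input and output tokens are priced separately, a request's cost is each token count times its own per-1M rate. The sketch below uses the illustrative prices from the table; `INFERENCE_RATES` and `inference_cost` are hypothetical names, not part of any Hoonify SDK, and real prices should come from GET /v1/rate-cards.

```python
# Illustrative prices from the table above, USD per 1M tokens: (input, output).
# Hypothetical structure; fetch live values from GET /v1/rate-cards instead.
INFERENCE_RATES = {
    "llama-3.3-70b": (0.42, 0.96),
    "llama-3.1-8b": (0.06, 0.18),
    "qwen-2.5-72b": (0.40, 0.92),
    "deepseek-v3": (0.55, 1.40),
    "mixtral-8x22b": (0.34, 0.84),
    "qwen-3-72b": (0.45, 1.05),
}


def inference_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, pricing input and output tokens separately."""
    input_rate, output_rate = INFERENCE_RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# 2,000 prompt tokens + 500 completion tokens on llama-3.3-70b:
# (2000 * 0.42 + 500 * 0.96) / 1e6 = (840 + 480) / 1e6 = $0.00132
print(f"${inference_cost('llama-3.3-70b', 2_000, 500):.5f}")
```

Output tokens cost roughly 2 to 3 times as much as input tokens on every model listed, so long completions dominate the bill faster than long prompts.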

Embeddings (per 1M tokens)

Model	Price
bge-large-en-v1.5	$0.025
bge-m3	$0.030
nomic-embed-text-v1.5	$0.018
jina-embeddings-v3	$0.030
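Embedding cost depends only on the number of input tokens. As a rough sizing sketch, assuming a corpus of 50,000 documents averaging 800 tokens each (made-up numbers for illustration), the illustrative rates above work out as follows. `EMBEDDING_RATES` and `embedding_cost` are hypothetical names.

```python
# Illustrative prices from the table above, USD per 1M input tokens.
EMBEDDING_RATES = {
    "bge-large-en-v1.5": 0.025,
    "bge-m3": 0.030,
    "nomic-embed-text-v1.5": 0.018,
    "jina-embeddings-v3": 0.030,
}


def embedding_cost(model: str, input_tokens: int) -> float:
    """USD cost to embed `input_tokens` tokens; output dimension is not a factor."""
    return input_tokens * EMBEDDING_RATES[model] / 1_000_000


# Hypothetical corpus: 50,000 documents x 800 tokens = 40M tokens.
corpus_tokens = 50_000 * 800
for model in EMBEDDING_RATES:
    # e.g. nomic-embed-text-v1.5: 40M * $0.018 / 1M = $0.72
    print(f"{model}: ${embedding_cost(model, corpus_tokens):.2f}")
```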

GPU compute (per GPU)

SKU	On-demand	Reserved (24h+)
h100-80	$2.85/hr	$1.95/hr
h100-94	$3.10/hr	$2.15/hr
h200-141	$3.24/hr	$2.40/hr
b200-180	$5.10/hr	$3.85/hr
rubin-220	—	$8.20/hr
mi300x	$2.55/hr	$1.80/hr
l40s-48	$1.10/hr	$0.78/hr

Multi-GPU instances are billed at the per-GPU rate times the number of GPUs. For example, an 8× h200-141 reservation costs 8 × $2.40/hr = $19.20/hr. See Reservation vs on-demand for when each option makes sense.
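The multiplication above can be sketched in a few lines, with one wrinkle the table implies: rubin-220 has no on-demand price, so it can only be reserved. This is a minimal sketch using the illustrative prices; `GPU_RATES` and `instance_hourly_rate` are hypothetical names, and `None` stands in for the table's dash.

```python
from typing import Dict, Optional, Tuple

# Illustrative per-GPU prices from the table above, USD per hour:
# (on-demand, reserved 24h+). None marks a SKU with no on-demand price.
GPU_RATES: Dict[str, Tuple[Optional[float], float]] = {
    "h100-80": (2.85, 1.95),
    "h100-94": (3.10, 2.15),
    "h200-141": (3.24, 2.40),
    "b200-180": (5.10, 3.85),
    "rubin-220": (None, 8.20),
    "mi300x": (2.55, 1.80),
    "l40s-48": (1.10, 0.78),
}


def instance_hourly_rate(sku: str, gpu_count: int, reserved: bool) -> float:
    """Hourly price of a multi-GPU instance: per-GPU rate times GPU count."""
    on_demand, reserved_rate = GPU_RATES[sku]
    rate = reserved_rate if reserved else on_demand
    if rate is None:
        raise ValueError(f"{sku} has no on-demand price; reserve it instead")
    return gpu_count * rate


print(f"${instance_hourly_rate('h200-141', 8, reserved=True):.2f}/hr")  # $19.20/hr
```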

Billing units

Inference: per token, billed when the request closes. Streamed responses are billed incrementally as tokens are emitted.
Embeddings: per input token. Output dimension does not affect price.
Compute: per second, with a 60-second minimum. Billing starts when the instance reaches the ready state.
Webhook deliveries / API control plane / dashboard: not billed.
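For compute, the 60-second minimum means short jobs are rounded up to a full minute, while longer jobs pay for exactly the seconds they run. A minimal sketch, assuming runtime is measured in whole seconds from the point billing starts (the page does not say how fractional seconds are handled); `compute_cost` is a hypothetical helper.

```python
MINIMUM_BILLABLE_SECONDS = 60


def compute_cost(per_gpu_hourly_rate: float, gpu_count: int, seconds_since_ready: int) -> float:
    """USD cost of one compute session billed per second with a 60-second minimum."""
    billable_seconds = max(seconds_since_ready, MINIMUM_BILLABLE_SECONDS)
    # Convert the hourly per-GPU rate to a per-second charge for the whole instance.
    return billable_seconds * per_gpu_hourly_rate * gpu_count / 3600


# A 45-second job on one on-demand h100-80 ($2.85/hr) is billed as 60 seconds:
# 60 * 2.85 / 3600 = $0.0475
print(f"${compute_cost(2.85, 1, 45):.4f}")
# A 10-minute job on the same GPU: 600 * 2.85 / 3600 = $0.4750
print(f"${compute_cost(2.85, 1, 600):.4f}")
```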

Discounts

Reservations longer than 7 days take roughly another 15% off the reserved rate. Volume rebates start at $25K/mo (3%), $100K/mo (7%), and $500K/mo (12%), applied as a credit on the next invoice. For committed-usage agreements above $1M/yr, contact sales@hoonify.dev.
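The discount rules can be approximated as below. This is a sketch, not the billing engine: it assumes the rebate percentage applies to the whole month's spend rather than only the portion above each threshold (the page does not say which), and it treats the long-reservation cut as exactly 15% even though the text says "~15%". Function names are hypothetical.

```python
# Volume rebate tiers from the text: (monthly spend threshold in USD, rebate rate),
# highest tier first. ASSUMPTION: the rate applies to the entire month's spend.
REBATE_TIERS = [(500_000, 0.12), (100_000, 0.07), (25_000, 0.03)]
LONG_RESERVATION_CUT = 0.15  # approximate; the text says "~15%"


def volume_rebate_credit(monthly_spend: float) -> float:
    """Credit (USD) expected on the next invoice for one month's spend."""
    for threshold, rate in REBATE_TIERS:
        if monthly_spend >= threshold:
            return monthly_spend * rate
    return 0.0


def long_reservation_rate(reserved_rate: float, days: int) -> float:
    """Approximate per-GPU-hour rate for a reservation of `days` days."""
    if days > 7:
        return reserved_rate * (1 - LONG_RESERVATION_CUT)
    return reserved_rate


print(f"${volume_rebate_credit(120_000):,.2f}")           # 7% tier
print(f"${long_reservation_rate(2.40, 14):.2f}/hr per GPU")  # 14-day h200-141
```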

What changes prices?

New SKUs and models are introduced at launch prices that stay fixed for one quarter. Changes to existing prices are announced 30 days in advance, via a changelog entry and a billing.rate_change webhook event. So far, rates have only ever moved in one direction.
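A webhook consumer can use the 30-day notice to work out the earliest date a change could take effect. A minimal sketch: only the event name billing.rate_change comes from this page; the `type` and `created_at` payload fields and their ISO 8601 format are hypothetical placeholders, so check the actual webhook schema before relying on them.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional

NOTICE_PERIOD = timedelta(days=30)


def earliest_effective(event_body: str) -> Optional[datetime]:
    """Return the earliest effective time of a rate change, or None for other events.

    HYPOTHETICAL payload shape: {"type": ..., "created_at": "<ISO 8601 with offset>"}.
    """
    event = json.loads(event_body)
    if event.get("type") != "billing.rate_change":
        return None
    published = datetime.fromisoformat(event["created_at"])
    # The change can take effect no earlier than 30 days after the notice, shown in UTC.
    return (published + NOTICE_PERIOD).astimezone(timezone.utc)


sample = '{"type": "billing.rate_change", "created_at": "2025-03-01T00:00:00+00:00"}'
print(earliest_effective(sample).isoformat())  # 2025-03-31T00:00:00+00:00
```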

Related: Reservation vs on-demand · /rates