Rate card
Hoonify charges a flat per-token rate for inference and a per-second rate for compute. Prices are uniform across pools and operators: there is no bidding and no surge pricing. The numbers below are illustrative; live values are published on the /rates page and via GET /v1/rate-cards.
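A minimal sketch of pulling the live rate cards with only the Python standard library. The base URL and the bearer-token `Authorization` header are placeholders assumed for illustration, not documented values. Because this page does not describe the response schema, the sketch decodes the JSON body without interpreting it.

```python
import json
import urllib.request

# Placeholder host: this page documents only the path, GET /v1/rate-cards.
BASE_URL = "https://api.example.com"


def build_rate_card_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) a GET request for the live rate cards."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/rate-cards",
        method="GET",
        headers={
            # Assumed auth scheme (bearer token); check the API reference.
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


def fetch_rate_cards(api_key: str, base_url: str = BASE_URL):
    """Send the request and return the decoded JSON body as-is."""
    request = build_rate_card_request(api_key, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the URL and headers easy to inspect before any network call is made.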
Inference (per 1M tokens)
| Model | Input | Output | Context | Quantizations |
|---|---|---|---|---|
| llama-3.3-70b | $0.42 | $0.96 | 128K | fp8 / fp16 |
| llama-3.1-8b | $0.06 | $0.18 | 128K | fp8 / fp16 |
| qwen-2.5-72b | $0.40 | $0.92 | 128K | fp8 / fp16 |
| deepseek-v3 | $0.55 | $1.40 | 64K | fp8 |
| mixtral-8x22b | $0.34 | $0.84 | 64K | fp8 / fp16 |
| qwen-3-72b | $0.45 | $1.05 | 128K | fp8 / fp16 |
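As a worked example, the cost of one inference request is input tokens × input rate plus output tokens × output rate, divided by 1M. The sketch below hard-codes the illustrative prices from the table (live values may differ) and uses `Decimal` to avoid float rounding on small amounts. `inference_cost` is a hypothetical helper, not part of any Hoonify SDK. With these numbers, a llama-3.3-70b request with 10,000 input and 2,000 output tokens comes to $0.00612.

```python
from decimal import Decimal

# Illustrative prices from the table, in USD per 1M tokens: (input, output).
# Live values come from /rates or GET /v1/rate-cards.
INFERENCE_PRICES = {
    "llama-3.3-70b": (Decimal("0.42"), Decimal("0.96")),
    "llama-3.1-8b": (Decimal("0.06"), Decimal("0.18")),
    "qwen-2.5-72b": (Decimal("0.40"), Decimal("0.92")),
    "deepseek-v3": (Decimal("0.55"), Decimal("1.40")),
    "mixtral-8x22b": (Decimal("0.34"), Decimal("0.84")),
    "qwen-3-72b": (Decimal("0.45"), Decimal("1.05")),
}

PER_MILLION = Decimal(1_000_000)


def inference_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost of one request; input and output tokens are priced separately."""
    input_rate, output_rate = INFERENCE_PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / PER_MILLION


if __name__ == "__main__":
    print(inference_cost("llama-3.3-70b", 10_000, 2_000))
```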
Embeddings (per 1M tokens)
| Model | Price |
|---|---|
| bge-large-en-v1.5 | $0.025 |
| bge-m3 | $0.030 |
| nomic-embed-text-v1.5 | $0.018 |
| jina-embeddings-v3 | $0.030 |
GPU compute (per GPU)
| SKU | On-demand | Reserved (24h+) |
|---|---|---|
| h100-80 | $2.85/hr | $1.95/hr |
| h100-94 | $3.10/hr | $2.15/hr |
| h200-141 | $3.24/hr | $2.40/hr |
| b200-180 | $5.10/hr | $3.85/hr |
| rubin-220 | — | $8.20/hr |
| mi300x | $2.55/hr | $1.80/hr |
| l40s-48 | $1.10/hr | $0.78/hr |
Multi-GPU instances multiply the per-GPU rate. An 8× H200-141 reservation is 8 × $2.40/hr = $19.20/hr. See Reservation vs on-demand for when each makes sense.
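The multiplication can be expressed as a small lookup over the illustrative table rates. `instance_hourly_rate` is a hypothetical helper. It reads the missing on-demand price for rubin-220 as "not offered on-demand", which is this sketch's interpretation of the table.

```python
from decimal import Decimal

# Illustrative per-GPU hourly rates (USD): (on_demand, reserved_24h_plus).
# None marks a SKU with no on-demand price in the table.
GPU_RATES = {
    "h100-80": (Decimal("2.85"), Decimal("1.95")),
    "h100-94": (Decimal("3.10"), Decimal("2.15")),
    "h200-141": (Decimal("3.24"), Decimal("2.40")),
    "b200-180": (Decimal("5.10"), Decimal("3.85")),
    "rubin-220": (None, Decimal("8.20")),
    "mi300x": (Decimal("2.55"), Decimal("1.80")),
    "l40s-48": (Decimal("1.10"), Decimal("0.78")),
}


def instance_hourly_rate(sku: str, gpus: int, reserved: bool) -> Decimal:
    """Hourly rate for a multi-GPU instance: GPU count x per-GPU rate."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    on_demand, reserved_rate = GPU_RATES[sku]
    rate = reserved_rate if reserved else on_demand
    if rate is None:
        raise ValueError(f"{sku} is not offered on-demand")
    return gpus * rate


if __name__ == "__main__":
    print(instance_hourly_rate("h200-141", 8, reserved=True))
```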
Billing units
- Inference: per token, billed at request close. Streamed responses are billed as tokens are emitted.
- Embeddings: per input token. Output dimension does not affect price.
- Compute: per second, with a 60-second minimum. Billing starts when the instance reaches ready.
- Webhook deliveries, the API control plane, and the dashboard: not billed.
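Per-second compute billing with a floor works out to max(seconds, 60) × instance hourly rate / 3600. A minimal sketch, assuming the session length is already measured in whole seconds from ready (how partial seconds are rounded is not specified on this page). `compute_charge` is a hypothetical name. For example, a 30-second session at $3.60/hr is billed as 60 seconds, or $0.06.

```python
from decimal import Decimal

MIN_BILLED_SECONDS = 60  # 60-second minimum from the billing units above
SECONDS_PER_HOUR = Decimal(3600)


def compute_charge(seconds_since_ready: int, hourly_rate: Decimal) -> Decimal:
    """USD charge for one compute session, billed per second with a 60 s floor."""
    if seconds_since_ready < 0:
        raise ValueError("seconds_since_ready must be non-negative")
    billed_seconds = max(seconds_since_ready, MIN_BILLED_SECONDS)
    return billed_seconds * hourly_rate / SECONDS_PER_HOUR
```

Pass the instance rate (GPU count × per-GPU rate) as `hourly_rate` for multi-GPU instances.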
Discounts
Contact sales@hoonify.dev about committed-usage agreements above $1M/yr.
What changes prices?
New SKUs and models appear at launch prices that stay flat for a quarter. Changes to existing prices get 30 days' forward notice, published as a changelog entry and as a billing.rate_change webhook event. So far, rates have only ever moved in one direction.
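To track the 30-day notice programmatically, the earliest date a change to an existing price can take effect is the announcement date plus 30 days. The handler below is a sketch only: the `type` and `announced_on` payload fields are assumptions, not the documented billing.rate_change schema.

```python
from datetime import date, timedelta
from typing import Optional

NOTICE_PERIOD = timedelta(days=30)  # forward notice for changes to existing prices


def earliest_effective_date(announced_on: date) -> date:
    """A change announced on `announced_on` cannot take effect before this date."""
    return announced_on + NOTICE_PERIOD


def handle_webhook(event: dict) -> Optional[date]:
    """Hypothetical handler: returns the earliest effective date for rate-change events."""
    # Assumed payload shape: {"type": "billing.rate_change", "announced_on": "YYYY-MM-DD", ...}
    if event.get("type") != "billing.rate_change":
        return None
    return earliest_effective_date(date.fromisoformat(event["announced_on"]))
```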
Related: Reservation vs on-demand · /rates