Reservation vs on-demand
Two ways to buy GPU compute on Hoonify. Pick a reservation when the workload is scheduled and capacity-sensitive; pick on-demand when it's ad hoc and elasticity-sensitive. Inference (the chat completions API) is always usage-based and fits neither model; this page covers /v1/instances only.
Side by side
| Dimension | On-demand | Reservation |
|---|---|---|
| Billing | Per second, while ready. 60-second minimum. | Up-front for the full duration_hours. Non-refundable after start. |
| Price | Higher (see rate card). | ~30% lower at 24h+, another ~15% at 7d+. |
| Capacity | Best-effort. May fail with 409 no_capacity in tight pools. | Guaranteed at booking time. Hoonify will not oversell a reservation. |
| Lifecycle | Live until you call POST /terminate. | Auto-terminates at expires_at. Extend before expiration to keep the same hardware. |
| Pre-emption | Never preempted by Hoonify. | Never preempted. Reserved capacity is taken off the on-demand pool. |
| SLA | 99.5% per-instance availability. | 99.9% per-instance availability, with credit-back on miss. |
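To make the billing row concrete, here is a minimal Python sketch of how an on-demand charge accrues compared with a reservation's up-front charge. The hourly rates are placeholders (real numbers come from the rate card). Two details are assumptions, not documented behavior: that the 60-second minimum applies once per instance, and that an instance that never reaches ready is not billed.

```python
from decimal import Decimal

MIN_BILLABLE_SECONDS = 60  # on-demand 60-second minimum


def on_demand_charge(ready_seconds: int, hourly_rate: Decimal) -> Decimal:
    """On-demand: billed per second while the instance is ready."""
    if ready_seconds < 0:
        raise ValueError("ready_seconds must be non-negative")
    if ready_seconds == 0:
        # Assumption: an instance that never became ready is not billed.
        return Decimal(0)
    # Assumption: the 60-second minimum applies once per instance.
    billable = max(ready_seconds, MIN_BILLABLE_SECONDS)
    return hourly_rate * billable / 3600


def reservation_charge(duration_hours: int, hourly_rate: Decimal) -> Decimal:
    """Reservation: the full duration_hours is charged up front,
    whether or not the instance is used for all of it."""
    return hourly_rate * duration_hours


# Placeholder rates, not real prices.
print(on_demand_charge(30, Decimal("36")))  # 30 s ready, billed as 60 s
print(reservation_charge(24, Decimal("25")))
```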
When to pick which
Reservation
- Training runs with a known duration. At 24h+, the ~30% discount applies to every booked hour, so a run that uses its full booking is cheaper than on-demand from day one.
- Inference on a self-hosted runtime where capacity must be available at peak.
- Multi-day fine-tunes or evaluation sweeps.
- Anything where re-queueing through capacity contention costs more than the price delta.
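The price delta in these bullets has a simple break-even. If a reservation is discounted by d against the on-demand hourly rate r, booking D hours costs D * r * (1 - d), while on-demand for U ready hours costs U * r. The rate cancels, so the reservation is cheaper once utilization U / D exceeds 1 - d; at the ~30% 24h+ discount, that is roughly 70%. This sketch ignores the 60-second minimum and the cost of capacity contention, and the discount is an approximate input, not a published constant.

```python
def reservation_is_cheaper(used_hours: float, booked_hours: float, discount: float) -> bool:
    """Compare a reservation with on-demand for the same amount of actual use.

    Reservation cost: booked_hours * rate * (1 - discount)  (paid in full)
    On-demand cost:   used_hours * rate                     (paid as used)
    The rate cancels, so only utilization and the discount matter.
    """
    if booked_hours <= 0:
        raise ValueError("booked_hours must be positive")
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    utilization = used_hours / booked_hours
    return utilization > 1 - discount


# A 24h booking at an assumed ~30% discount:
print(reservation_is_cheaper(20, 24, 0.30))  # ~83% utilization
print(reservation_is_cheaper(12, 24, 0.30))  # 50% utilization
```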
On-demand
- Exploratory work — interactive notebooks, ad-hoc benchmarks, scratch debugging.
- Bursty batch jobs where capacity contention is acceptable.
- Trying out a SKU before committing to a reservation: run it for an hour, then decide.
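For bursty on-demand work where contention is acceptable, a common pattern is to retry a create that fails with 409 no_capacity, backing off between attempts. Below is a minimal sketch. It takes the create call as a callable returning (status, body) so it stays transport-agnostic, and it assumes, hypothetically, that the error code appears as body["error"] == "no_capacity"; the real error shape may differ.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict]


def create_with_retry(
    attempt: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry an on-demand create while the pool reports no capacity.

    Any other response (success or a different error) is returned at once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for i in range(max_attempts):
        status, body = attempt()
        # Hypothetical error shape: HTTP 409 with {"error": "no_capacity"}.
        if status != 409 or body.get("error") != "no_capacity":
            return status, body
        if i < max_attempts - 1:
            sleep(base_delay * 2 ** i)  # exponential backoff: 2 s, 4 s, 8 s, ...
    return status, body  # still no capacity after max_attempts
```

Plain exponential backoff keeps the sketch short; with many clients retrying at once, adding random jitter avoids synchronized retries. A job that can't tolerate repeated waits is a candidate for a reservation instead.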
Booking a reservation
Pass duration_hours on the create call (POST /v1/instances). The minimum is 1 hour (no discount). The discounted tiers are 24h, 168h (7d), and 720h (30d). Anything in between is priced on the next-lower tier: booking 48h gets the 24h rate, not the 168h rate.
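The next-lower-tier rule amounts to a floor lookup over the tier boundaries. A minimal sketch, assuming integer duration_hours and assuming that bookings longer than 720h stay on the 30d rate (the page doesn't say what happens above 720h):

```python
import bisect

# Tier boundaries in hours: 1h (minimum, no discount), 24h, 168h (7d), 720h (30d).
TIER_HOURS = (1, 24, 168, 720)


def pricing_tier(duration_hours: int) -> int:
    """Return the tier whose rate applies to a reservation of this length.

    Durations between tiers fall back to the next-lower tier.
    """
    if duration_hours < TIER_HOURS[0]:
        raise ValueError("the minimum reservation is 1 hour")
    # bisect_right returns the index of the first tier strictly greater
    # than duration_hours; the tier just before it is the one that applies.
    return TIER_HOURS[bisect.bisect_right(TIER_HOURS, duration_hours) - 1]


for hours in (1, 23, 48, 168, 500, 720):
    print(hours, "->", pricing_tier(hours))
```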
POST /v1/instances
{
  "sku": "h200-141",
  "gpus": 8,
  "duration_hours": 168,
  "image": "ghcr.io/your-org/training:cu128"
}
Extending and ending early
Extend a reservation before expires_at with POST /v1/instances/{id}/extend. Extensions are billed at the current reservation's rate, as long as the same SKU still has capacity in the pool. Ending early is allowed, but the unused portion is forfeited.
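Because an extension must land before expires_at, a scheduler typically checks the remaining time and extends inside some lead window. A minimal standard-library sketch: the base URL is passed in, and three details are assumptions rather than documented API: the Bearer authorization header, the {"duration_hours": ...} extend body, and the one-hour default lead window. The request is built but not sent, so it can be inspected before being handed to urllib.request.urlopen.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def should_extend(expires_at: datetime, now: datetime,
                  lead: timedelta = timedelta(hours=1)) -> bool:
    """True if the reservation hasn't expired yet but is within `lead` of it."""
    return now < expires_at and expires_at - now <= lead


def build_extend_request(base_url: str, instance_id: str, extra_hours: int,
                         api_key: str) -> urllib.request.Request:
    """Build POST /v1/instances/{id}/extend without sending it."""
    # Hypothetical body: the extend payload shape is an assumption.
    body = json.dumps({"duration_hours": extra_hours}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/instances/{instance_id}/extend",
        data=body,
        method="POST",
        headers={
            # Hypothetical auth scheme for a key with compute:write.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


expires = datetime(2025, 1, 8, tzinfo=timezone.utc)
now = expires - timedelta(minutes=30)
if should_extend(expires, now):
    req = build_extend_request("https://api.example.test", "inst_123", 24, "YOUR_KEY")
    print(req.get_method(), req.full_url)
```

A short lead window keeps the commitment flexible, but since extension depends on the same SKU still having capacity, extending earlier improves the odds of keeping the same hardware.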
Capacity holds
capacity@hoonify.dev with the SKU and pool.
Mixing the two
Most production setups run a reservation for the steady-state floor and bridge peaks with on-demand. The two share an org-level pool, so a key with compute:write can hit either without ceremony.
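One simple way to size that split is to reserve the minimum of an hourly GPU-demand forecast and cover everything above it with on-demand. A minimal sketch under that assumption: the forecast is your own input, and sizing in GPUs rather than whole instances is a simplification, since reservations are booked per instance with a gpus count.

```python
from typing import List, Tuple


def split_floor_and_peaks(hourly_gpu_demand: List[int]) -> Tuple[int, int]:
    """Split a demand forecast into a reserved floor and on-demand burst.

    Returns (reserved_gpus, on_demand_gpu_hours).
    """
    if not hourly_gpu_demand:
        raise ValueError("forecast must contain at least one hour")
    floor = min(hourly_gpu_demand)  # steady-state floor -> reservation
    # Everything above the floor, hour by hour -> on-demand.
    burst = sum(d - floor for d in hourly_gpu_demand)
    return floor, burst


print(split_floor_and_peaks([8, 8, 16, 24, 8]))  # (8, 24)
```

Reserving the minimum is the conservative choice: the reserved capacity is never idle under the forecast. Reserving at a higher percentile reduces on-demand exposure (and 409 no_capacity risk) at the cost of paying for some unused reserved hours.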
Related: Compute / instances · Rate card