Reservation vs on-demand

Hoonify offers two ways to buy GPU compute. Pick reservation when the workload is scheduled and capacity-sensitive, and on-demand when it's ad hoc and elasticity-sensitive. Inference (the chat completions API) is always usage-based and fits neither model; this page covers /v1/instances only.

Side by side

| Dimension | On-demand | Reservation |
| --- | --- | --- |
| Billing | Per second, while ready. 60-second minimum. | Up-front for the full duration_hours. Non-refundable after start. |
| Price | Higher (see rate card). | ~30% lower at 24h+, another ~15% at 7d+. |
| Capacity | Best-effort. May fail with 409 no_capacity in tight pools. | Guaranteed at booking time. Hoonify will not oversell a reservation. |
| Lifecycle | Live until you call POST /terminate. | Auto-terminates at expires_at. Extend before expiration to keep the same hardware. |
| Pre-emption | Never preempted by Hoonify. | Never preempted. Reserved capacity is taken off the on-demand pool. |
| SLA | 99.5% per-instance availability. | 99.9% per-instance availability, with credit-back on miss. |
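
The on-demand billing rule in the table (per second while ready, 60-second minimum) can be sketched as a small function. This is an illustration only: the function name is hypothetical, the hourly rate is a placeholder rather than a rate-card figure, and the treatment of an instance that never reaches ready is an assumption.

```python
MIN_BILLABLE_SECONDS = 60  # on-demand minimum stated in the table


def on_demand_cost(ready_seconds: int, hourly_rate: float) -> float:
    """Cost of one on-demand instance that spent `ready_seconds` in the ready state.

    Assumption: an instance that never became ready is not billed; any
    instance that did become ready is billed for at least 60 seconds.
    """
    if ready_seconds <= 0:
        return 0.0
    billable_seconds = max(ready_seconds, MIN_BILLABLE_SECONDS)
    # Per-second billing: convert the hourly rate to a per-second rate.
    return billable_seconds * hourly_rate / 3600
```

Under these assumptions, an instance that is ready for 10 seconds costs the same as one that is ready for 60.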

When to pick which

Reservation

  • Training runs with a known duration. At 24h or longer, the ~30% reservation discount applies to the whole booking.
  • Inference on a self-hosted runtime where capacity must be there at peak.
  • Multi-day fine-tunes or evaluation sweeps.
  • Anything where re-queueing through capacity contention costs more than the price delta.

On-demand

  • Exploratory work — interactive notebooks, ad-hoc benchmarks, scratch debugging.
  • Bursty batch jobs where capacity contention is acceptable.
  • Tests of a SKU before committing to a reservation. Run an hour, decide.
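
One way to make the choice concrete is a break-even check: a reservation bills the full booked duration at a discounted rate, while on-demand bills only the hours actually used. The sketch below assumes a single flat discount fraction (the real figures are on the rate card) and uses hypothetical function names. It compares price only and ignores the value of guaranteed capacity.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of the booked hours that must actually be used for a
    reservation to cost no more than on-demand (0.30 means a 30% discount)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def cheaper_option(used_hours: float, booked_hours: float, discount: float) -> str:
    """Compare both options, with costs in units of the on-demand hourly rate."""
    reservation_cost = booked_hours * (1 - discount)  # billed in full, used or not
    on_demand_cost = used_hours                       # billed only while running
    return "reservation" if reservation_cost <= on_demand_cost else "on-demand"
```

With a 30% discount, a 24h reservation is the cheaper option once the instance would otherwise run for about 16.8 hours or more.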

Booking a reservation

Pass duration_hours on the create call. The minimum is 1 hour; durations under 24h carry no discount. The discounted tiers are 24h, 168h (7d), and 720h (30d). Anything in between is priced on the next-lower tier: booking 48h gets the 24h rate, not the 168h rate.

POST /v1/instances

```json
{
  "sku": "h200-141",
  "gpus": 8,
  "duration_hours": 168,
  "image": "ghcr.io/your-org/training:cu128"
}
```
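
The next-lower-tier rule can be sketched as a lookup over the three tier boundaries. The function name is hypothetical, and two points are assumptions rather than documented behavior: durations are whole hours, and bookings longer than 720h are priced on the 720h tier.

```python
from bisect import bisect_right
from typing import Optional

DISCOUNT_TIERS_HOURS = [24, 168, 720]  # discounted tiers named in the text
MIN_DURATION_HOURS = 1


def pricing_tier(duration_hours: int) -> Optional[int]:
    """Return the tier (in hours) whose rate applies, or None below 24h."""
    if duration_hours < MIN_DURATION_HOURS:
        raise ValueError("duration_hours must be at least 1")
    # bisect_right counts how many tier boundaries are <= duration_hours,
    # so idx - 1 is the highest tier the booking has reached.
    idx = bisect_right(DISCOUNT_TIERS_HOURS, duration_hours)
    return DISCOUNT_TIERS_HOURS[idx - 1] if idx > 0 else None
```

For example, pricing_tier(48) returns 24, matching the 48h case described above, and pricing_tier(12) returns None because no discount applies.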

Extending and ending early

Extend a reservation before expires_at with POST /v1/instances/{id}/extend. Extension is at the current reservation's rate as long as the same SKU has capacity in the pool. Ending early is allowed — the unused portion is forfeited.
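
The timing rules here reduce to two small checks: whether an extension can still be requested, and how many prepaid hours are forfeited by ending early. The helper names are hypothetical, and the sketch assumes expires_at equals the start time plus duration_hours, which the text implies but does not state.

```python
from datetime import datetime, timedelta, timezone


def can_extend(now: datetime, expires_at: datetime) -> bool:
    """An extension has to be requested before expires_at."""
    return now < expires_at


def forfeited_hours(started_at: datetime, ended_at: datetime, duration_hours: int) -> float:
    """Prepaid hours lost when a reservation is ended at `ended_at`."""
    # Assumption: the reservation clock starts at started_at.
    expires_at = started_at + timedelta(hours=duration_hours)
    remaining = (expires_at - ended_at).total_seconds() / 3600
    return max(remaining, 0.0)


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
lost = forfeited_hours(start, start + timedelta(hours=100), 168)
```

In this example a 168h reservation ended after 100 hours forfeits 68 prepaid hours.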

Capacity holds

For multi-week reservations on tight SKUs (Rubin-220, B200-180), Hoonify offers a capacity hold: a 48-hour right-of-first-refusal at the current rate, no charge if you don't convert. Email capacity@hoonify.dev with the SKU and pool.

Mixing the two

Most production setups run a reservation for the steady-state floor and bridge peaks with on-demand. The two share an org-level pool, so a key with compute:write can hit either without ceremony.
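
Sizing the reserved floor is mostly arithmetic over a demand profile. The sketch below splits an hourly GPU demand series into reserved and on-demand GPU-hours for a chosen floor. The function name and the sample profile are illustrative, not taken from Hoonify.

```python
from typing import Sequence, Tuple


def split_gpu_hours(hourly_demand: Sequence[int], reserved_gpus: int) -> Tuple[int, int]:
    """Split an hourly GPU demand profile into (reserved, on-demand) GPU-hours.

    Reserved GPU-hours are paid for whether or not they are used, so the
    first value is reserved_gpus times the number of hours in the profile.
    """
    if reserved_gpus < 0:
        raise ValueError("reserved_gpus must be non-negative")
    reserved = reserved_gpus * len(hourly_demand)
    # Only demand above the reserved floor spills over to on-demand.
    burst = sum(max(demand - reserved_gpus, 0) for demand in hourly_demand)
    return reserved, burst


reserved, burst = split_gpu_hours([8, 8, 16, 24, 8], reserved_gpus=8)
```

Here an 8-GPU floor covers 40 reserved GPU-hours and leaves 24 GPU-hours of peak demand for on-demand instances. Trying several floor values against real usage data shows where the reserved share stops paying off.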

Related: Compute / instances · Rate card