
Reservation vs on-demand

Two ways to buy GPU compute on Hoonify. Pick reservation when the workload is scheduled and capacity-sensitive, on-demand when it's ad-hoc and elasticity-sensitive. Inference (the chat completions API) is always usage-based and doesn't fit either model — this page is about /v1/instances.

Side by side

Dimension	On-demand	Reservation
Billing	Per second, while `ready`. 60-second minimum.	Up-front for the full `duration_hours`. Non-refundable after start.
Price	Higher (see rate card).	~30% lower at 24h+, another ~15% at 7d+.
Capacity	Best-effort. May fail with `409 no_capacity` in tight pools.	Guaranteed at booking time. Hoonify will not oversell a reservation.
Lifecycle	Live until you call `POST /terminate`.	Auto-terminates at `expires_at`. Extend before expiration to keep the same hardware.
Pre-emption	Never preempted by Hoonify.	Never preempted. Reserved capacity is taken off the on-demand pool.
SLA	99.5% per-instance availability.	99.9% per-instance availability, with credit-back on miss.
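A rough way to compare the two billing models is to write them as functions. The sketch below is illustrative only: `hourly_rate` is a placeholder for the numbers on the rate card, billable time is assumed to be counted in whole seconds, and the function names are hypothetical rather than part of any Hoonify SDK.

```python
def on_demand_cost(seconds_ready: int, hourly_rate: float) -> float:
    """Per-second billing while the instance is `ready`, with a 60-second minimum."""
    if seconds_ready < 0:
        raise ValueError("seconds_ready must be non-negative")
    billable_seconds = max(seconds_ready, 60)  # the 60-second minimum
    return billable_seconds * hourly_rate / 3600


def reservation_cost(duration_hours: int, hourly_rate: float) -> float:
    """Charged up front for the full duration_hours, regardless of how much is used.

    hourly_rate is the (already discounted) reservation rate, a placeholder here.
    """
    return duration_hours * hourly_rate
```

For example, a reservation booked for 24h and ended after 6h still costs the full 24h, while an on-demand instance that is `ready` for 30 seconds is billed for 60.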

When to pick which

Reservation

Training runs with a known duration. Book 24h or more and the ~30% reservation discount applies from the first day.
Self-hosted inference runtimes that must have capacity available at peak.
Multi-day fine-tunes or evaluation sweeps.
Any workload where waiting out capacity contention (retrying after `409 no_capacity`) would cost more than the price difference.
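Whether a reservation pays off comes down to utilization. Taking only the ~30% figure from the table and treating it as a flat discount off the on-demand rate (an assumption; the exact tiers are on the rate card), a reservation costs `booked_hours × (1 − discount)` on-demand hours, so it wins once you would otherwise have run on-demand for at least `1 − discount` of the booked time. A minimal sketch with hypothetical helper names:

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of booked hours you must actually need for a reservation to match on-demand."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def reservation_wins(needed_hours: float, booked_hours: float, discount: float) -> bool:
    """True if reserving booked_hours costs no more than paying on-demand for needed_hours.

    Assumes reservation_rate = on_demand_rate * (1 - discount), so the rate cancels out.
    """
    return needed_hours >= booked_hours * break_even_utilization(discount)
```

At a 30% discount the break-even point is 70% utilization: a 168h booking beats on-demand if you would have kept an on-demand instance `ready` for at least about 118 of those hours. This ignores the capacity guarantee, which is often the real reason to reserve.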

On-demand

Exploratory work — interactive notebooks, ad-hoc benchmarks, scratch debugging.
Bursty batch jobs where capacity contention is acceptable.
Tests of a SKU before committing to a reservation. Run an hour, decide.
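Because on-demand capacity is best-effort, a create call in a tight pool can come back as `409 no_capacity`. For bursty, contention-tolerant jobs, a common client-side pattern is to retry with exponential backoff and jitter. The sketch below assumes that omitting `duration_hours` from `POST /v1/instances` creates an on-demand instance, treats every 409 as `no_capacity`, and takes a caller-supplied `send` function (hypothetical) so it stays independent of any particular HTTP client.

```python
import random
import time
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes the JSON body, returns (HTTP status, parsed JSON body).
Send = Callable[[Dict], Tuple[int, Dict]]


class NoCapacityError(Exception):
    """Raised when every attempt came back 409 (assumed to be no_capacity)."""


def create_on_demand(send: Send, body: Dict, max_attempts: int = 5,
                     base_delay_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict:
    for attempt in range(max_attempts):
        status, payload = send(body)
        if 200 <= status < 300:
            return payload
        if status != 409:
            # Anything other than a capacity conflict is not retried here.
            raise RuntimeError(f"create failed with HTTP {status}: {payload}")
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter: 1 to 2 s, then 2 to 4 s, 4 to 8 s, ...
            sleep(base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise NoCapacityError(f"no capacity after {max_attempts} attempts")
```

Injecting `send` and `sleep` keeps the retry policy testable; in production `send` would wrap your HTTP client and return the status code with the parsed JSON body.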

Booking a reservation

Pass duration_hours on the create call. The minimum is 1 hour (no discount). The discounted tiers are 24h, 168h (7d), and 720h (30d). Anything in between is priced on the next-lower tier — booking 48h gets the 24h rate, not the 168h rate.
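The tier rule can be written as a lookup that picks the longest discounted tier not exceeding the booked duration. A minimal sketch, assuming `duration_hours` is a whole number of hours; `pricing_tier` is a hypothetical helper, and the actual per-tier rates live on the rate card.

```python
DISCOUNT_TIERS = (720, 168, 24)  # hours, checked from longest to shortest


def pricing_tier(duration_hours: int) -> int:
    """Return the tier (in hours) whose rate applies to a booking of duration_hours."""
    if duration_hours < 1:
        raise ValueError("the minimum reservation is 1 hour")
    for tier in DISCOUNT_TIERS:
        if duration_hours >= tier:
            return tier
    return 1  # under 24h: the 1-hour base rate, no discount
```

So `pricing_tier(48)` is 24 and `pricing_tier(719)` is 168: only reaching the next tier boundary unlocks its rate.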

```json
POST /v1/instances
{
  "sku": "h200-141",
  "gpus": 8,
  "duration_hours": 168,
  "image": "ghcr.io/your-org/training:cu128"
}
```
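The same request can be built with Python's standard library. This is a sketch under assumptions: the base URL is a placeholder you must supply, and authentication is assumed to be a bearer token carrying an API key with `compute:write`; neither detail is specified on this page. `build_reservation_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_reservation_request(base_url: str, api_key: str, sku: str, gpus: int,
                              duration_hours: int, image: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/instances request for a reservation."""
    body = {
        "sku": sku,
        "gpus": gpus,
        "duration_hours": duration_hours,  # present => reservation, billed up front
        "image": image,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/instances",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_reservation_request(
    "https://api.example.com",  # placeholder, not the real endpoint
    "YOUR_API_KEY",
    sku="h200-141",
    gpus=8,
    duration_hours=168,
    image="ghcr.io/your-org/training:cu128",
)
```

Sending it is then `urllib.request.urlopen(req)`, which raises `urllib.error.HTTPError` for non-2xx responses such as a `409`.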

Extending and ending early

Extend a reservation before expires_at with POST /v1/instances/{id}/extend. Extension is at the current reservation's rate as long as the same SKU has capacity in the pool. Ending early is allowed — the unused portion is forfeited.
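Since extension is only possible before `expires_at`, and succeeds only while the SKU still has capacity, it is worth extending with some margin rather than at the last minute. A minimal sketch of that check, assuming `expires_at` is an ISO 8601 timestamp with a UTC offset or `Z` suffix; the two-hour margin and the helper names are illustrative, and the extend call's request body is not covered because this page does not specify it.

```python
from datetime import datetime, timedelta, timezone


def extend_path(instance_id: str) -> str:
    """Path for the extend call described above."""
    return f"/v1/instances/{instance_id}/extend"


def extension_due(expires_at: str, now: datetime,
                  margin: timedelta = timedelta(hours=2)) -> bool:
    """True once `now` is inside the margin before expires_at, but not yet past it.

    `now` must be timezone-aware; comparing a naive datetime with the aware
    deadline raises TypeError.
    """
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    deadline = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return deadline - margin <= now < deadline


print(extension_due("2025-01-10T12:00:00Z", datetime.now(timezone.utc)))
```

A scheduler polling this check could call the extend endpoint as soon as it returns `True`, leaving time to fall back (for example, to on-demand) if the SKU has run out of capacity.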

Capacity holds

For multi-week reservations on tight SKUs (Rubin-220, B200-180), Hoonify offers a capacity hold: a 48-hour right-of-first-refusal at the current rate, no charge if you don't convert. Email capacity@hoonify.dev with the SKU and pool.

Mixing the two

Most production setups run a reservation for the steady-state floor and bridge peaks with on-demand. The two share an org-level pool, so a key with compute:write can hit either without ceremony.
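One simple way to size the split is to reserve the lowest hourly demand you observe and cover everything above it on-demand. The sketch below uses the minimum as the floor (an illustrative choice; a higher floor such as a low percentile shifts more hours onto the cheaper rate but leaves some reserved GPUs idle in the dips) and ignores that instances come in fixed GPU counts per SKU. `split_floor_and_burst` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def split_floor_and_burst(hourly_gpu_demand: Sequence[int]) -> Tuple[int, int]:
    """Return (GPUs to reserve, on-demand GPU-hours needed on top of that floor)."""
    if not hourly_gpu_demand:
        raise ValueError("need at least one hour of demand")
    floor = min(hourly_gpu_demand)
    # Every hour above the floor is bridged with on-demand capacity.
    burst_gpu_hours = sum(d - floor for d in hourly_gpu_demand)
    return floor, burst_gpu_hours
```

For hourly demand of `[8, 8, 16, 24, 8]` GPUs this reserves 8 GPUs and leaves 24 GPU-hours for on-demand.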

Related: Compute / instances · Rate card