Run open-source inference on real GPUs, on real terms.

Llama, Qwen, DeepSeek, Mistral — pooled across every NeoCloud running Hoonify, with an OpenAI-compatible API. Try a model in the workbench before you write a line of code.

Try a model now Create account · $5 free Why open source?

Real capacity, real prices, real tokens.

GPUs in pool

47,275

Pools

H100/GPU/hr

$1.39

Aggregate throughput

1.1M tok/s

POOLS

Three pools. One control plane.

You pick the pool. Hoonify picks the rack.

Browse pools

POOL · NA

~<35ms typical

North America

GPUs in pool: 20,856
GPU SKUs: 9
Models: 9
Avg uptime: 97.71%
Network: <35ms
Routing: Hoonify

POOL · EU

~<55ms typical

Europe

GPUs in pool: 15,680
GPU SKUs: 7
Models: 5
Avg uptime: 98.06%
Network: <55ms
Routing: Hoonify

POOL · APAC

~<85ms typical

APAC

GPUs in pool: 11,712
GPU SKUs: 8
Models: 4
Avg uptime: 97.63%
Network: <85ms
Routing: Hoonify

INFERENCE

Tokens, billed per million.

Llama, Qwen, DeepSeek, Mistral — quantized variants pooled across pools. OpenAI-compatible API. Try one in the workbench before you write a line of code.

Open the workbench

BARE-METAL GPUs

Pick the rack, not the wrapper.

H100, H200, MI300X, B200, L40S, 4090. Hourly, weekly, or reserved across every operator on the network.

Browse compute

RESERVED CAPACITY

Lock the year, not the day.

Multi-month contracts on B200 and H200 capacity. Fixed pricing, dedicated racks, single contract across operators.

Talk to capacity

FOR OPERATORS

Have the racks. Need the renters.

Hoonify is the full software stack for NeoCloud operators — TurbOS bare-metal provisioning, Kubernetes, inference scheduler, billing, and the Howl customer portal. The aggregator is the demand side. List your capacity once and it's in front of every customer in the network.

List your capacity See operator stack

Provisioning
TurbOS images bare metal in <90s.
Scheduling
Inference scheduler & GPU sharing built-in.
Billing
Per-second metered. Credits, invoices, tax.