AI INFERENCE PLATFORM // OPEN-SOURCE FIRST
Run open-source inference on real GPUs, on real terms.
Llama, Qwen, DeepSeek, Mistral — pooled across every NeoCloud running Hoonify, with an OpenAI-compatible API. Try a model in the workbench before you write a line of code.
Real capacity, real prices, real tokens.
GPUs in pool
Pools
H100/GPU/hr
Aggregate throughput
POOLS
Three pools. One control plane.
You pick the pool. Hoonify picks the rack.
POOL · NA
~<35ms typicalNorth America
- GPUs in pool
- 20,856
- GPU SKUs
- 9
- Models
- 9
- Avg uptime
- 97.71%
- Network
- <35ms
- Routing
- Hoonify
POOL · EU
~<55ms typicalEurope
- GPUs in pool
- 15,680
- GPU SKUs
- 7
- Models
- 5
- Avg uptime
- 98.06%
- Network
- <55ms
- Routing
- Hoonify
POOL · APAC
~<85ms typicalAPAC
- GPUs in pool
- 11,712
- GPU SKUs
- 8
- Models
- 4
- Avg uptime
- 97.63%
- Network
- <85ms
- Routing
- Hoonify
INFERENCE
Tokens, billed per million.
Llama, Qwen, DeepSeek, Mistral — quantized variants pooled across pools. OpenAI-compatible API. Try one in the workbench before you write a line of code.
Open the workbenchBARE-METAL GPUs
Pick the rack, not the wrapper.
H100, H200, MI300X, B200, L40S, 4090. Hourly, weekly, or reserved across every operator on the network.
Browse computeRESERVED CAPACITY
Lock the year, not the day.
Multi-month contracts on B200 and H200 capacity. Fixed pricing, dedicated racks, single contract across operators.
Talk to capacityFOR OPERATORS
Have the racks. Need the renters.
Hoonify is the full software stack for NeoCloud operators — TurbOS bare-metal provisioning, Kubernetes, inference scheduler, billing, and the Howl customer portal. The aggregator is the demand side. List your capacity once and it's in front of every customer in the network.
- ProvisioningTurbOS images bare metal in <90s.
- SchedulingInference scheduler & GPU sharing built-in.
- BillingPer-second metered. Credits, invoices, tax.