AI INFERENCE PLATFORM // OPEN-SOURCE FIRST
Run open-source inference on real GPUs, on real terms.
DeepSeek, Qwen, Gemma, Kimi — open weights pooled across every NeoCloud running Hoonify, with an OpenAI-compatible API.
INFERENCE
Tokens, billed per million.
DeepSeek, Qwen, Gemma, Kimi — open models pooled across the network. Try one in the workbench before you write a line of code.
Open the workbenchDROP-IN API
Change one base URL.
OpenAI-compatible endpoints. Point your existing SDK at Hoonify and keep your code — no rewrite, no migration project.
Read the docsHONEST PRICING
Real prices, no surge.
The rate you see is the rate you pay — 10–50× cheaper than closed APIs for most non-frontier workloads.
See the mathWHAT ARE YOU BUILDING?
Start with the model that fits the job.
A starting point for the most common workloads — swap to any other model from the same base URL.
RAG
Retrieval and grounded Q&A over your own docs — fast, cheap, and accurate enough to ship.
Code
Autocomplete, refactors, and agentic coding — frontier-level reasoning on open weights.
Support
Customer chat and ticket deflection at scale, without the closed-API invoice.
Agents
Long-horizon tool use and multi-step workflows that hold context across the run.
HOW IT WORKS
From zero to inference in three steps.
Pick a model
Browse the catalog or try one live in the workbench. No commitment, no credit card.
Point your SDK at us
Swap in the OpenAI-compatible base URL and an API key. Your existing code keeps working.
Pay per token
Per-million-token pricing with no surge. Every new account starts on 1M free tokens.
FOR OPERATORS
Have the GPUs? We bring the demand.
List your capacity once and it's in front of every customer on the network. We route the inference demand, handle metering and billing, and keep your GPUs earning — you keep running the hardware.
- DemandQualified inference traffic, routed to your GPUs.
- UtilizationFill idle hours across the pool, not just your own customers.
- SettlementMetered billing, invoices, and payouts handled for you.