Run open-source inference on real GPUs, on real terms.

DeepSeek, Qwen, Gemma, Kimi — open weights pooled across every NeoCloud running Hoonify, with an OpenAI-compatible API.

Try a model now Create account · 1M Tokens Free Why open source?

INFERENCE

Tokens, billed per million.

DeepSeek, Qwen, Gemma, Kimi — open models pooled across the network. Try one in the workbench before you write a line of code.

Open the workbench

DROP-IN API

Change one base URL.

OpenAI-compatible endpoints. Point your existing SDK at Hoonify and keep your code — no rewrite, no migration project.

Read the docs

HONEST PRICING

Real prices, no surge.

The rate you see is the rate you pay — 10–50× cheaper than closed APIs for most non-frontier workloads.

See the math

WHAT ARE YOU BUILDING?

Start with the model that fits the job.

A starting point for the most common workloads — swap to any other model from the same base URL.

Explore use cases

RAG

Retrieval and grounded Q&A over your own docs — fast, cheap, and accurate enough to ship.

Start withQwen 3.6 27B

Code

Autocomplete, refactors, and agentic coding — frontier-level reasoning on open weights.

Start withDeepSeek V4 Pro

Support

Customer chat and ticket deflection at scale, without the closed-API invoice.

Start withGemma 4 31B

Agents

Long-horizon tool use and multi-step workflows that hold context across the run.

Start withKimi K2

HOW IT WORKS

From zero to inference in three steps.

Pick a model

Browse the catalog or try one live in the workbench. No commitment, no credit card.

Point your SDK at us

Swap in the OpenAI-compatible base URL and an API key. Your existing code keeps working.

Pay per token

Per-million-token pricing with no surge. Every new account starts on 1M free tokens.

FOR OPERATORS

Have the GPUs? We bring the demand.

List your capacity once and it's in front of every customer on the network. We route the inference demand, handle metering and billing, and keep your GPUs earning — you keep running the hardware.

List your capacity

Demand
Qualified inference traffic, routed to your GPUs.
Utilization
Fill idle hours across the pool, not just your own customers.
Settlement
Metered billing, invoices, and payouts handled for you.