
What a NeoCloud actually is, and why it matters for inference pricing

The term gets used loosely. We use it specifically: an operator that sells GPUs as a primary product, not as a side effect of a broader cloud business. The distinction shapes everything about how prices form.

Elena Liang, Operator Partnerships · March 11, 2026 · 7 min read

When customers ask why Hoonify pricing beats hyperscaler pricing, the honest answer is structural: hyperscaler GPU pricing has to fund the hyperscaler. NeoCloud GPU pricing only has to fund the GPU.

A NeoCloud is an operator whose primary product is GPU compute. They build and run dedicated GPU clusters, sell capacity by the hour or by contract, and live or die based on utilization. They are not subsidizing a search business or a productivity suite. The GPU is the business.

Why this changes the price

Three reasons. First, capital structure: NeoCloud operators raise capital specifically to buy GPUs and have to pay it back from GPU revenue. They cannot afford to leave 50% of their fleet idle, so they price aggressively to keep utilization up.
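To make the utilization point concrete, here is a minimal sketch of a break-even calculation. Every number in it (capital cost per GPU, repayment period, operating cost) is a hypothetical placeholder chosen for illustration, not a figure from any operator, and the function name is ours. It assumes straight-line repayment with no interest, and operating cost that accrues every hour whether or not the GPU is rented.

```python
HOURS_PER_YEAR = 24 * 365


def break_even_price_per_hour(capex_per_gpu, payback_years, opex_per_hour, utilization):
    """Hourly price at which sold hours cover capital repayment plus operating cost.

    Assumptions (illustrative only): straight-line repayment, no interest,
    and operating cost incurred for every hour, sold or idle.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Cost the operator carries per wall-clock hour, regardless of sales.
    capital_per_hour = capex_per_gpu / (payback_years * HOURS_PER_YEAR)
    # Only the sold fraction of hours earns revenue, so divide by utilization.
    return (capital_per_hour + opex_per_hour) / utilization


# Hypothetical inputs: $30,000 per GPU, 4-year payback, $0.40/hour to operate.
prices = {u: break_even_price_per_hour(30_000, 4, 0.40, u) for u in (0.5, 0.9)}
for u, p in prices.items():
    print(f"utilization {u:.0%}: break-even ${p:.2f} per GPU-hour")
```

Under these assumptions the break-even price is inversely proportional to utilization, so an operator running half idle has to charge 1.8 times what an operator at 90% utilization charges just to cover the same costs.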

Second, overhead: a hyperscaler GPU instance carries an allocation of the hyperscaler's networking, security, support, and sales overhead. A NeoCloud GPU carries the operator's overhead, which is structurally smaller.

Third, time horizon: hyperscalers price for multi-year customer relationships across many products. NeoCloud operators price for the hour because that is the unit they actually sell.

Why customers do not just go direct

Most customers cannot. The operator landscape is fragmented: there are dozens of credible NeoCloud operators, with capacity scattered across regions, SKUs, and contract types. Integrating directly means signing eighteen MSAs, building eighteen API integrations, and reconciling eighteen invoices every month.

The customers who do go direct are the ones large enough to negotiate dedicated capacity with a single operator. For everyone else, an aggregator pays for itself the first time you have to fail over from one operator to another.

The NeoCloud price advantage is real and structural. The integration cost of capturing it is the reason aggregators exist.

Where the model breaks

The NeoCloud model works only if GPUs stay useful for as long as operators' financing assumes. If H100 useful life is five years, the unit economics are excellent. If the next-generation accelerator makes H100 obsolete in three, several of today's NeoCloud operators are in trouble.
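A rough sketch of why the gap between five years and three matters so much. The purchase price and utilization below are hypothetical placeholders, the function name is ours, and the sketch assumes straight-line depreciation to zero with no resale value.

```python
HOURS_PER_YEAR = 24 * 365


def capital_cost_per_sold_hour(purchase_price, useful_life_years, utilization):
    """Purchase price spread over the hours actually sold during the useful life.

    Assumptions (illustrative only): straight-line depreciation to zero,
    no resale value, constant utilization over the whole life.
    """
    sold_hours = useful_life_years * HOURS_PER_YEAR * utilization
    return purchase_price / sold_hours


# Hypothetical inputs: $30,000 per GPU, 80% utilization.
five_year = capital_cost_per_sold_hour(30_000, 5, 0.8)
three_year = capital_cost_per_sold_hour(30_000, 3, 0.8)
increase = three_year / five_year - 1

print(f"5-year life: ${five_year:.2f} of capital cost per sold hour")
print(f"3-year life: ${three_year:.2f} of capital cost per sold hour")
print(f"increase: {increase:.0%}")
```

Under these assumptions the increase is 5/3 - 1, about 67%, whatever the purchase price or utilization. An operator that priced for a five-year life has to recover two-thirds more capital from every sold hour if the hardware lasts only three.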

We do not think this is the most likely scenario, but it is the scenario worth watching. Operator consolidation in 2027-2028 would change the price floor on everything we sell.
