
Power, water, and the limits on inference scaling in 2026

The constraint on inference is not GPU supply anymore. It is interconnect at the rack level and water at the regional level. Both are showing up in our pricing.

Connor Brown, CEO, Hoonify · February 11, 2026 · 8 min read

For most of 2024 the bottleneck on AI infrastructure was GPU supply. By late 2025 it was rack-level networking. In 2026 it is the building itself: power delivery, water for cooling, and the regulatory environment that gates both.

This does not affect inference customers directly today. It does set the trajectory for the next 18–24 months, and it is the reason capacity in some pools costs meaningfully more than capacity in others.

Why power is hard

An H100 SXM module pulls up to 700 W. A rack of them pulls 35–50 kW. A row of those racks pulls megawatts. The local utility connection that supplies that power was sized for a building that pulled tens of kilowatts, not tens of megawatts. Upgrading the connection requires a substation, which requires permitting, which requires a queue.
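The chain from module to row can be sketched as back-of-envelope arithmetic. This is a rough estimate, not a measurement: only the 700 W module figure comes from this post. The GPUs per server, servers per rack, non-GPU overhead multiplier, and PUE (power usage effectiveness, facility draw divided by IT draw) are illustrative assumptions. The point is that the utility connection has to carry the whole facility's draw, not just the GPUs'.

```python
# Rough power arithmetic for an H100 SXM cluster.
# Only GPU_WATTS comes from the post; every other constant is an
# illustrative assumption, not a measured or vendor-published figure.

GPU_WATTS = 700          # per-module draw cited in the post
GPUS_PER_SERVER = 8      # assumption: 8-GPU server
SERVERS_PER_RACK = 4     # assumption
OVERHEAD = 1.8           # assumption: whole-server draw / GPU-only draw (CPUs, NICs, fans, PSU losses)
PUE = 1.3                # assumption: facility draw / IT draw


def server_kw(gpu_watts: float = GPU_WATTS, gpus: int = GPUS_PER_SERVER,
              overhead: float = OVERHEAD) -> float:
    """Estimated draw of one server in kW."""
    return gpu_watts * gpus * overhead / 1000


def rack_kw(servers: int = SERVERS_PER_RACK) -> float:
    """Estimated IT draw of one rack in kW."""
    return servers * server_kw()


def row_mw(racks: int) -> float:
    """Estimated IT draw of a row of racks in MW."""
    return racks * rack_kw() / 1000


def facility_mw(it_mw: float, pue: float = PUE) -> float:
    """IT draw scaled by PUE to include cooling and other facility overhead."""
    return it_mw * pue


if __name__ == "__main__":
    print(f"per rack: {rack_kw():.1f} kW")  # per rack: 40.3 kW
    it = row_mw(25)
    print(f"25-rack row: {it:.2f} MW IT, {facility_mw(it):.2f} MW at the meter")
```

Under these assumed values a rack lands around 40 kW and a 25-rack row around 1 MW of IT load, before facility overhead is added on top. Swap in real numbers for a specific deployment.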

Power queue times in major US metros now run 3–5 years. Operators with already-energized capacity have a moat that did not exist three years ago.

Why water is harder

Liquid cooling is the only thermally viable option at current rack densities. Liquid cooling needs a water source. In some metros — Phoenix, Las Vegas, parts of Texas — water rights are now the gating factor, not power. We have seen multi-hundred-million-dollar projects pause because the water permit did not clear.
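Water demand can be estimated the same way using WUE (water usage effectiveness), a metric from The Green Grid defined as annual site water use in liters divided by IT energy in kWh. The sketch below assumes a constant IT load and a single WUE value. Both the 20 MW load and the 1.8 L/kWh figure are hypothetical, and real WUE varies widely with climate and cooling design.

```python
# Annual water estimate from WUE (liters of site water per kWh of IT energy).
# The load and WUE values used below are hypothetical examples.

HOURS_PER_YEAR = 8760  # non-leap year


def annual_it_energy_kwh(it_load_mw: float) -> float:
    """IT energy over one year at a constant load."""
    return it_load_mw * 1000 * HOURS_PER_YEAR


def annual_water_liters(it_load_mw: float, wue_l_per_kwh: float) -> float:
    """Site water use implied by a given WUE."""
    return annual_it_energy_kwh(it_load_mw) * wue_l_per_kwh


if __name__ == "__main__":
    liters = annual_water_liters(20, 1.8)  # hypothetical 20 MW IT load, WUE 1.8 L/kWh
    print(f"{liters / 1e6:.0f} million liters per year")  # 315 million liters per year
```

Read in reverse, a permit that caps a site's annual water withdrawal effectively caps its IT load at a given WUE, which is one way a water permit ends up gating a project.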

The regulatory question is whether AI workloads are 'productive use' of municipal water. The answer varies by jurisdiction. We expect more litigation here, not less.

How it shows up in our pricing

Three of our pools — North America, Europe, and APAC — have meaningfully different price floors. North America is the cheapest because of historical buildout and abundant power in specific regions (Virginia, Texas, the Pacific Northwest). Europe is more expensive because power is more expensive almost everywhere. APAC varies wildly because it averages Tokyo, Singapore, and Sydney — three very different infrastructure stories.

Customers who care about price more than latency should default to North America. Customers who care about regulatory compliance more than price should accept the European premium. Customers who care about latency to APAC end users have to pay APAC pricing.
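That guidance amounts to a simple default rule keyed on a customer's top priority. A minimal sketch follows. `Priority`, `default_pool`, and the pool labels are hypothetical names for illustration, not a Hoonify API, and a real decision would weigh more than one factor at a time.

```python
from enum import Enum


class Priority(Enum):
    """A customer's dominant concern (hypothetical categories from the guidance above)."""
    PRICE = "price"
    COMPLIANCE = "compliance"        # regulatory compliance matters more than price
    APAC_LATENCY = "apac_latency"    # latency to end users in APAC


def default_pool(priority: Priority) -> str:
    """Return the default pool for a customer's top priority."""
    if priority is Priority.APAC_LATENCY:
        return "APAC"           # latency to APAC users means paying APAC pricing
    if priority is Priority.COMPLIANCE:
        return "Europe"         # accept the European premium
    return "North America"      # price-sensitive default: lowest price floor


if __name__ == "__main__":
    for p in Priority:
        print(p.value, "->", default_pool(p))
```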

What we are watching

Two things. First, modular nuclear — small reactors are starting to show up in serious operator buildout plans. Whether they materialize on schedule is one of the bigger open questions for late-2027 capacity. Second, immersion cooling at scale — if a major operator proves immersion cooling works in production at rack density, water-constrained metros open back up.

Neither is a 2026 story. Both are reasons to be optimistic about price floors a few years out.
