CONCEPTS · POOLS
Pools
Hoonify groups every operator into one of three pools: North America, Europe, and APAC. From your perspective each pool is a single, fungible compute resource — Hoonify decides which underlying operator handles a given request.
Why pools?
Routing
For each request, Hoonify picks an operator within the target pool by:
- Filtering to operators that have the requested SKU available right now.
- Among those, preferring lowest network latency to the request origin.
- Breaking ties by reliability score (rolling 30-day uptime).
- Avoiding operators that are draining or in maintenance windows.
If no capacity exists in the requested pool, Hoonify can either fail with a 409 no_capacity or fall back to the next-closest pool — controlled by the X-Hoonify-Pool-Fallback header. Default is strict (no fallback) for predictable latency.
Picking a pool
| Pool | Typical latency from | Coverage |
|---|---|---|
| NA | US, Canada, parts of LATAM | 9 SKUs, 9 inference models, fastest spot turnover. |
| EU | EU, UK, Nordics, Middle East | 7 SKUs, 5 models. GDPR-resident by default. |
| APAC | JP, KR, SG, AU, IN | 8 SKUs, 4 models. Strongest growth pool. |
Pinning a pool
Set the X-Hoonify-Pool header on the request. Use this when:
- You need data residency (e.g. EU customers must hit the EU pool).
- You're benchmarking a specific region's tail latency.
- Your inference is part of a multi-step pipeline and you want everything in one pool.
curl https://api.hoonify.dev/v1/chat/completions \
-H "Authorization: Bearer $HOONIFY_API_KEY" \
-H "X-Hoonify-Pool: eu" \
-d '{ "model": "llama-3.3-70b", "messages": [...] }'Multi-pool replication
For latency-sensitive customer-facing workloads, configure your endpoint to replicate across pools — Hoonify will route each request to the closest replica:
POST /v1/endpoints
{
"model": "llama-3.3-70b",
"pools": ["na", "eu", "apac"],
"concurrency": 64
}Replication doesn't multiply your bill — the per-token rate is unchanged. You pay for tokens served across all pools, and we eat the fleet management.
What you don't see
The specific operators inside a pool are not exposed via the API. The system_fingerprint on the response identifies the model + runtime configuration but not the operator. This is deliberate: it gives Hoonify the flexibility to migrate workloads between operators without breaking your responses, and it keeps the operator list private.
Related: Quickstart · Chat completions API