EngineeringAPI designCompatibility

The OpenAI-compatible API is the wrong default and the right default

Every inference provider speaks OpenAI's wire format. That is a gift to customers and a tax on innovation. Both things are true.

Jules MaederInference Lead· March 25, 20266 min read

Hoonify's inference endpoints speak the OpenAI Chat Completions wire format. So does every serious open-source serving stack and every other open-weights API I can think of. This was not a planned outcome. It is the equilibrium the market arrived at because the alternative — every provider inventing its own JSON shape — is worse for everyone.

But the OpenAI shape was designed for OpenAI's models in 2023. It is a poor fit for what these models actually do in 2026.

What it gets right

Universal portability. A customer can move from OpenAI to Anthropic to a local Llama server to Hoonify by changing a base URL and an API key. That is enormous. It means open-source models compete on quality and price, not on integration cost. Without this convention, the open-weights ecosystem would still be losing on switching costs alone.

What it gets wrong

Tool calling. The current shape forces tools through a 'function call' mental model that does not match how modern models reason about external state. We work around it; everyone works around it; the workarounds drift apart.

Streaming. The Server-Sent Events format ships token-level deltas with rich nested objects that do not compose well with intermediate proxies. Half the bugs we field on streaming are someone's middlebox truncating a JSON delta.

Multimodal. The image content blocks were retrofitted onto a text-first schema. Audio came next. Video is going to break it. Each addition has been a backwards-compatible patch that pushes more of the parsing complexity onto the client.

The wire format is a fossil record of a 2023 product roadmap. We are all stuck with it because the alternative is worse.

What we are doing about it

Nothing dramatic. We support the OpenAI shape because customers expect it. We also expose a 'native' shape on the same endpoints that fixes the streaming and tool-calling sharp edges for customers willing to break compatibility. About 6% of our token volume goes through the native shape today, mostly from customers who control both ends of the integration.

We expect this split to stay roughly where it is. Universal portability is the more valuable property. Innovation happens at the edges.

NewerH100 spot pricing recap: Q1 2026The H100 spot market did something unusual this quarter: prices fell during the same window training demand spiked. Here is what we saw and what we think drove it.Older Structured output without grammars: when constrained decoding hurts more than it helpsGrammar-constrained decoding is sold as a way to guarantee valid JSON. It also degrades model quality in ways that do not show up until you are debugging a production incident.

Back to all posts