API REFERENCE · EMBEDDINGS

Embeddings

OpenAI-compatible embeddings endpoint. Use this for retrieval (RAG), semantic search, clustering, and classification.

POSThttps://api.hoonify.dev/v1/embeddings

Request body

Field	Type	Description
model	string · required	One of the embedding models listed below.
input	string · array · required	A single string or up to 2,048 strings. Each string is capped at the model's context length.
encoding_format	string	`float` (default) returns a JSON array of floats. `base64` returns a packed byte string — ~4× smaller payload, decode to `float32`.
dimensions	integer	Hoonify extension. For Matryoshka-trained models, truncate the returned vector to this many dimensions. Ignored on models without MRL.
user	string	Stable identifier for end-user attribution in usage records.

Available models

Model ID	Dim	Context	Notes
bge-large-en-v1.5	1024	512	General English. Best default.
bge-m3	1024	8192	Multilingual, long context.
nomic-embed-text-v1.5	768	8192	Open-license, smaller vectors.
jina-embeddings-v3	1024	8192	Strong on retrieval benchmarks.

Example

json

{
  "model": "bge-large-en-v1.5",
  "input": [
    "Hoonify routes inference to NeoCloud operators.",
    "Pools group operators by region: na, eu, apac."
  ],
  "encoding_format": "float"
}

Response

json

{
  "object": "list",
  "model": "bge-large-en-v1.5",
  "pool": "na",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0124, -0.0331, …] },
    { "object": "embedding", "index": 1, "embedding": [0.0192, -0.0287, …] }
  ],
  "usage": { "prompt_tokens": 26, "total_tokens": 26 }
}

Vectors come back in the same order as the input array. Length matches the model'sdim column above (or your dimensions override for MRL models).

Batch sizing

Send up to 2,048 inputs per request. Latency is roughly flat from 1 to ~64 inputs, then grows linearly. For large indexing jobs, batch around 64 and parallelize.

Pricing

Embeddings are billed per input token. Output dimensions don't affect price. See the rate card for current per-model rates.

Related: Chat completions · Quantization