Embeddings

OpenAI-compatible embeddings endpoint. Use this for retrieval (RAG), semantic search, clustering, and classification.

POST https://api.hoonify.dev/v1/embeddings

Request body

| Field | Type | Description |
|---|---|---|
| model | string · required | One of the embedding models listed below. |
| input | string · array · required | A single string or up to 2,048 strings. Each string is capped at the model's context length. |
| encoding_format | string | float (default) returns a JSON array of floats. base64 returns a packed byte string: roughly 4× smaller payload; decode to float32. |
| dimensions | integer | Hoonify extension. For Matryoshka-trained (MRL) models, truncate the returned vector to this many dimensions. Ignored on models without MRL. |
| user | string | Stable identifier for end-user attribution in usage records. |
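
The base64 option returns packed float32 values that the client has to unpack. A minimal Python sketch of that decoding step follows. It assumes little-endian byte order, which this page does not state, and `decode_base64_embedding` is a hypothetical helper name, not part of any Hoonify SDK.

```python
import base64
import struct


def decode_base64_embedding(payload: str) -> list[float]:
    """Decode a base64 string of packed float32 values into Python floats."""
    raw = base64.b64decode(payload)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # "<" means little-endian (an assumption), "f" means 32-bit float.
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check using values that float32 represents exactly.
packed = base64.b64encode(struct.pack("<3f", 0.5, -0.25, 1.0)).decode("ascii")
print(decode_base64_embedding(packed))  # [0.5, -0.25, 1.0]
```

If the decoded values look wrong (very large or near-zero magnitudes), the byte-order assumption is the first thing to check.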

Available models

| Model ID | Dim | Context | Notes |
|---|---|---|---|
| bge-large-en-v1.5 | 1024 | 512 | General English. Best default. |
| bge-m3 | 1024 | 8192 | Multilingual, long context. |
| nomic-embed-text-v1.5 | 768 | 8192 | Open-license, smaller vectors. |
| jina-embeddings-v3 | 1024 | 8192 | Strong on retrieval benchmarks. |

Example

json
{
  "model": "bge-large-en-v1.5",
  "input": [
    "Hoonify routes inference to NeoCloud operators.",
    "Pools group operators by region: na, eu, apac."
  ],
  "encoding_format": "float"
}
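
One way to send the body above from Python, using only the standard library, might look like the sketch below. The Bearer-token `Authorization` header is an assumption based on how other OpenAI-compatible APIs authenticate; this page does not document authentication. `build_embeddings_request` and `embed` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://api.hoonify.dev/v1/embeddings"


def build_embeddings_request(
    api_key: str, model: str, inputs: list[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the embeddings endpoint."""
    body = {"model": model, "input": inputs, "encoding_format": "float"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer-token auth, not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(api_key: str, model: str, inputs: list[str]) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_embeddings_request(api_key, model, inputs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect or unit-test without network access.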

Response

json
{
  "object": "list",
  "model": "bge-large-en-v1.5",
  "pool": "na",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0124, -0.0331, …] },
    { "object": "embedding", "index": 1, "embedding": [0.0192, -0.0287, …] }
  ],
  "usage": { "prompt_tokens": 26, "total_tokens": 26 }
}

Vectors come back in the same order as the input array. Each vector's length matches the model's Dim value in the table above (or your dimensions override for MRL models).
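
A client can check both guarantees defensively before writing vectors to an index. The sketch below is illustrative only: `vectors_in_input_order` is a hypothetical helper, and the toy response uses 3-dimensional vectors purely to keep the example short.

```python
def vectors_in_input_order(response: dict, expected_dim: int) -> list[list[float]]:
    """Return embeddings sorted by their index field, validating each length."""
    items = sorted(response["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    for position, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {position} has {len(vector)} dims, expected {expected_dim}"
            )
    return vectors


# Toy response, deliberately out of order, with made-up 3-dim vectors.
toy_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.4, 0.5, 0.6]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]},
    ],
}
print(vectors_in_input_order(toy_response, expected_dim=3))
# [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```

Pass the model's Dim value as `expected_dim`, or the `dimensions` value you requested for an MRL model.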

Batch sizing

Send up to 2,048 inputs per request. Latency is roughly flat from 1 to ~64 inputs, then grows linearly. For large indexing jobs, batch around 64 and parallelize.
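
The batching advice above can be sketched as follows. `embed_batch` stands in for whatever function sends one request and returns one vector per input; it is a placeholder, as are `chunked` and `embed_all`. The default of 4 workers is an arbitrary assumption, since this page does not state a concurrency limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Vector = list[float]


def chunked(items: Sequence[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[Vector]],
    batch_size: int = 64,
    max_workers: int = 4,  # assumption: tune to your rate limits
) -> list[Vector]:
    """Embed texts in parallel batches, preserving input order."""
    batches = chunked(texts, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in submission order, so the
        # flattened output lines up with the original texts.
        results = pool.map(embed_batch, batches)
        return [vector for batch in results for vector in batch]
```

Threads suit this workload because each call spends most of its time waiting on the network rather than on the CPU.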

Pricing

Embeddings are billed per input token. Output dimensions don't affect price. See the rate card for current per-model rates.

Related: Chat completions · Quantization