Embeddings
OpenAI-compatible embeddings endpoint. Use this for retrieval (RAG), semantic search, clustering, and classification.
POST https://api.hoonify.dev/v1/embeddings
Request body
| Field | Type | Description |
|---|---|---|
| model | string · required | One of the embedding models listed below. |
| input | string or array of strings · required | A single string or an array of up to 2,048 strings. Each string is capped at the model's context length (see the Context column under Available models). |
| encoding_format | string | float (default) returns each vector as a JSON array of floats. base64 returns each vector as a base64 string of packed float32 values, a roughly 4× smaller payload; decode it to float32 on the client. |
| dimensions | integer | Hoonify extension. For models trained with Matryoshka Representation Learning (MRL), truncates the returned vector to this many dimensions. Ignored on models without MRL. |
| user | string | Stable identifier for end-user attribution in usage records. |
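As an illustration of the base64 option in the table above, here is a minimal decoding sketch that uses only the Python standard library. It assumes the packed values are little-endian float32, which this page does not state explicitly; `decode_embedding` is a hypothetical helper name, and the sample payload is packed locally rather than taken from real API output.

```python
import base64
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode one base64-encoded embedding into a list of floats.

    Assumption: the payload is a sequence of little-endian float32 values.
    """
    raw = base64.b64decode(b64)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # "<" = little-endian, "f" = 4-byte float, repeated `count` times.
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check with locally packed values (not real API output).
sample = base64.b64encode(struct.pack("<3f", 0.5, -0.25, 1.0)).decode("ascii")
print(decode_embedding(sample))  # prints [0.5, -0.25, 1.0]
```

If the vectors feed straight into numerical code, reading the decoded bytes into a float32 array avoids building an intermediate Python list.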
Available models
| Model ID | Dim | Context | Notes |
|---|---|---|---|
| bge-large-en-v1.5 | 1024 | 512 | General English. Best default. |
| bge-m3 | 1024 | 8192 | Multilingual, long context. |
| nomic-embed-text-v1.5 | 768 | 8192 | Open-license, smaller vectors. |
| jina-embeddings-v3 | 1024 | 8192 | Strong on retrieval benchmarks. |
Example
json
{
"model": "bge-large-en-v1.5",
"input": [
"Hoonify routes inference to NeoCloud operators.",
"Pools group operators by region: na, eu, apac."
],
"encoding_format": "float"
}
Response
json
{
"object": "list",
"model": "bge-large-en-v1.5",
"pool": "na",
"data": [
{ "object": "embedding", "index": 0, "embedding": [0.0124, -0.0331, …] },
{ "object": "embedding", "index": 1, "embedding": [0.0192, -0.0287, …] }
],
"usage": { "prompt_tokens": 26, "total_tokens": 26 }
}
Vectors come back in the same order as the input array. Length matches the model's Dim column above (or your dimensions override for MRL models).
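The request and response shapes above can be exercised with a short standard-library client. This is a sketch under assumptions: the `Authorization: Bearer` header is assumed from the OpenAI-compatible framing and is not documented on this page, and `build_request`, `vectors_in_order`, and `embed` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://api.hoonify.dev/v1/embeddings"


def build_request(api_key, model, inputs, dimensions=None):
    """Build the POST request for the embeddings endpoint."""
    payload = {"model": model, "input": inputs, "encoding_format": "float"}
    if dimensions is not None:
        # Only meaningful on MRL models; ignored elsewhere per the table above.
        payload["dimensions"] = dimensions
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth, as in other OpenAI-compatible APIs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def vectors_in_order(response):
    """Return embeddings ordered by their `index` field.

    The API already returns them in input order; sorting is a cheap
    defensive check rather than a requirement.
    """
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(api_key, model, inputs, dimensions=None):
    """Send one request and return the vectors (performs a network call)."""
    request = build_request(api_key, model, inputs, dimensions)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return vectors_in_order(json.load(resp))
```

Keeping request construction and response parsing separate from the network call makes both easy to check without touching the API.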
Batch sizing
Send up to 2,048 inputs per request. Latency is roughly flat from 1 to ~64 inputs, then grows linearly. For large indexing jobs, batch around 64 and parallelize.
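One way to apply this advice is to split the corpus into batches of 64 and send the batches from a small thread pool. The sketch below assumes you supply `embed_batch`, a hypothetical function that takes a list of strings and returns one vector per string; the default of 4 workers is an arbitrary illustration, since this page does not document rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a sequence into consecutive lists of at most `size` items."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def embed_all(texts, embed_batch, batch_size=64, max_workers=4):
    """Embed `texts` in parallel batches, preserving input order.

    `embed_batch` is caller-supplied (hypothetical): list[str] -> list of vectors.
    """
    if not 1 <= batch_size <= 2048:
        raise ValueError("batch_size must be between 1 and 2048")
    batches = chunked(texts, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, so the flattened
        # output lines up with `texts` even if batches finish out of order.
        results = pool.map(embed_batch, batches)
        return [vector for batch in results for vector in batch]
```

Threads suit this workload because each worker spends most of its time waiting on the network; tune `max_workers` against whatever limits your account has.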
Pricing
Embeddings are billed per input token. Output dimensions don't affect price. See the rate card for current per-model rates.
Related: Chat completions · Quantization