Rate limits

Hoonify enforces two limits per API key, both on a rolling 60-second window: requests-per-minute (RPM) and tokens-per-minute (TPM). Whichever is hit first triggers 429 rate_limited.

Tiers

Tier	RPM	TPM	Notes
Tier 1	300	150K	New orgs, first 30 days. Auto-promotes on first invoice.
Tier 2	1,200	600K	Default after first paid invoice.
Tier 3	12,000	6M	Email support@hoonify.dev with use case.
Tier 4	60,000	30M	Reservation customers, contracted.

Limits are enforced per key rather than per org, so splitting a heavy workload across multiple keys adds burst headroom. An org-level cap still applies on top: the sum of the limits of all live keys plus a 20% burst allowance.
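As a rough illustration of the org-level arithmetic, the sketch below sums the per-key limits from the tier table and adds the 20% allowance. It assumes that "150K" means 150,000 tokens and that the allowance applies to both RPM and TPM; `TIER_LIMITS` and `org_cap` are hypothetical names, not part of any Hoonify SDK.

```python
from typing import Sequence, Tuple

# (RPM, TPM) per tier, copied from the tier table; "K"/"M" expanded as an assumption.
TIER_LIMITS = {
    1: (300, 150_000),
    2: (1_200, 600_000),
    3: (12_000, 6_000_000),
    4: (60_000, 30_000_000),
}


def org_cap(key_tiers: Sequence[int], burst_pct: int = 20) -> Tuple[int, int]:
    """Return the (RPM, TPM) org cap: sum of key limits plus burst_pct percent."""
    rpm = sum(TIER_LIMITS[t][0] for t in key_tiers)
    tpm = sum(TIER_LIMITS[t][1] for t in key_tiers)
    # Integer arithmetic avoids floating-point rounding on the 20% uplift.
    scale = 100 + burst_pct
    return rpm * scale // 100, tpm * scale // 100


# Three live Tier 2 keys: 3,600 RPM and 1.8M TPM before the allowance.
print(org_cap([2, 2, 2]))  # (4320, 2160000)
```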

Response headers

Every response (including 200s) carries the current limit state:

Header	Value
X-RateLimit-Limit-Requests	RPM cap on this key, integer.
X-RateLimit-Limit-Tokens	TPM cap on this key, integer.
X-RateLimit-Remaining-Requests	Requests left in the current 60-second window.
X-RateLimit-Remaining-Tokens	Tokens left in the current 60-second window.
X-RateLimit-Reset-Requests	Seconds until the request window resets.
X-RateLimit-Reset-Tokens	Seconds until the token window resets.
Retry-After	On 429 only. Seconds until the next attempt is safe.
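One way to use these headers is to read them after each response and pause before the next call when either budget is exhausted. The sketch below assumes the header values arrive as plain numeric strings; `LimitState`, `parse_limit_state`, and `seconds_to_wait` are hypothetical helper names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LimitState:
    limit_requests: int
    limit_tokens: int
    remaining_requests: int
    remaining_tokens: int
    reset_requests_s: float
    reset_tokens_s: float


def parse_limit_state(headers: Mapping[str, str]) -> LimitState:
    """Build a LimitState from response headers (names matched case-insensitively)."""
    h = {name.lower(): value for name, value in headers.items()}
    return LimitState(
        limit_requests=int(h["x-ratelimit-limit-requests"]),
        limit_tokens=int(h["x-ratelimit-limit-tokens"]),
        remaining_requests=int(h["x-ratelimit-remaining-requests"]),
        remaining_tokens=int(h["x-ratelimit-remaining-tokens"]),
        reset_requests_s=float(h["x-ratelimit-reset-requests"]),
        reset_tokens_s=float(h["x-ratelimit-reset-tokens"]),
    )


def seconds_to_wait(state: LimitState, next_request_tokens: int) -> float:
    """Return how long to pause before the next request, or 0.0 if there is room."""
    wait = 0.0
    if state.remaining_requests < 1:
        wait = max(wait, state.reset_requests_s)
    if state.remaining_tokens < next_request_tokens:
        wait = max(wait, state.reset_tokens_s)
    return wait
```

Checking the budget before sending is an optimization only; the 429 handling described in the next section is still needed, since other clients sharing the key can drain the window between calls.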

Handling 429

Honor Retry-After (in seconds), apply jitter, and back off exponentially. Five attempts with jitter is enough for any transient burst — beyond that the limit is structural and you should request a tier bump rather than retry harder.

python

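import random
import time

from openai import OpenAI, RateLimitError

# The API key is read from the OPENAI_API_KEY environment variable.
# max_retries=0 disables the SDK's built-in retries so with_retry owns the backoff.
client = OpenAI(base_url="https://api.hoonify.dev/v1", max_retries=0)

def with_retry(fn, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                # Out of attempts: fail without sleeping, keeping the original 429 as the cause.
                raise RuntimeError("rate-limit retry exhausted") from e
            # Honor Retry-After (seconds); fall back to 1 second if the header is missing.
            wait = float(e.response.headers.get("Retry-After", 1))
            wait *= 2 ** attempt             # exponential backoff
            wait += random.uniform(0, 0.5)   # jitter
            time.sleep(wait)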

What counts as a token?

Hoonify counts both prompt and completion tokens against the TPM window. Streamed responses charge tokens as they emit, not all at once — useful when you're running long completions near the cap.
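Because prompt and completion tokens draw on the same budget, a client can keep its own running estimate of tokens spent over the last 60 seconds. The sketch below is a client-side approximation only, with the server's count remaining authoritative; `TokenWindow` is a hypothetical helper, and the injectable `clock` parameter exists purely to make it testable.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TokenWindow:
    """Client-side estimate of tokens spent in the last `window_s` seconds."""

    def __init__(self, tpm_cap: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tpm_cap = tpm_cap
        self.window_s = window_s
        self._clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._total = 0

    def _evict(self) -> None:
        # Drop usage records that have aged out of the rolling window.
        cutoff = self._clock() - self.window_s
        while self._events and self._events[0][0] <= cutoff:
            _, tokens = self._events.popleft()
            self._total -= tokens

    def used(self) -> int:
        self._evict()
        return self._total

    def can_spend(self, tokens: int) -> bool:
        """True if `tokens` more would still fit under the TPM cap."""
        return self.used() + tokens <= self.tpm_cap

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Record one finished call; both token kinds count against the window."""
        spent = prompt_tokens + completion_tokens
        self._events.append((self._clock(), spent))
        self._total += spent
```

A caller would check `can_spend(prompt_tokens + max_completion_tokens)` before a request and call `record(...)` with the usage figures from the response afterwards.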

Burst behavior

Within a 60-second window, the first 25% of your RPM is allowed in a single burst (token-bucket style). The remainder is smoothed over the rest of the window. In practice, a Tier 2 key (1,200 RPM) can fire 300 requests in the first second without hitting the limit, then has to stay at or below an average of about 15 RPS (the remaining 900 requests spread over 59 seconds) for the rest of the minute.
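The burst arithmetic can be reproduced for any RPM value with a few lines. This is a sketch of the calculation as described here, assuming the burst is spent within the first second so the remainder spreads over the other 59; `burst_profile` is a hypothetical helper name.

```python
from typing import Tuple


def burst_profile(rpm: int, burst_fraction: float = 0.25,
                  window_s: int = 60) -> Tuple[int, float]:
    """Return (burst size, sustained requests per second after the burst)."""
    burst = int(rpm * burst_fraction)
    # Whatever the burst did not use is spread over the rest of the window.
    sustained_rps = (rpm - burst) / (window_s - 1)
    return burst, sustained_rps


burst, rps = burst_profile(1200)  # Tier 2
print(burst, round(rps, 2))  # 300 15.25
```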

Tier bumps

Tier 3 needs no contract: email support@hoonify.dev with the org name, the key prefix(es) that need bumping, and a one-line description of the use case. Most bumps are processed within an hour during business hours, same business day otherwise. Tier 4 is contracted; talk to your account contact.

Related: Authentication · Chat completions