Rate limits
Hoonify enforces two limits per API key, both on a rolling 60-second window: requests-per-minute (RPM) and tokens-per-minute (TPM). Whichever is hit first triggers 429 rate_limited.
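Which limit binds first depends on how many tokens a typical request consumes. The sketch below works through that arithmetic. The function names are illustrative, and it assumes you already know your average tokens per request; how tokens are counted toward TPM is not covered here.

```python
def effective_rpm(rpm_cap: int, tpm_cap: int, tokens_per_request: float) -> float:
    """Requests per minute you can sustain before either cap is reached."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # The token budget allows tpm_cap / tokens_per_request requests per minute;
    # the request budget allows rpm_cap. The smaller of the two wins.
    return min(rpm_cap, tpm_cap / tokens_per_request)


def binding_limit(rpm_cap: int, tpm_cap: int, tokens_per_request: float) -> str:
    """Name the cap that is reached first at a steady request size."""
    return "RPM" if rpm_cap * tokens_per_request <= tpm_cap else "TPM"


# Tier 1 figures from the Tiers table: 300 RPM, 150K TPM.
small = effective_rpm(300, 150_000, 500)     # 500-token requests
large = effective_rpm(300, 150_000, 2_000)   # 2,000-token requests
```

With the Tier 1 caps, requests averaging more than 500 tokens run into TPM before RPM.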
Tiers
| Tier | RPM | TPM | Notes |
|---|---|---|---|
| Tier 1 | 300 | 150K | New orgs, first 30 days. Auto-promotes on first invoice. |
| Tier 2 | 1,200 | 600K | Default after first paid invoice. |
| Tier 3 | 12,000 | 6M | Email support@hoonify.dev with use case. |
| Tier 4 | 60,000 | 30M | Reservation customers, contracted. |
Limits apply per key, not per org, so splitting a heavy workload across multiple keys gives more burst headroom. The org-level cap is the sum of the limits on all live keys plus a 20% burst allowance.
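One simple way to spread a workload across keys is round-robin assignment. This is a minimal sketch, not a recommended production design: the key strings are placeholders and `key_rotation` is a hypothetical helper, not part of any Hoonify SDK.

```python
from itertools import cycle
from typing import Iterator, Sequence


def key_rotation(api_keys: Sequence[str]) -> Iterator[str]:
    """Yield API keys in round-robin order, forever."""
    if not api_keys:
        raise ValueError("at least one API key is required")
    return cycle(api_keys)


# Placeholder key names for illustration only.
keys = key_rotation(["key-a", "key-b", "key-c"])

# Each outgoing request takes the next key in the rotation.
assignments = [next(keys) for _ in range(6)]
```

Round-robin keeps each key's share of traffic roughly equal. The org-level cap still applies to the combined traffic.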
Response headers
Every response (including 200s) carries the current limit state:
| Header | Value |
|---|---|
| X-RateLimit-Limit-Requests | RPM cap on this key, integer. |
| X-RateLimit-Limit-Tokens | TPM cap on this key, integer. |
| X-RateLimit-Remaining-Requests | Requests left in the current 60-second window. |
| X-RateLimit-Remaining-Tokens | Tokens left in the current 60-second window. |
| X-RateLimit-Reset-Requests | Seconds until the request window resets. |
| X-RateLimit-Reset-Tokens | Seconds until the token window resets. |
| Retry-After | On 429 only. Seconds until the next attempt is safe. |
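The headers in the table can be collected into one structure for logging or client-side pacing. The sketch below assumes every value is a plain number, as the table suggests. `RateLimitState` and `parse_rate_limit_headers` are hypothetical names, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    limit_requests: int
    limit_tokens: int
    remaining_requests: int
    remaining_tokens: int
    reset_requests_s: float
    reset_tokens_s: float
    retry_after_s: Optional[float]  # present on 429 responses only


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    """Build a RateLimitState from response headers (names are case-insensitive)."""
    h = {name.lower(): value for name, value in headers.items()}
    retry_after = h.get("retry-after")
    return RateLimitState(
        limit_requests=int(h["x-ratelimit-limit-requests"]),
        limit_tokens=int(h["x-ratelimit-limit-tokens"]),
        remaining_requests=int(h["x-ratelimit-remaining-requests"]),
        remaining_tokens=int(h["x-ratelimit-remaining-tokens"]),
        reset_requests_s=float(h["x-ratelimit-reset-requests"]),
        reset_tokens_s=float(h["x-ratelimit-reset-tokens"]),
        retry_after_s=float(retry_after) if retry_after is not None else None,
    )


# Illustrative header values, not captured from a real response.
sample = {
    "X-RateLimit-Limit-Requests": "1200",
    "X-RateLimit-Limit-Tokens": "600000",
    "X-RateLimit-Remaining-Requests": "1187",
    "X-RateLimit-Remaining-Tokens": "594210",
    "X-RateLimit-Reset-Requests": "42",
    "X-RateLimit-Reset-Tokens": "42",
}
state = parse_rate_limit_headers(sample)
```

With the `openai` Python SDK, response headers should be reachable through the `with_raw_response` accessor (for example `client.chat.completions.with_raw_response.create(...)`, then `.headers`).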
Handling 429
Honor Retry-After (in seconds), apply jitter, and back off exponentially. Five attempts with jitter is enough for any transient burst. If requests still fail after that, the limit is structural, and you should request a tier bump rather than retry harder.
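
    import random
    import time

    from openai import OpenAI, RateLimitError

    # The SDK reads the API key from the OPENAI_API_KEY environment variable.
    # max_retries=0 turns off the SDK's built-in retries so with_retry is the only retry layer.
    client = OpenAI(base_url="https://api.hoonify.dev/v1", max_retries=0)

    def with_retry(fn, max_attempts=5):
        """Call fn(), retrying on 429 with Retry-After, jitter, and exponential backoff."""
        for attempt in range(max_attempts):
            try:
                return fn()
            except RateLimitError as e:
                if attempt == max_attempts - 1:
                    # Out of attempts: fail without sleeping again.
                    raise RuntimeError("rate-limit retry exhausted") from e
                # Fall back to 1 second if Retry-After is missing.
                wait = float(e.response.headers.get("Retry-After", 1))
                wait += random.uniform(0, 0.5)  # jitter
                time.sleep(wait * (2 ** attempt))

What counts as a token?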
Burst behavior
Within a 60-second window, the first 25% of your RPM is allowed in a single burst (token-bucket style). The remainder is smoothed over the rest of the window. In practice, a Tier 2 key (1,200 RPM) can fire 300 requests in the first second without hitting the limit, then must average no more than about 15 requests per second (900 requests over the remaining 59 seconds) for the rest of the minute.
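The burst arithmetic can be checked for any tier with a few lines. This is a sketch under the assumption used in the example above: the burst is spent in the first second and the remainder is spread evenly over the other 59. `burst_budget` is an illustrative name.

```python
from typing import Tuple


def burst_budget(
    rpm_cap: int, burst_fraction: float = 0.25, window_s: int = 60
) -> Tuple[int, float]:
    """Return (burst size in requests, sustained requests/second afterwards)."""
    burst = int(rpm_cap * burst_fraction)
    remaining = rpm_cap - burst
    # Assumes the burst is spent in the first second of the window.
    sustained_rps = remaining / (window_s - 1)
    return burst, sustained_rps


tier2_burst, tier2_rps = burst_budget(1_200)
```

For Tier 2 this gives a 300-request burst followed by roughly 15.25 requests per second, matching the figures in the text.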
Tier bumps
Tier 3 needs no contract: email support@hoonify.dev with the org name, the key prefix(es) that need bumping, and a one-line description of the use case. Most bumps are processed within an hour during business hours, same business day otherwise. Tier 4 is contracted; talk to your account contact.
Related: Authentication · Chat completions