Rate limits
Limits are primarily enforced per key. API responses that support rate-limit headers include remaining request and token counts so you can pace concurrency and retries.
Per-plan limits
| Plan | RPM | TPM |
|---|---|---|
| Free | 100 | 200K |
| Pay-as-you-go | 300 | 500K |
| Enterprise | 3,000+ | 5M+ |
RPM = requests/min; TPM = tokens/min (estimated input + output). Check the dashboard for a key's effective limits and overrides. Need higher limits? Email sales@sealink.io.
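Which limit you hit first depends on request size. The sketch below compares the request rate that TPM allows with the RPM ceiling, using figures from the table above. The function name and the average tokens per request are illustrative assumptions, and "500K" is read as 500,000.

```python
def effective_rpm(rpm_limit: int, tpm_limit: int, avg_tokens_per_request: int) -> int:
    """Return the highest sustainable requests/min under both limits."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # TPM allows at most this many requests per minute at the given request size.
    rpm_from_tokens = tpm_limit // avg_tokens_per_request
    return min(rpm_limit, rpm_from_tokens)


# Pay-as-you-go row: 300 RPM, 500K TPM.
print(effective_rpm(300, 500_000, 1_000))  # 300: RPM is the binding limit
print(effective_rpm(300, 500_000, 4_000))  # 125: TPM is the binding limit
```

On that plan, requests averaging more than roughly 1,667 tokens (500,000 / 300) are constrained by TPM before RPM.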
What is actually limited
| Layer | Meaning |
|---|---|
| RPM | Requests per minute. One API call counts once, streaming or not. |
| TPM | Tokens per minute. SeaLink reserves estimated input/output tokens first, then settles on actual usage. |
| Daily safety caps | Some accounts or keys may have daily call/token safety caps to contain leaks or abnormal traffic. |
| Body size | Oversized single requests are not 429s. Chat / Anthropic messages allow 10 MB; Embeddings / Rerank allow 1 MB. Oversized requests return 413. |
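Because oversized bodies fail with 413 rather than 429, retrying them does not help. It can be cheaper to check size on the client first. This is a minimal sketch: the endpoint keys and helper name are hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and the byte count from `json.dumps` may differ slightly from what your HTTP client actually sends.

```python
import json

# Limits taken from the table above; the MB definition is an assumption.
MB = 1024 * 1024
BODY_LIMITS = {"chat": 10 * MB, "embeddings": 1 * MB, "rerank": 1 * MB}


def body_too_large(payload: dict, endpoint: str) -> bool:
    """Return True if the JSON-encoded payload exceeds the endpoint's body limit."""
    encoded = json.dumps(payload).encode("utf-8")
    return len(encoded) > BODY_LIMITS[endpoint]
```

A request that passes this check can still be rejected if it exceeds the model context window, so treat it as a first filter only.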
Response headers
| Header | Meaning |
|---|---|
| X-RateLimit-Limit-Requests | Per-minute RPM ceiling |
| X-RateLimit-Remaining-Requests | Remaining for this minute |
| X-RateLimit-Reset-Requests | Seconds until reset |
| Retry-After | Returned on 429; wait at least this many seconds |
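The headers above can drive the wait time after a 429 directly. The helper below is a sketch, not part of any SDK: it prefers `Retry-After`, falls back to `X-RateLimit-Reset-Requests`, and assumes both carry a plain number of seconds as described in the table. The function name and the `fallback` default are hypothetical.

```python
from typing import Mapping


def retry_delay_seconds(headers: Mapping[str, str], fallback: float = 1.0) -> float:
    """Pick a wait time (in seconds) after a 429 from the documented headers."""
    for name in ("Retry-After", "X-RateLimit-Reset-Requests"):
        value = headers.get(name)
        if value is None:
            continue
        try:
            return max(0.0, float(value))
        except ValueError:
            continue  # Not a plain number of seconds; try the next header.
    return fallback
```

With the `openai` Python client, the headers of a failed call are available as `e.response.headers` on `APIStatusError`, and that mapping is case-insensitive. A plain `dict` is not, so match the header casing if you pass one.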
Do not confuse 429 and 413
- 429 = requests or token reservations are arriving too fast. Respect Retry-After, reduce concurrency, or queue retries.
- 413 = a single request is too large. It may exceed the API request size limit or the model context window.
- For long documents, do not send the entire corpus in one request. Chunk the text, retrieve with embeddings, then send only relevant chunks into chat.
- If many questions reuse the same long document, consider prompt caching so repeated context is not billed at full cost every time.
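The chunking step for long documents might look like the sketch below. It splits on character counts as a rough stand-in for tokens, with a small overlap so sentences cut at a boundary still appear whole in one chunk. The function name and default sizes are illustrative assumptions; embedding and retrieval are not shown.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars, each overlapping the previous one."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    step = max_chars - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # This chunk already reaches the end of the text.
        start += step
    return chunks
```

Each chunk can then be embedded separately, and only the best-matching chunks sent into the chat request.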
Recommended retry pattern
Use exponential backoff with a small random delay (jitter), capped at 5-6 attempts. 429s are recorded as failed requests and normally do not create successful model-usage records.
Python
```python
import random
import time

from openai import OpenAI, APIStatusError

client = OpenAI(base_url="https://test.sealink.io/v1", api_key="<your-sealink-key>")


def call_with_backoff(messages, model="qwen3.6-35b-a3b", max_attempts=6):
    """Retry 429 responses with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except APIStatusError as e:
            # Re-raise anything that is not a 429, or a 429 on the final attempt.
            if e.status_code != 429 or attempt == max_attempts - 1:
                raise
            # Waits of 1, 2, 4, 8, 16 s (capped at 60 s) plus up to 1 s of jitter.
            sleep_s = min(60, 2 ** attempt) + random.random()
            time.sleep(sleep_s)
```
Production checklist
- Use a server-side queue or concurrency pool. Do not let frontend clicks directly fan out unlimited traffic.
- Retry only requests that are safe to repeat. If a request triggers external actions, deduplicate it in your own business system first.
- Tag each business workflow with metadata.task_type so usage, cost, and rate-limit pressure can be reviewed by task.
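The first checklist item can be approximated with a bounded worker pool, so that a burst of incoming jobs never turns into an unbounded burst of API calls. This is a minimal sketch using only the standard library; `run_bounded`, `worker`, and the default pool size are hypothetical names and values, and a production queue would also need persistence and deduplication.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(tasks, worker, max_in_flight=4):
    """Run worker(task) for every task with at most max_in_flight calls at once.

    Results are returned in the same order as tasks.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(worker, tasks))
```

Here `worker` would typically wrap a retrying call such as `call_with_backoff`, and `max_in_flight` should be sized against the key's RPM and TPM limits.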