
Rate limits

Limits are primarily enforced per key. API responses that support rate-limit headers include remaining request and token counts so you can pace concurrency and retries.

Per-plan limits

Plan           RPM     TPM
Free           100     200K
Pay-as-you-go  300     500K
Enterprise     3,000+  5M+

RPM = requests/min; TPM = tokens/min (estimated input + output). Check the dashboard for a key's effective limits and overrides. Need higher limits? Email sales@sealink.io.
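Both ceilings apply at once, so the rate a key can actually sustain is the lower of its RPM and its TPM divided by the average tokens per request. Here is a minimal sketch of that arithmetic. It assumes an average of 3,000 input + output tokens per request, which is a hypothetical figure you should replace with measurements from your own traffic. It also treats the Enterprise values as lower bounds.

```python
# Plan ceilings from the table above: (RPM, TPM).
# Enterprise values are minimums; a key's real limits may be higher.
PLAN_LIMITS = {
    "Free": (100, 200_000),
    "Pay-as-you-go": (300, 500_000),
    "Enterprise": (3_000, 5_000_000),
}


def sustainable_rpm(rpm: int, tpm: int, avg_tokens_per_request: int) -> int:
    """Requests/min you can sustain before hitting either ceiling.

    avg_tokens_per_request is your own estimate of input + output tokens.
    """
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    token_bound = tpm // avg_tokens_per_request  # requests/min allowed by TPM
    return min(rpm, token_bound)


for plan, (rpm, tpm) in PLAN_LIMITS.items():
    print(plan, sustainable_rpm(rpm, tpm, avg_tokens_per_request=3_000))
# Free 66
# Pay-as-you-go 166
# Enterprise 1666
```

At 3,000 tokens per request, a Free key is bound by TPM at 66 requests/min, not by its 100 RPM ceiling. TPM reserves estimated tokens before settling on actual usage, so your real headroom within a minute can differ from this figure.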

What is actually limited

Layer              Meaning
RPM                Requests per minute. One API call counts once, whether or not it streams.
TPM                Tokens per minute. SeaLink first reserves the estimated input and output tokens, then reconciles the reservation with actual usage.
Daily safety caps  Some accounts or keys may have daily request or token safety caps to contain leaked keys or abnormal traffic.
Body size          A single oversized request returns 413, not 429. Request bodies are limited to 10 MB for Chat and Anthropic Messages, and 1 MB for Embeddings and Rerank.
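To catch a 413 before sending, a client can measure the serialized body against the per-endpoint limits. This is a minimal sketch under a few assumptions. The limits are read as decimal megabytes (10,000,000 and 1,000,000 bytes), which is the stricter reading. The endpoint labels are hypothetical names for this sketch. Python's default JSON serialization is used as a conservative estimate. This check covers only the body-size limit, not the model's context window.

```python
import json

# Assumption: "10 MB" / "1 MB" read as decimal megabytes (the stricter reading).
# Endpoint labels are hypothetical names for this sketch, not SeaLink identifiers.
BODY_LIMIT_BYTES = {
    "chat": 10_000_000,
    "anthropic_messages": 10_000_000,
    "embeddings": 1_000_000,
    "rerank": 1_000_000,
}


def estimated_body_bytes(payload: dict) -> int:
    # Default json.dumps escapes non-ASCII as \uXXXX and puts spaces after
    # separators, so the result is no smaller than a compact UTF-8 encoding.
    return len(json.dumps(payload).encode("utf-8"))


def fits_body_limit(endpoint: str, payload: dict) -> bool:
    """True if the estimated body size is within the endpoint's limit."""
    return estimated_body_bytes(payload) <= BODY_LIMIT_BYTES[endpoint]
```

If `fits_body_limit` returns False, chunk the input rather than retrying. Retrying a 413 will fail the same way.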

Response headers

Header                          Meaning
X-RateLimit-Limit-Requests      Per-minute request ceiling (RPM)
X-RateLimit-Remaining-Requests  Requests remaining in the current minute
X-RateLimit-Reset-Requests      Seconds until the request counter resets
Retry-After                     Sent with 429 responses; wait at least this many seconds before retrying
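These headers can drive pacing directly. Pause when the current minute's requests are used up, and honor Retry-After on a 429. This is a minimal sketch with several assumptions. Retry-After and X-RateLimit-Reset-Requests are both taken to carry a number of seconds; HTTP also allows Retry-After as a date, which this sketch does not parse. The 1-second fallback for a missing or unparseable value is a hypothetical choice. The function accepts any mapping of header names to values, such as a plain dict or an `httpx.Headers` object.

```python
from collections.abc import Mapping


def _as_float(value, default: float) -> float:
    """Parse a header value as a number; use `default` if absent or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def seconds_to_wait(status_code: int, headers: Mapping, fallback: float = 1.0) -> float:
    """Seconds to pause before the next request, based on rate-limit headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    if status_code == 429:
        # The server told us how long to back off.
        return _as_float(h.get("retry-after"), fallback)
    remaining = h.get("x-ratelimit-remaining-requests")
    if remaining is not None and _as_float(remaining, 1.0) <= 0:
        # Budget for this minute is spent: wait for the counter to reset.
        return _as_float(h.get("x-ratelimit-reset-requests"), fallback)
    return 0.0
```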

Do not confuse 429 and 413

  • 429 = requests or token reservations are arriving too fast. Respect Retry-After, reduce concurrency, or queue retries.
  • 413 = a single request is too large. It may exceed the API request size limit or the model context window.
  • For long documents, do not send the entire corpus in one request. Chunk the text, retrieve with embeddings, then send only relevant chunks into chat.
  • If many questions reuse the same long document, consider prompt caching so repeated context is not billed at full cost every time.
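The chunk-then-retrieve approach for long documents can be sketched in plain Python. This sketch makes three assumptions. Character counts stand in roughly for tokens; a tokenizer would be more precise. The default chunk sizes are hypothetical. The chunk and query vectors are assumed to have been obtained already from the Embeddings endpoint, and that API call is not shown.

```python
import math


def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list:
    """Split `text` into windows of `chunk_size` characters overlapping by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, chunks, k: int = 3) -> list:
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    ranked = sorted(
        zip(chunk_vecs, chunks),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    return [chunk for _, chunk in ranked[:k]]
```

Only the chunks returned by `top_k_chunks` go into the chat request. This keeps each request small and keeps TPM usage proportional to what the question actually needs.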

Recommended retry pattern

Retry 429s with exponential backoff plus a small random delay (jitter), and stop after 5 or 6 attempts. Requests rejected with 429 are recorded as failed requests and normally do not create successful model-usage records.

Python
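```python
import random
import time

from openai import APIStatusError, OpenAI

# max_retries=0 disables the SDK's built-in retries so this loop is the only retry layer.
client = OpenAI(base_url="https://test.sealink.io/v1", api_key="<your-sealink-key>", max_retries=0)

def call_with_backoff(messages, model="qwen3.6-35b-a3b", max_attempts=6):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except APIStatusError as e:
            # Re-raise anything that is not a 429, and give up after the last attempt.
            if e.status_code != 429 or attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at 60 s, plus up to 1 s of random jitter.
            sleep_s = min(60, 2 ** attempt) + random.random()
            time.sleep(sleep_s)
```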
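The loop above sleeps on a fixed schedule and does not look at Retry-After. The function below keeps the same exponential-plus-jitter schedule but never waits less than the server asked. It is a minimal sketch that assumes Retry-After holds a number of seconds. In the SDK's `APIStatusError`, the raw value is available as `e.response.headers.get("retry-after")`.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[float] = None, cap: float = 60.0) -> float:
    """Seconds to sleep after failed attempt number `attempt` (0-based)."""
    # Exponential base capped at `cap`, plus up to 1 s of random jitter.
    delay = min(cap, 2 ** attempt) + random.random()
    # Never retry sooner than the server's Retry-After.
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay
```

Inside the `except` branch, convert the Retry-After value to a float when it is present. Then pass it as `retry_after` in place of the fixed `sleep_s` calculation.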

Production checklist

  • Use a server-side queue or concurrency pool. Do not let frontend clicks directly fan out unlimited traffic.
  • Retry only requests that are safe to repeat. If a request triggers external actions, deduplicate it in your own business system first.
  • Tag each business workflow with metadata.task_type so usage, cost, and rate-limit pressure can be reviewed by task.
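The first two checklist items can be combined in a small server-side dispatcher. This is a minimal sketch. It uses `asyncio.Semaphore` to cap in-flight calls and an in-memory set of idempotency keys to drop repeated submissions. `BoundedDispatcher`, the key format, the concurrency of 3, and `fake_call` are all hypothetical. A production system would keep keys in a shared store with an expiry rather than in process memory.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class BoundedDispatcher:
    """Hypothetical helper: caps concurrent upstream calls and skips duplicate jobs."""

    def __init__(self, max_concurrency: int) -> None:
        # Construct inside a running event loop (e.g. from an async function).
        self._sem = asyncio.Semaphore(max_concurrency)
        self._seen: set = set()

    async def submit(self, idempotency_key: str, call: Callable[[], Awaitable[Any]]) -> Optional[Any]:
        # Check and record the key before the first await, so two concurrent
        # submissions of the same key cannot both pass the check.
        if idempotency_key in self._seen:
            return None  # duplicate: this job was already submitted
        self._seen.add(idempotency_key)
        async with self._sem:  # at most max_concurrency calls run at once
            return await call()


async def demo():
    dispatcher = BoundedDispatcher(max_concurrency=3)
    in_flight = 0
    peak = 0

    async def fake_call():
        # Stand-in for a real API request; records peak concurrency.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return "ok"

    keys = [f"job-{i}" for i in range(10)] + ["job-0"]  # last key repeats job-0
    results = await asyncio.gather(*(dispatcher.submit(k, fake_call) for k in keys))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(results, peak)
```

A duplicate returns `None` instead of the first call's result. This keeps the sketch short. If callers need the original response, you could store results by key.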