Rate limits

Limits are primarily enforced per key. API responses that support rate-limit headers include remaining request and token counts so you can pace concurrency and retries.

Per-plan limits

Plan	RPM	TPM
Free	100	200K
Pay-as-you-go	300	500K
Enterprise	3,000+	5M+

RPM = requests/min; TPM = tokens/min (estimated input + output). Check the dashboard for a key's effective limits and overrides. Need higher limits? Email sales@sealink.io.
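Because RPM and TPM apply together, the lower of the two decides your real throughput. The following is a minimal sketch of that arithmetic, assuming "500K" means 500,000 tokens and that you can estimate an average token count per request. The function name is illustrative and not part of any SeaLink SDK.

```python
def effective_requests_per_minute(rpm: int, tpm: int, tokens_per_request: int) -> float:
    """Return the sustainable request rate when both RPM and TPM apply."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # TPM allows tpm / tokens_per_request calls per minute; RPM caps the count directly.
    return min(rpm, tpm / tokens_per_request)


# Pay-as-you-go figures from the table above: 300 RPM, 500K TPM.
print(effective_requests_per_minute(300, 500_000, 4_000))  # 125.0 (TPM is the binding limit)
print(effective_requests_per_minute(300, 500_000, 1_000))  # 300 (RPM is the binding limit)
```

With large prompts the token budget usually runs out before the request budget, so size your worker pool from the smaller number.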

What is actually limited

Layer	Meaning
RPM	Requests per minute. One API call counts once, streaming or not.
TPM	Tokens per minute. SeaLink reserves estimated input/output tokens first, then settles on actual usage.
Daily safety caps	Some accounts or keys may have daily call/token safety caps to contain leaks or abnormal traffic.
Body size	Maximum size of a single request body. Chat / Anthropic messages allow 10 MB; Embeddings / Rerank allow 1 MB. Oversized requests return 413, not 429.
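A client-side size check can catch oversized bodies before they are sent. The sketch below measures the JSON-encoded payload against the limits in the table. It assumes "MB" means 1024 * 1024 bytes and uses hypothetical endpoint keys ("chat", "messages", "embeddings", "rerank"); the exact byte count SeaLink applies may differ, so treat the result as an estimate.

```python
import json

# Assumption: 1 MB = 1024 * 1024 bytes. The endpoint keys are illustrative labels.
MB = 1024 * 1024
MAX_BODY_BYTES = {
    "chat": 10 * MB,
    "messages": 10 * MB,
    "embeddings": 1 * MB,
    "rerank": 1 * MB,
}


def body_size_bytes(payload: dict) -> int:
    """Approximate the request body size as UTF-8 encoded JSON."""
    return len(json.dumps(payload, ensure_ascii=False).encode("utf-8"))


def fits_body_limit(payload: dict, endpoint: str) -> bool:
    """Return True if the payload is within the size limit for the endpoint."""
    return body_size_bytes(payload) <= MAX_BODY_BYTES[endpoint]
```

If the check fails, split or shorten the input instead of retrying, since a retry of the same body would be rejected again.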

Response headers

Header	Meaning
X-RateLimit-Limit-Requests	Request ceiling per minute (RPM)
X-RateLimit-Remaining-Requests	Requests remaining in the current minute
X-RateLimit-Reset-Requests	Seconds until the request counter resets
Retry-After	Returned on 429; wait at least this many seconds
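These headers can drive client-side pacing. The helper below is a sketch that takes any mapping of response headers and returns how long to pause before the next call. It assumes the header values are plain numbers of seconds, as the table describes, and falls back to no wait when a value is missing or unparseable. The function name and the min_remaining parameter are illustrative.

```python
from typing import Mapping


def seconds_to_wait(headers: Mapping[str, str], min_remaining: int = 1) -> float:
    """Return how long to pause before the next request, based on response headers."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Retry-After (sent on 429) takes priority over the remaining-request count.
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # The HTTP-date form of Retry-After is not handled in this sketch.

    remaining = lowered.get("x-ratelimit-remaining-requests")
    reset = lowered.get("x-ratelimit-reset-requests")
    if remaining is None or reset is None:
        return 0.0
    try:
        # Pause until the reset once the remaining budget drops below the threshold.
        if int(remaining) < min_remaining:
            return max(0.0, float(reset))
    except ValueError:
        return 0.0
    return 0.0
```

Raising min_remaining leaves headroom for other workers that share the same key.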

Do not confuse 429 and 413

429 = requests or token reservations are arriving too fast. Respect Retry-After, reduce concurrency, or queue retries.
413 = a single request is too large. It may exceed the API request size limit or the model context window.
For long documents, do not send the entire corpus in one request. Chunk the text, retrieve with embeddings, then send only relevant chunks into chat.
If many questions reuse the same long document, consider prompt caching so repeated context is not billed at full cost every time.
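The chunk-then-retrieve approach for long documents can be sketched as follows. This example splits text into overlapping character windows and ranks chunks by cosine similarity to a query vector. It assumes you have already obtained embedding vectors from an embeddings endpoint; the chunk sizes, function names, and character-based splitting are illustrative choices, not SeaLink recommendations.

```python
import math


def chunk_text(text: str, chunk_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows of at most chunk_chars characters."""
    if chunk_chars <= overlap:
        raise ValueError("chunk_chars must be larger than overlap")
    if not text:
        return []
    step = chunk_chars - overlap
    # Stop early enough that the last window is not just a copy of the overlap.
    return [text[i:i + chunk_chars] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: list[float], chunk_vecs: list[list[float]],
                 chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

Only the chunks returned by top_k_chunks go into the chat request, which keeps the body and the prompt well under the 413 thresholds.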

Recommended retry pattern

Use exponential backoff with a small random delay (jitter) and stop after 5-6 attempts. 429s are recorded as failed requests and normally do not create successful model-usage records.

Python

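import os
import random
import time

from openai import OpenAI, APIStatusError

# max_retries=0 disables the SDK's own retries so this loop is the only retry layer.
client = OpenAI(base_url="https://test.sealink.io/v1", api_key=os.environ["SEALINK_API_KEY"], max_retries=0)

def call_with_backoff(messages, model="deepseek-v4-pro", max_attempts=6):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except APIStatusError as e:
            # Retry only 429s; re-raise other errors and the final failed attempt.
            if e.status_code != 429 or attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at 60 s, plus up to 1 s of random jitter.
            sleep_s = min(60, 2 ** attempt) + random.random()
            # Wait at least Retry-After when the header carries a whole number of seconds.
            retry_after = e.response.headers.get("retry-after", "")
            if retry_after.isdigit():
                sleep_s = max(sleep_s, int(retry_after))
            time.sleep(sleep_s)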

Production checklist

Use a server-side queue or concurrency pool. Do not let frontend clicks directly fan out unlimited traffic.
Retry only requests that are safe to repeat. If a request triggers external actions, deduplicate it in your own business system first.
Tag each business workflow with metadata.task_type so usage, cost, and rate-limit pressure can be reviewed by task.
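The first checklist item can be sketched with a bounded worker pool. The example below uses the standard library's ThreadPoolExecutor so that no more than max_concurrency calls are in flight at once. The run_bounded name is illustrative, and worker stands in for whatever function wraps your API call (for example a retry wrapper); this is one possible shape, not a prescribed SeaLink pattern.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_bounded(tasks: Iterable[T], worker: Callable[[T], R],
                max_concurrency: int = 4) -> list[R]:
    """Run worker(task) for every task with at most max_concurrency in flight.

    Results come back in task order. An exception raised by a worker
    propagates to the caller when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(worker, tasks))
```

Pick max_concurrency from the key's RPM and TPM budget rather than from the number of incoming user actions, so bursts of frontend clicks wait in the queue instead of turning into 429s.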