Concepts
A shared vocabulary for product, engineering, and operations teams building with AI APIs.
Model
The AI system your request is routed to. Different models are built for different tasks.
- Model choice affects quality, latency, context length, input types, and cost. Check each model's detail page before using it in production.
- SeaLink lets you test and switch across supported model providers through one API contract.
When it matters: When you choose the right tradeoff between quality, latency, context length, and cost.
Token
The unit models use to count input and output text. Token counts vary by language and model tokenizer.
- Short English and Chinese phrases may tokenize differently. Use a tokenizer or a small test call when estimating production cost.
- For text models, both input and output tokens may be billed. Input and output rates can differ by model.
- Use request logs and usage records to compare estimated cost with actual token usage.
When it matters: Whenever you forecast monthly cost or hit a model's context limit.
Context window
The maximum amount of input, history, and expected output a model can handle in one request.
- Context length differs by model. Check the model detail page before sending long documents or multi-turn agent history.
- If the request exceeds the route or model limit, the API returns 413. Trim history, chunk the document, or switch to a longer-context model.
When it matters: When you summarize long documents, do whole-codebase analysis, or run multi-turn agents.
API Key
A secret string that authenticates your app with SeaLink.
- Treat it like a password. Do not commit it to Git, paste it into chat tools, or expose it in frontend code.
- Use separate keys for development, production, and team-owned services so access and spending can be reviewed separately.
- If a key leaks, create a replacement key, update your application, then revoke the old key from /dashboard/keys.
When it matters: Every time you ship code that calls SeaLink.
RAG (Retrieval-Augmented Generation)
Pattern: retrieve relevant docs first, then ask the model to answer using them as context.
- Step 1: split your documents into chunks and turn those chunks into embeddings.
- Step 2: when a user asks a question, embed the query and retrieve the most relevant chunks.
- Step 3: send the retrieved chunks and the question to a chat model.
- Use RAG when answers need to be grounded in your own documents, policies, support tickets, or knowledge base.
When it matters: When answers need to use your source material rather than only the model's general knowledge.
Tool use / Function calling
Letting a compatible model request a structured action that your application executes.
- You define the tool name, description, and JSON parameters. The model returns the tool call and arguments.
- Your server executes the action, sends the result back to the model, and decides what to expose to the end user.
- Support varies by model. Use /docs/function-calling and the model detail page before relying on tools in production.
When it matters: Agents, support automation, workflow tools, and any product that mixes natural language with business actions.
Streaming
Receive partial output as the model generates it instead of waiting for the complete response.
- Streaming improves perceived responsiveness for chat UIs and long-form generation, especially when users are watching the output.
- Enable it with stream: true and consume Server-Sent Events through your SDK or HTTP client.
When it matters: User-facing chat, coding assistants, and long generation. It is less useful for short background jobs.
Prompt caching
Reuse a long prompt prefix across calls so supported providers can discount or optimize repeated context.
- Common use cases include long system prompts, repeated document context, and agents that reuse the same instruction block.
- Cache behavior, TTL, and billing depend on the selected model provider. SeaLink forwards supported cache controls and records cached token usage when returned.
When it matters: High-volume workflows that repeatedly send the same long context.