
Audio

Audio endpoints cover speech synthesis, transcription, translation, music, sound effects, and voice-related tasks. Formats, duration limits, file sizes, and billing units vary by model.

POST /v1/audio/speech

Text-to-speech synthesis. Voice choices, formats, and billing units vary by TTS model.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | qwen3-tts-flash, qwen3-tts-instruct-flash |
| input | string | Yes | Text to synthesize, max 4096 chars |
| voice | string | No | Model-specific; qwen3-tts-flash example uses Cherry |
| response_format | string | No | mp3, opus, aac, flac, wav, pcm (default: mp3) |
| speed | number | No | 0.25 - 4.0 (default: 1.0) |
| stream | boolean | No | Enable SSE streaming audio |

cURL
curl https://test.sealink.io/v1/audio/speech \
-H "Authorization: Bearer <your-sealink-key>" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-tts-flash",
"input": "Hello! Welcome to SeaLink Audio API.",
"voice": "Cherry",
"speed": 1.0,
"response_format": "mp3"
}' \
--output speech.mp3
Python
from openai import OpenAI

client = OpenAI(
    base_url="https://test.sealink.io/v1",
    api_key="<your-sealink-key>",
)

# Stream the synthesized audio straight to a file on disk.
with client.audio.speech.with_streaming_response.create(
    model="qwen3-tts-flash",
    voice="Cherry",
    input="Hello! Welcome to SeaLink Audio API.",
    speed=1.0,
) as response:
    response.stream_to_file("speech.mp3")

POST /v1/audio/transcriptions

Speech-to-text. Transcribe uploaded files or public audio URLs with multiple output formats, timestamps, and sync or async task flows.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Either | Audio file, max 25 MB. For Alibaba file-transcription ASR models, use file_url instead |
| file_url | string | Either | Publicly accessible HTTP(S) audio URL. Long jobs submitted as JSON with file_urls return 202 and are polled via /v1/tasks |
| model | string | Yes | qwen3-asr-flash, qwen3-asr-flash-filetrans, fun-asr, whisper-large-v3 |
| response_format | string | No | json, text, srt, verbose_json, vtt (default: json) |
| temperature | number | No | 0 - 1 |
| language | string | No | ISO-639-1 language code, e.g. en, zh |
| prompt | string | No | Guidance prompt, max 1024 chars |
| timestamp_granularities | array | No | word, segment |

cURL
curl https://test.sealink.io/v1/audio/transcriptions \
-H "Authorization: Bearer <your-sealink-key>" \
-F "model=qwen3-asr-flash" \
-F "file_url=https://example.com/interview.mp3" \
-F "response_format=json" \
-F "language=en"
JSON
curl https://test.sealink.io/v1/audio/transcriptions \
-H "Authorization: Bearer <your-sealink-key>" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-asr-flash",
"file_url": "https://example.com/interview.mp3",
"response_format": "json",
"language": "en"
}'
Filetrans task
# 1. Submit the long-running job; the API answers with 202.
curl https://test.sealink.io/v1/audio/transcriptions \
-H "Authorization: Bearer <your-sealink-key>" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-asr-flash-filetrans",
"file_urls": ["https://example.com/long-audio.mp3"],
"language": "en"
}'
# 2. Poll the task; replace <task_id> with the ID of the submitted task.
curl https://test.sealink.io/v1/tasks/<task_id> \
-H "Authorization: Bearer <your-sealink-key>"
Python
import requests

# Send the request as form fields, pointing the API at a public audio URL.
resp = requests.post(
    "https://test.sealink.io/v1/audio/transcriptions",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    data={
        "model": "qwen3-asr-flash",
        "file_url": "https://example.com/interview.mp3",
        "response_format": "json",
        "language": "en",
    },
)
resp.raise_for_status()
print(resp.json()["text"])
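For the filetrans flow, the 202 response has to be followed by polling /v1/tasks/<task_id>. The sketch below shows one possible polling loop. The response schema is not documented on this page, so the `status` field and the terminal values `succeeded` and `failed` are assumptions, as are the helper names `wait_for_task` and `fetch_task_http`; adjust them to the actual task payload.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll a task until it reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        # ASSUMPTION: the task body carries a "status" field with these values.
        status = task.get("status")
        if status in {"succeeded", "failed"}:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        time.sleep(interval)


def fetch_task_http(task_id: str, api_key: str = "<your-sealink-key>") -> dict:
    """Fetch one task snapshot from the /v1/tasks endpoint shown above."""
    import requests  # imported lazily so wait_for_task has no hard dependency

    resp = requests.get(
        f"https://test.sealink.io/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    return resp.json()
```

Calling `wait_for_task(fetch_task_http, task_id)` blocks until the task finishes. Passing the fetch function as an argument keeps the loop easy to test without network access.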

POST /v1/audio/translations

Audio translation. Translate audio from any language into English text. Same parameters as transcriptions (language not supported).

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Yes | Audio file, max 25 MB |
| model | string | Yes | whisper-large-v3 |
| response_format | string | No | json, text, srt, verbose_json, vtt |
| temperature | number | No | 0 - 1 |
| prompt | string | No | Guidance prompt, max 1024 chars |

cURL
curl https://test.sealink.io/v1/audio/translations \
-H "Authorization: Bearer <your-sealink-key>" \
-F "file=@korean_speech.mp3" \
-F "model=whisper-large-v3"
Python
from openai import OpenAI

client = OpenAI(
    base_url="https://test.sealink.io/v1",
    api_key="<your-sealink-key>",
)

# Open the file in a context manager so the handle is closed after the upload.
with open("korean_speech.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
    )
print(translation.text)
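Since uploads are limited to 25 MB, it can help to check the file size locally before sending it. This is a small sketch of such a check. The helper name `check_upload_size` is hypothetical, and reading "25MB" as 25 MiB (25 * 1024 * 1024 bytes) is an assumption; the page does not say which unit the server uses.

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # ASSUMPTION: "25MB" interpreted as 25 MiB


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the upload limit is {limit} bytes")
    return size
```

Running `check_upload_size("korean_speech.mp3")` before opening the file avoids a round trip for files the API would reject anyway.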

POST /v1/audio/music

AI music generation. Generate music with or without lyrics, using a music model that is currently public in your catalog snapshot.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | udio-v2 |
| prompt | string | Yes | Music description prompt |
| duration | integer | No | 1 - 600 seconds |
| n | integer | No | 1 - 4 |
| style | string | No | Style description, max 256 chars |
| lyrics | string | No | Lyrics, max 4096 chars |
| instrumental | boolean | No | Instrumental only (no lyrics) |

cURL
curl https://test.sealink.io/v1/audio/music \
-H "Authorization: Bearer <your-sealink-key>" \
-H "Content-Type: application/json" \
-d '{
"model": "udio-v2",
"prompt": "An uplifting orchestral piece with piano and strings, 120 BPM",
"duration": 180,
"n": 1,
"style": "cinematic orchestral",
"instrumental": true
}'
Python
import requests

# Custom endpoint, so call it with requests rather than the OpenAI client.
res = requests.post(
    "https://test.sealink.io/v1/audio/music",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    json={
        "model": "udio-v2",
        "prompt": "An uplifting orchestral piece, 120 BPM",
        "duration": 180,
        "n": 1,
        "instrumental": True,
    },
)
res.raise_for_status()
print(res.json())
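The ranges in the parameter table can be enforced on the client before a request is sent. The sketch below builds the JSON body and rejects out-of-range values. The function name `build_music_request` is hypothetical, and the checks only mirror the documented limits; the server may apply further rules that this page does not describe.

```python
from typing import Optional


def build_music_request(
    model: str,
    prompt: str,
    duration: Optional[int] = None,
    n: Optional[int] = None,
    style: Optional[str] = None,
    lyrics: Optional[str] = None,
    instrumental: Optional[bool] = None,
) -> dict:
    """Validate arguments against the documented ranges and build the JSON body."""
    if not model or not prompt:
        raise ValueError("model and prompt are required")
    if duration is not None and not 1 <= duration <= 600:
        raise ValueError("duration must be between 1 and 600 seconds")
    if n is not None and not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    if style is not None and len(style) > 256:
        raise ValueError("style must be at most 256 characters")
    if lyrics is not None and len(lyrics) > 4096:
        raise ValueError("lyrics must be at most 4096 characters")
    optional = {
        "duration": duration,
        "n": n,
        "style": style,
        "lyrics": lyrics,
        "instrumental": instrumental,
    }
    body = {"model": model, "prompt": prompt}
    # Only send optional fields the caller actually set.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.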

POST /v1/audio/sound-effects

AI sound effects generation. Describe a sound in natural language and get a short audio clip. Ideal for games, film, and podcasts.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | elevenlabs-sfx |
| prompt | string | Yes | Sound effect description |
| duration_seconds | integer | No | 1 - 30 (seconds) |
| n | integer | No | 1 - 4 |

cURL
curl https://test.sealink.io/v1/audio/sound-effects \
-H "Authorization: Bearer <your-sealink-key>" \
-H "Content-Type: application/json" \
-d '{
"model": "elevenlabs-sfx",
"prompt": "Heavy rain on a tin roof with distant thunder",
"duration_seconds": 10,
"n": 1
}'
Python
import requests

res = requests.post(
    "https://test.sealink.io/v1/audio/sound-effects",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    json={
        "model": "elevenlabs-sfx",
        "prompt": "Heavy rain on a tin roof with distant thunder",
        "duration_seconds": 10,
        "n": 1,
    },
)
res.raise_for_status()
print(res.json())

POST /v1/audio/voice-clone

Voice cloning. Provide a reference audio and text to generate matching speech. Create custom voices or digital avatars.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | elevenlabs-voice-clone |
| audio | string | Yes | Reference audio as base64 or URL |
| voice_name | string | No | Custom voice name, max 128 chars |
| text | string | No | Text to speak, max 4096 chars |
| response_format | string | No | mp3, wav, ogg (default: mp3) |

cURL
curl https://test.sealink.io/v1/audio/voice-clone \
-H "Authorization: Bearer <your-sealink-key>" \
-H "Content-Type: application/json" \
-d '{
"model": "elevenlabs-voice-clone",
"audio": "https://example.com/reference.wav",
"voice_name": "my-custom-voice",
"text": "This is a test of my cloned voice.",
"response_format": "mp3"
}' \
--output cloned.mp3
Python
import requests

res = requests.post(
    "https://test.sealink.io/v1/audio/voice-clone",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    json={
        "model": "elevenlabs-voice-clone",
        "audio": "https://example.com/reference.wav",
        "voice_name": "my-custom-voice",
        "text": "This is a test of my cloned voice.",
        "response_format": "mp3",
    },
)
res.raise_for_status()
# Response is audio binary
with open("cloned.mp3", "wb") as f:
    f.write(res.content)
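The audio parameter accepts either a URL or base64 data, so a local reference recording has to be encoded first. The sketch below shows one way to produce the value for either case. The helper name `reference_audio_value` is hypothetical, and sending plain standard base64 without a data-URI prefix is an assumption; this page does not specify the exact encoding the endpoint expects.

```python
import base64
from pathlib import Path


def reference_audio_value(source: str) -> str:
    """Return an HTTP(S) URL unchanged; otherwise base64-encode the file at that path."""
    if source.startswith(("http://", "https://")):
        return source
    # ASSUMPTION: the API expects plain standard base64 with no data-URI prefix.
    return base64.b64encode(Path(source).read_bytes()).decode("ascii")
```

The result can be placed in the `audio` field of the request body, for example `"audio": reference_audio_value("reference.wav")`.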