Audio
Audio endpoints cover speech synthesis, transcription, translation, music, sound effects, and voice-related tasks. Formats, duration limits, file sizes, and billing units vary by model.
POST /v1/audio/speech
Text-to-speech synthesis. Voice choices, formats, and billing units vary by TTS model.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | qwen3-tts-flash, qwen3-tts-instruct-flash |
| input | string | Yes | Text to synthesize, max 4096 chars |
| voice | string | No | Model-specific; qwen3-tts-flash example uses Cherry |
| response_format | string | No | mp3, opus, aac, flac, wav, pcm (default: mp3) |
| speed | number | No | 0.25 - 4.0 (default: 1.0) |
| stream | boolean | No | Enable SSE streaming audio |
curl https://test.sealink.io/v1/audio/speech \
  -H "Authorization: Bearer <your-sealink-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-tts-flash",
    "input": "Hello! Welcome to SeaLink Audio API.",
    "voice": "Cherry",
    "speed": 1.0,
    "response_format": "mp3"
  }' \
  --output speech.mp3

from openai import OpenAI

client = OpenAI(
    base_url="https://test.sealink.io/v1",
    api_key="<your-sealink-key>",
)

# Stream the synthesized audio straight to a local file
with client.audio.speech.with_streaming_response.create(
    model="qwen3-tts-flash",
    voice="Cherry",
    input="Hello! Welcome to SeaLink Audio API.",
    speed=1.0,
) as response:
    response.stream_to_file("speech.mp3")
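Because `input` is capped at 4096 characters, longer text has to be split across several requests. The following is a minimal sketch of one way to do that, not part of the SeaLink API: `split_for_tts` is a hypothetical helper, and it assumes the limit counts Python string characters and that breaking at whitespace is acceptable for your content.

```python
def split_for_tts(text, limit=4096):
    """Split text into chunks of at most `limit` characters.

    Prefers to break at a space or newline; falls back to a hard cut
    when a single run of non-whitespace exceeds the limit.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last space or newline at an index <= limit, so the chunk fits.
        cut = max(
            remaining.rfind(" ", 0, limit + 1),
            remaining.rfind("\n", 0, limit + 1),
        )
        if cut <= 0:
            cut = limit  # no usable whitespace: hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each returned chunk can then be sent as the `input` of its own speech request, and the resulting audio files concatenated in order.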
POST /v1/audio/transcriptions
Speech-to-text. Transcribe uploaded files or public audio URLs with multiple output formats, timestamps, and sync or async task flows.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Either | Audio file, max 25 MB; for Alibaba file-transcription ASR models, use file_url instead |
| file_url | string | Either | Publicly accessible HTTP(S) audio URL. Long jobs submitted as JSON with file_urls return 202 and are polled via /v1/tasks |
| model | string | Yes | qwen3-asr-flash, qwen3-asr-flash-filetrans, fun-asr, whisper-large-v3 |
| response_format | string | No | json, text, srt, verbose_json, vtt (default: json) |
| temperature | number | No | 0 - 1 |
| language | string | No | ISO-639-1 language code, e.g. en, zh |
| prompt | string | No | Guidance prompt, max 1024 chars |
| timestamp_granularities | array | No | word, segment |
curl https://test.sealink.io/v1/audio/transcriptions \
  -H "Authorization: Bearer <your-sealink-key>" \
  -F "model=qwen3-asr-flash" \
  -F "file_url=https://example.com/interview.mp3" \
  -F "response_format=json" \
  -F "language=en"

curl https://test.sealink.io/v1/audio/transcriptions \
  -H "Authorization: Bearer <your-sealink-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-asr-flash",
    "file_url": "https://example.com/interview.mp3",
    "response_format": "json",
    "language": "en"
  }'

curl https://test.sealink.io/v1/audio/transcriptions \
  -H "Authorization: Bearer <your-sealink-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-asr-flash-filetrans",
    "file_urls": ["https://example.com/long-audio.mp3"],
    "language": "en"
  }'

curl https://test.sealink.io/v1/tasks/<task_id> \
  -H "Authorization: Bearer <your-sealink-key>"

import requests

# Send the transcription request as form fields
resp = requests.post(
    "https://test.sealink.io/v1/audio/transcriptions",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    data={
        "model": "qwen3-asr-flash",
        "file_url": "https://example.com/interview.mp3",
        "response_format": "json",
        "language": "en",
    },
)
resp.raise_for_status()
print(resp.json()["text"])
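For long jobs that return 202, the client has to poll `/v1/tasks/<task_id>` until the task finishes. The sketch below shows one possible polling loop. The task response schema is not documented in this section, so the `status` field and the values `succeeded` and `failed` are assumptions, and `wait_for_task` and `fetch_task` are hypothetical names. The HTTP call is injected as a function so the loop can be exercised without network access.

```python
import time

# ASSUMPTION: hypothetical terminal status values; check the real task schema.
TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_task(fetch_task, task_id, interval=2.0, timeout=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call fetch_task(task_id) repeatedly until the task reaches a
    terminal status, or raise TimeoutError once `timeout` seconds pass.

    fetch_task must return the decoded JSON body of the task as a dict.
    """
    deadline = clock() + timeout
    while True:
        task = fetch_task(task_id)
        # ASSUMPTION: the task body exposes its state under "status".
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        sleep(interval)
```

In practice `fetch_task` could wrap a `requests.get` call to `https://test.sealink.io/v1/tasks/<task_id>` with the same `Authorization` header and return `resp.json()`.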
POST /v1/audio/translations
Audio translation. Translates audio from its source language into English text. Takes the subset of transcription parameters listed below; language is not supported.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file, max 25MB |
| model | string | Yes | whisper-large-v3 |
| response_format | string | No | json, text, srt, verbose_json, vtt |
| temperature | number | No | 0 - 1 |
| prompt | string | No | Guidance prompt, max 1024 chars |
curl https://test.sealink.io/v1/audio/translations \
  -H "Authorization: Bearer <your-sealink-key>" \
  -F "file=@korean_speech.mp3" \
  -F "model=whisper-large-v3"

from openai import OpenAI

client = OpenAI(
    base_url="https://test.sealink.io/v1",
    api_key="<your-sealink-key>",
)

# Open the file in a context manager so it is closed after the upload
with open("korean_speech.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
    )
print(translation.text)
POST /v1/audio/music
AI music generation, with or without lyrics. Use a music model that is currently public in your catalog snapshot.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | udio-v2 |
| prompt | string | Yes | Music description prompt |
| duration | integer | No | 1 - 600 seconds |
| n | integer | No | 1 - 4 |
| style | string | No | Style description, max 256 chars |
| lyrics | string | No | Lyrics, max 4096 chars |
| instrumental | boolean | No | Instrumental only (no lyrics) |
curl https://test.sealink.io/v1/audio/music \
  -H "Authorization: Bearer <your-sealink-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "udio-v2",
    "prompt": "An uplifting orchestral piece with piano and strings, 120 BPM",
    "duration": 180,
    "n": 1,
    "style": "cinematic orchestral",
    "instrumental": true
  }'

# Custom endpoint: not covered by the OpenAI SDK, so use raw requests
import requests

res = requests.post(
    "https://test.sealink.io/v1/audio/music",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    json={
        "model": "udio-v2",
        "prompt": "An uplifting orchestral piece, 120 BPM",
        "duration": 180,
        "n": 1,
        "instrumental": True,
    },
)
res.raise_for_status()
print(res.json())
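The limits in the parameter table can be checked on the client before a request is sent. This is a minimal sketch of such a check: `build_music_payload` is a hypothetical helper, not part of any SDK, and it assumes the character limits count Python string characters. Optional fields left as `None` are omitted from the payload.

```python
def build_music_payload(prompt, model="udio-v2", duration=None, n=None,
                        style=None, lyrics=None, instrumental=None):
    """Return a JSON-ready dict for POST /v1/audio/music.

    Raises ValueError when a value falls outside the documented limits.
    """
    if not prompt:
        raise ValueError("prompt is required")
    if duration is not None and not 1 <= duration <= 600:
        raise ValueError("duration must be 1-600 seconds")
    if n is not None and not 1 <= n <= 4:
        raise ValueError("n must be 1-4")
    if style is not None and len(style) > 256:
        raise ValueError("style must be at most 256 characters")
    if lyrics is not None and len(lyrics) > 4096:
        raise ValueError("lyrics must be at most 4096 characters")

    payload = {"model": model, "prompt": prompt}
    optional = {
        "duration": duration,
        "n": n,
        "style": style,
        "lyrics": lyrics,
        "instrumental": instrumental,
    }
    # Only send the optional fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dict can be passed directly as the `json=` argument of `requests.post`.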
POST /v1/audio/sound-effects
AI sound effects generation. Describe a sound in natural language and get a short audio clip. Suited to games, film, and podcasts.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | elevenlabs-sfx |
| prompt | string | Yes | Sound effect description |
| duration_seconds | integer | No | 1 - 30 (seconds) |
| n | integer | No | 1 - 4 |
curl https://test.sealink.io/v1/audio/sound-effects \
  -H "Authorization: Bearer <your-sealink-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-sfx",
    "prompt": "Heavy rain on a tin roof with distant thunder",
    "duration_seconds": 10,
    "n": 1
  }'

import requests

res = requests.post(
    "https://test.sealink.io/v1/audio/sound-effects",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    json={
        "model": "elevenlabs-sfx",
        "prompt": "Heavy rain on a tin roof with distant thunder",
        "duration_seconds": 10,
        "n": 1,
    },
)
res.raise_for_status()
print(res.json())
POST /v1/audio/voice-clone
Voice cloning. Provide a reference audio clip and text to generate speech in the matching voice. Useful for creating custom voices or digital avatars.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | elevenlabs-voice-clone |
| audio | string | Yes | Reference audio as base64 or URL |
| voice_name | string | No | Custom voice name, max 128 chars |
| text | string | No | Text to speak, max 4096 chars |
| response_format | string | No | mp3, wav, ogg (default: mp3) |
curl https://test.sealink.io/v1/audio/voice-clone \
  -H "Authorization: Bearer <your-sealink-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-voice-clone",
    "audio": "https://example.com/reference.wav",
    "voice_name": "my-custom-voice",
    "text": "This is a test of my cloned voice.",
    "response_format": "mp3"
  }' \
  --output cloned.mp3

import requests

res = requests.post(
    "https://test.sealink.io/v1/audio/voice-clone",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    json={
        "model": "elevenlabs-voice-clone",
        "audio": "https://example.com/reference.wav",
        "voice_name": "my-custom-voice",
        "text": "This is a test of my cloned voice.",
        "response_format": "mp3",
    },
)
# Fail early so an error body is not written to the output file
res.raise_for_status()

# Response is audio binary
with open("cloned.mp3", "wb") as f:
    f.write(res.content)
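The `audio` parameter accepts either base64 or a URL, so a local reference clip has to be encoded before it is sent. The sketch below shows one way to prepare that value. `audio_reference` is a hypothetical helper, and it assumes the API expects plain standard base64 without a `data:` URI prefix, which this section does not specify.

```python
import base64


def audio_reference(source):
    """Return a value for the `audio` field.

    bytes are base64-encoded; an http(s) URL string is passed through.
    """
    if isinstance(source, (bytes, bytearray)):
        # ASSUMPTION: plain base64 text, no "data:audio/...;base64," prefix.
        return base64.b64encode(bytes(source)).decode("ascii")
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        return source
    raise ValueError("source must be audio bytes or an http(s) URL")
```

For a local file, read it in binary mode and pass the bytes, for example `audio_reference(open("reference.wav", "rb").read())`, then use the result as the `audio` value in the request body.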