Audio

Audio endpoints cover speech synthesis, transcription, translation, music generation, sound effects, and voice cloning. Supported formats, duration limits, file-size limits, and billing units vary by model, so keep first test requests small: short text, short audio clips, or a short music duration.

`POST /v1/audio/speech`

Text-to-speech synthesis. Voice choices, formats, and billing units vary by TTS model.

Parameters

Parameter	Type	Required	Description
model	string	Yes	Choose a public audio-tts model from /v1/models; model details are the source of truth for voice, format, and task behavior.
input	string	Yes	Text to synthesize, max 4096 chars. When a speech-2.8 model returns an async task, set output_format=url and poll /v1/tasks.
voice	string	Yes	Model-specific; use the voices declared by the /v1/models model details.
response_format	string	No	mp3, opus, aac, flac, wav, pcm (default: mp3)
speed	number	No	0.25 - 4.0 (default: 1.0)
stream	boolean	No	Enable SSE streaming audio
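
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SeaLink API: `build_speech_payload` is a hypothetical helper name, and it assumes that "4096 chars" means Python string length and that the listed formats apply to the chosen model (the model details from /v1/models remain the source of truth).

```python
# Formats listed in the parameter table; individual models may support fewer.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(model, text, voice, response_format="mp3", speed=1.0):
    """Return a JSON-ready dict for POST /v1/audio/speech.

    Hypothetical helper: raises ValueError when a value falls outside
    the ranges given in the parameter table.
    """
    # ASSUMPTION: the 4096-character limit is counted as len(text).
    if not text or len(text) > 4096:
        raise ValueError("input must be 1 to 4096 characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dict can be passed as the JSON body of the request, for example `requests.post(url, json=build_speech_payload(...))`.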

cURL

curl https://test.sealink.io/v1/audio/speech \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-omni-flash",
    "input": "Hello! Welcome to SeaLink Audio API.",
    "voice": "Cherry",
    "speed": 1.0,
    "response_format": "mp3"
  }' \
  --output speech.mp3

Python

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://test.sealink.io/v1",
    api_key=os.environ["SEALINK_API_KEY"],
)

with client.audio.speech.with_streaming_response.create(
    model="qwen3-omni-flash",
    voice="Cherry",
    input="Hello! Welcome to SeaLink Audio API.",
    speed=1.0,
) as response:
    response.stream_to_file("speech.mp3")

`POST /v1/audio/transcriptions`

Speech-to-text. Transcribe uploaded files or public audio URLs with multiple output formats, timestamps, and sync or async task flows.

Parameters

Parameter	Type	Required	Description
file	file	Either	Audio file, max 25MB; use file_url for remote-audio transcription models.
file_url	string	Either	Publicly downloadable HTTP(S) audio URL; do not use links that require sign-in, are blocked upstream, or expire. Long jobs submitted as JSON with file_urls return 202 and are polled via /v1/tasks.
model	string	Yes	Choose a public audio-stt model from /v1/models; sync, async, realtime, and remote-audio behavior follows the model details.
response_format	string	No	json, text, srt, verbose_json, vtt (default: json)
temperature	number	No	0 - 1
language	string	No	ISO-639-1 language code, e.g. en, zh
prompt	string	No	Guidance prompt, max 1024 chars
timestamp_granularities	array	No	word, segment
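
Because `file` and `file_url` are alternatives, a small helper can decide which one goes into the request. This is a sketch under stated assumptions: `build_transcription_request` and its `filename` default are hypothetical, and it reads "Either" as "exactly one of the two", which the table does not state explicitly.

```python
def build_transcription_request(model, file_bytes=None, file_url=None,
                                filename="audio.wav", **options):
    """Return keyword arguments suitable for requests.post().

    Hypothetical helper. Extra keyword options (language, response_format,
    prompt, ...) are passed through as form fields unchanged.
    """
    # ASSUMPTION: "Either" means exactly one of file / file_url must be set.
    if (file_bytes is None) == (file_url is None):
        raise ValueError("pass exactly one of file_bytes or file_url")

    data = {"model": model, **options}
    if file_url is not None:
        if not file_url.startswith(("http://", "https://")):
            raise ValueError("file_url must be an HTTP(S) URL")
        data["file_url"] = file_url
        return {"data": data}

    # Local audio is sent as a multipart upload under the "file" field.
    return {"data": data, "files": {"file": (filename, file_bytes)}}
```

The result can be unpacked into the call, as in `requests.post(url, headers=headers, **build_transcription_request("fun-asr", file_url=...))`.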

cURL

curl https://test.sealink.io/v1/audio/transcriptions \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -F "model=fun-asr" \
  -F "file_url=https://raw.githubusercontent.com/Jakobovski/free-spoken-digit-dataset/master/recordings/1_jackson_0.wav" \
  -F "response_format=json" \
  -F "language=en"

JSON

curl https://test.sealink.io/v1/audio/transcriptions \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fun-asr",
    "file_url": "https://raw.githubusercontent.com/Jakobovski/free-spoken-digit-dataset/master/recordings/1_jackson_0.wav",
    "response_format": "json",
    "language": "en"
  }'

Filetrans task

curl https://test.sealink.io/v1/audio/transcriptions \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<audio-transcription-task-model-from-/v1/models>",
    "file_urls": ["https://raw.githubusercontent.com/Jakobovski/free-spoken-digit-dataset/master/recordings/1_jackson_0.wav"],
    "language": "en"
  }'

curl https://test.sealink.io/v1/tasks/<task_id> \
  -H "Authorization: Bearer $SEALINK_API_KEY"
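
The submit-then-poll flow above can be sketched as a small polling loop. The task response schema is not described in this section, so the `status` field and the terminal values `succeeded` and `failed` are assumptions; `fetch_task` and `wait_for_task` are hypothetical names. Check an actual /v1/tasks response before relying on them.

```python
import json
import os
import time
import urllib.request

BASE_URL = "https://test.sealink.io/v1"


def fetch_task(task_id):
    """GET /v1/tasks/<task_id> and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/tasks/{task_id}",
        headers={"Authorization": f"Bearer {os.environ['SEALINK_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch=fetch_task, interval=2.0, timeout=300.0,
                  done_states=("succeeded", "failed")):
    """Poll a task until it reaches a terminal state or the timeout passes.

    ASSUMPTION: the task body carries a "status" field whose terminal
    values are listed in done_states; adjust both to the real schema.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        if task.get("status") in done_states:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
        time.sleep(interval)
```

Passing `fetch` as a parameter keeps the loop testable without network access and lets callers swap in their own HTTP client.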

Python

import os

import requests

# file_url is sent as a form field, so this example uses raw requests

resp = requests.post(
    "https://test.sealink.io/v1/audio/transcriptions",
    headers={"Authorization": f"Bearer {os.environ['SEALINK_API_KEY']}"},
    data={
        "model": "fun-asr",
        "file_url": "https://raw.githubusercontent.com/Jakobovski/free-spoken-digit-dataset/master/recordings/1_jackson_0.wav",
        "response_format": "json",
        "language": "en",
    },
)
resp.raise_for_status()
print(resp.json()["text"])

`POST /v1/audio/translations`

Audio translation. Translate audio from any language into English text. Optionally pass language to identify the source audio language.

Parameters

Parameter	Type	Required	Description
file	file	Yes	Audio file, max 25MB
model	string	Yes	<audio-translation-model-from-/v1/models>
response_format	string	No	json, text, srt, verbose_json, vtt
temperature	number	No	0 - 1
prompt	string	No	Guidance prompt, max 1024 chars
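
Since uploads are capped at 25MB, it can save a round trip to check the file size locally first. A minimal sketch: `check_upload_size` is a hypothetical helper, and reading "25MB" as 25 * 1024 * 1024 bytes is an assumption (the server may count 25,000,000 bytes instead).

```python
import os

# ASSUMPTION: "25MB" is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload_size(path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the limit is {limit} bytes")
    return size
```

Calling `check_upload_size("korean_speech.mp3")` before opening the file lets an oversized recording fail fast with a clear message.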

cURL

curl https://test.sealink.io/v1/audio/translations \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -F "file=@korean_speech.mp3" \
  -F "model=<audio-translation-model-from-/v1/models>"

Python

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://test.sealink.io/v1",
    api_key=os.environ["SEALINK_API_KEY"],
)

# Open the file in a context manager so it is closed after the upload
with open("korean_speech.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="<audio-translation-model-from-/v1/models>",
        file=audio_file,
    )
print(translation.text)

`POST /v1/audio/music`

AI music generation. Choose a public audio-music model from /v1/models. Generate music with or without lyrics; for first test requests, use a short duration (15 seconds or less) and n=1.

Parameters

Parameter	Type	Required	Description
model	string	Yes	Choose a public audio-music model from /v1/models
prompt	string	Yes	Music description, 10 - 2000 chars
duration	integer	No	1 - 600 seconds; use 15 seconds or shorter for first probes
n	integer	No	1 - 4
style	string	No	Style description, max 256 chars
lyrics_prompt	string	No	Lyrics, 10 - 3000 chars; lyrics is also accepted as an alias
audio_setting	object	No	Audio format, sample rate, or bitrate settings
instrumental	boolean	No	Instrumental only (no lyrics)
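
The ranges in the table can be enforced before the request leaves the client. The sketch below is illustrative only: `build_music_payload` is a hypothetical helper, its `duration=15` default follows the first-probe advice rather than any documented API default, and lengths are assumed to be counted with `len()`.

```python
def build_music_payload(model, prompt, duration=15, n=1, style=None,
                        lyrics_prompt=None, instrumental=None):
    """Return a JSON-ready dict for POST /v1/audio/music.

    Hypothetical helper: raises ValueError when a value falls outside
    the ranges given in the parameter table.
    """
    if not 10 <= len(prompt) <= 2000:
        raise ValueError("prompt must be 10 to 2000 characters")
    if not 1 <= duration <= 600:
        raise ValueError("duration must be 1 to 600 seconds")
    if not 1 <= n <= 4:
        raise ValueError("n must be 1 to 4")
    if style is not None and len(style) > 256:
        raise ValueError("style must be at most 256 characters")
    if lyrics_prompt is not None and not 10 <= len(lyrics_prompt) <= 3000:
        raise ValueError("lyrics_prompt must be 10 to 3000 characters")

    payload = {"model": model, "prompt": prompt, "duration": duration, "n": n}
    # Optional fields are included only when the caller sets them.
    optional = {
        "style": style,
        "lyrics_prompt": lyrics_prompt,
        "instrumental": instrumental,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Leaving unset optional fields out of the payload lets the server apply its own defaults.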

cURL

curl https://test.sealink.io/v1/audio/music \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "music-2.0",
    "prompt": "An uplifting orchestral piece with piano and strings, 120 BPM",
    "lyrics_prompt": "[Verse]\nMorning light across the city\n[Chorus]\nWe rise together",
    "duration": 15,
    "n": 1,
    "style": "cinematic orchestral",
    "instrumental": false
  }'

Python

import os

import requests

# Custom endpoint with no OpenAI SDK helper, so call it with raw requests

res = requests.post(
    "https://test.sealink.io/v1/audio/music",
    headers={"Authorization": f"Bearer {os.environ['SEALINK_API_KEY']}"},
    json={
        "model": "music-2.0",
        "prompt": "An uplifting orchestral piece, 120 BPM",
        "lyrics_prompt": "[Verse]\nMorning light across the city\n[Chorus]\nWe rise together",
        "duration": 15,
        "n": 1,
        "instrumental": False,
    },
)
res.raise_for_status()
print(res.json())

`POST /v1/audio/sound-effects`

AI sound effects generation. Describe a sound in natural language and get a short audio clip. Ideal for games, film, and podcasts.

Parameters

Parameter	Type	Required	Description
model	string	Yes	Choose a public audio-sfx model from /v1/models
prompt	string	Yes	Sound effect description
duration_seconds	integer	No	1 - 30 (seconds)
n	integer	No	1 - 4

cURL

curl https://test.sealink.io/v1/audio/sound-effects \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<audio-sfx-model-from-/v1/models>",
    "prompt": "Heavy rain on a tin roof with distant thunder",
    "duration_seconds": 10,
    "n": 1
  }'

Python

import os
import requests

res = requests.post(
    "https://test.sealink.io/v1/audio/sound-effects",
    headers={"Authorization": f"Bearer {os.environ['SEALINK_API_KEY']}"},
    json={
        "model": "<audio-sfx-model-from-/v1/models>",
        "prompt": "Heavy rain on a tin roof with distant thunder",
        "duration_seconds": 10,
        "n": 1,
    },
)
res.raise_for_status()
print(res.json())

`POST /v1/audio/voice-clone`

Voice cloning. Provide a reference audio clip and text to generate speech in the matching voice. Use it to create custom voices or digital avatars.

Parameters

Parameter	Type	Required	Description
model	string	Yes	Choose a public audio-voice-clone model from /v1/models
audio	string	Yes	Reference audio as base64 or URL
voice_name	string	No	Custom voice name, max 128 chars
text	string	No	Text to speak, max 4096 chars
response_format	string	No	mp3, wav, ogg (default: mp3)
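
The `audio` field takes either a URL or base64 data, so one helper can normalize both kinds of input. This is a sketch, not a documented client: `reference_audio_field` is a hypothetical name, and it assumes the API expects plain base64 with no `data:` URI prefix, which this section does not specify.

```python
import base64


def reference_audio_field(source):
    """Return a value for the "audio" field.

    Hypothetical helper: HTTP(S) URL strings are passed through unchanged,
    and raw bytes are base64-encoded.
    """
    if isinstance(source, str):
        if not source.startswith(("http://", "https://")):
            raise ValueError("string sources must be HTTP(S) URLs")
        return source
    # ASSUMPTION: plain base64 text, without a "data:audio/...;base64," prefix.
    return base64.b64encode(source).decode("ascii")
```

For a local recording, read the file in binary mode and pass the bytes, for example `reference_audio_field(open("sample.wav", "rb").read())`.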

cURL

curl https://test.sealink.io/v1/audio/voice-clone \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<audio-voice-clone-model-from-/v1/models>",
    "audio": "https://raw.githubusercontent.com/Jakobovski/free-spoken-digit-dataset/master/recordings/1_jackson_0.wav",
    "voice_name": "my-custom-voice",
    "text": "This is a test of my cloned voice.",
    "response_format": "mp3"
  }' \
  --output cloned.mp3

Python

import os
import requests

res = requests.post(
    "https://test.sealink.io/v1/audio/voice-clone",
    headers={"Authorization": f"Bearer {os.environ['SEALINK_API_KEY']}"},
    json={
        "model": "<audio-voice-clone-model-from-/v1/models>",
        "audio": "https://raw.githubusercontent.com/Jakobovski/free-spoken-digit-dataset/master/recordings/1_jackson_0.wav",
        "voice_name": "my-custom-voice",
        "text": "This is a test of my cloned voice.",
        "response_format": "mp3",
    },
)
# Fail on an HTTP error instead of writing an error body to disk
res.raise_for_status()
# Response is audio binary
with open("cloned.mp3", "wb") as f:
    f.write(res.content)