Lifelike African voices — text-to-speech, transcription, voice cloning and video dubbing. The same engine that powers VocaBusta Studio.
All endpoints live under /voice on the Satryx API. Every call needs a
Bearer API key (satryx_live_… / satryx_test_…); generation endpoints also
require an active VocaBusta subscription on the account.
Synthesize text to a WAV audio file. The response body is raw
audio/wav; synthesis metadata is returned in the X-Vox-Metadata
response header as a JSON string.
| text required | string [ 1 .. 5000 ] characters |
| voice_id | string Default: "af_heart" |
| speed | number <float> [ 0.5 .. 2 ] Default: 1 |
| language | string or null |
| exaggeration | number or null [ 0 .. 1 ] Chatterbox emphasis/emotion. Unset uses the voice default. |
| cfg_weight | number or null [ 0 .. 1 ] How tightly synthesis follows the reference voice. |
| stability | number <float> [ 0 .. 1 ] Default: 0.5 Applies to non-Chatterbox (Kokoro) voices. |
| similarity | number <float> [ 0 .. 1 ] Default: 0.75 Applies to non-Chatterbox (Kokoro) voices. |
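To make the parameters above concrete, here is a minimal sketch of building a synthesis request and reading the metadata header. The host in `BASE_URL`, the use of `POST` with a JSON body, and the helper names `build_tts_request` and `parse_metadata` are assumptions for illustration; only the `/voice/tts` path, the Bearer key, the parameter ranges, and the `X-Vox-Metadata` header come from this reference.

```python
import json
import urllib.request

# Hypothetical host: substitute the real Satryx API base URL.
BASE_URL = "https://api.example.com"


def build_tts_request(api_key, text, voice_id="af_heart", speed=1.0, **extra):
    """Build (but do not send) a request for /voice/tts.

    Assumes the endpoint accepts POST with a JSON body. Optional fields such
    as language, exaggeration or cfg_weight can be passed via **extra.
    """
    if not 1 <= len(text) <= 5000:
        raise ValueError("text must be 1 to 5000 characters")
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be between 0.5 and 2")
    payload = {"text": text, "voice_id": voice_id, "speed": speed, **extra}
    return urllib.request.Request(
        f"{BASE_URL}/voice/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_metadata(headers):
    """Decode the JSON string carried in the X-Vox-Metadata response header."""
    raw = headers.get("X-Vox-Metadata")
    return json.loads(raw) if raw else {}
```

Sending the request with `urllib.request.urlopen(req)` would return the raw `audio/wav` body via `resp.read()`, and `parse_metadata(resp.headers)` would decode the metadata.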
{
  "text": "How far? Welcome to VocaBusta.",
  "voice_id": "vocabusta_pcm_female",
  "speed": 1,
  "language": "pcm",
  "exaggeration": 1,
  "cfg_weight": 1,
  "stability": 0.5,
  "similarity": 0.75
}

{
  "detail": "VocaBusta is not active for this account. Manage it in Billing."
}

Same request as /voice/tts, but streams WAV audio chunks as they are
synthesized for low-latency playback.
| text required | string [ 1 .. 5000 ] characters |
| voice_id | string Default: "af_heart" |
| speed | number <float> [ 0.5 .. 2 ] Default: 1 |
| language | string or null |
| exaggeration | number or null [ 0 .. 1 ] Chatterbox emphasis/emotion. Unset uses the voice default. |
| cfg_weight | number or null [ 0 .. 1 ] How tightly synthesis follows the reference voice. |
| stability | number <float> [ 0 .. 1 ] Default: 0.5 Applies to non-Chatterbox (Kokoro) voices. |
| similarity | number <float> [ 0 .. 1 ] Default: 0.75 Applies to non-Chatterbox (Kokoro) voices. |
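Because the streaming variant delivers audio in chunks, a client would normally consume the body incrementally instead of reading it all at once. The sketch below assumes only that the response is a file-like object with a `read(n)` method (as `urllib.request.urlopen` returns); the helper name `save_stream` and the chunk size are illustrative choices, not part of the API.

```python
import io


def save_stream(response, out, chunk_size=4096):
    """Copy a streamed body to `out` chunk by chunk; return bytes written.

    `response` is any object with read(n); `out` is any object with write().
    """
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        out.write(chunk)
        total += len(chunk)
    return total
```

For playback, `out` could be replaced by an audio sink that accepts bytes as they arrive; writing to a file is just the simplest case.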
{
  "text": "How far? Welcome to VocaBusta.",
  "voice_id": "vocabusta_pcm_female",
  "speed": 1,
  "language": "pcm",
  "exaggeration": 1,
  "cfg_weight": 1,
  "stability": 0.5,
  "similarity": 0.75
}

{
  "detail": "VocaBusta is not active for this account. Manage it in Billing."
}

Transcribe an uploaded audio file. African languages are routed to the Vocabanga ASR model; other languages fall back to Whisper.
| file required | string <binary> The audio file to transcribe. |
| language | string VocaBusta language code (e.g. |
| word_timestamps | boolean Default: true Include per-word start/end times. |
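Since `file` is a binary upload, the request is presumably sent as `multipart/form-data`. The following is a minimal standard-library sketch of encoding such a body; the helper name `encode_multipart`, the `"true"`/`"false"` spelling for the boolean field, and the default part content type are assumptions, not documented behavior.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     content_type="application/octet-stream"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

A transcription upload would pass something like `{"language": "pcm", "word_timestamps": "true"}` as `fields` and send the returned header value as `Content-Type`.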
{
  "id": "string",
  "transcript": "string",
  "language": "pcm",
  "duration_seconds": 0,
  "segments": [
    {
      "id": 0,
      "start": 0,
      "end": 0,
      "text": "string",
      "words": [
        {
          "word": "string",
          "start": 0,
          "end": 0
        }
      ]
    }
  ],
  "engine": "string",
  "model": "string"
}

Return every available voice: VocaBusta African-language voices plus any voices you have cloned. This endpoint is ungated.
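A common use of the voice list is picking a `voice_id` for a given language. The sketch below filters an already-decoded list of voice objects by their `language` and, optionally, `category` fields; the helper name `voices_for_language` is illustrative, and the example voice is taken from the response sample in this reference.

```python
def voices_for_language(voices, language, category=None):
    """Return voices whose language matches, optionally narrowed by category."""
    return [
        v for v in voices
        if v.get("language") == language
        and (category is None or v.get("category") == category)
    ]


# Example input shaped like the response sample in this reference.
sample_voices = [
    {"id": "vocabusta_yo_female", "name": "Adunni", "language": "yo",
     "category": "vocabusta"},
]
yoruba_ids = [v["id"] for v in voices_for_language(sample_voices, "yo")]
```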
[
  {
    "id": "vocabusta_yo_female",
    "name": "Adunni",
    "description": "string",
    "accent": "Yoruba",
    "gender": "female",
    "category": "vocabusta",
    "language": "yo",
    "language_name": "Yoruba",
    "tags": [
      "string"
    ],
    "preview_url": "string",
    "engine": "vocabusta"
  }
]

Clone a voice from a short reference clip (roughly 10–30 s, one speaker). Cloning is zero-shot: the clone is ready in seconds.
| file required | string <binary> Clean reference clip. |
| name required | string Display name for the voice. |
| description | string Default: "" |
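Given the suggested clip length of roughly 10 to 30 seconds, it can help to check a reference clip locally before uploading it. This sketch assumes the clip is an uncompressed WAV file (the accepted formats are not stated here) and treats the 10 and 30 second bounds as soft guidance, not limits the API is known to enforce; all function names are illustrative.

```python
import io
import wave


def wav_duration_seconds(fileobj):
    """Return the duration of a WAV file-like object in seconds."""
    with wave.open(fileobj, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(fileobj, min_seconds=10.0, max_seconds=30.0):
    """Return (is_within_range, duration_seconds) for a WAV clip."""
    duration = wav_duration_seconds(fileobj)
    return min_seconds <= duration <= max_seconds, duration


def make_silent_wav(seconds, rate=8000):
    """Build an in-memory mono 16-bit silent WAV, used here only for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * seconds))
    buf.seek(0)
    return buf
```

For a real clip, `check_reference_clip(open("reference.wav", "rb"))` would report whether the duration falls inside the suggested window.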
{
  "voice_id": "cloned_a1b2c3",
  "name": "string",
  "description": "string",
  "status": "processing",
  "preview_url": "string",
  "created_at": "2019-08-24T14:15:22Z"
}

Analyze a video into a transcript with word timing and speaker
diarization. Returns a job_id; poll /voice/dub/jobs/{job_id} until
done. The finished result is a DubAnalysis.
| file required | string <binary> |
| language | string Spoken language hint. |
| diarize | boolean Default: true |
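The job-based flow (submit, receive a `job_id`, poll until done) can be sketched as a small polling loop. The shape of the job status response is not described in this reference, so the loop takes the fetch function and the "done" test as parameters; the `status` field used in the usage note is purely hypothetical.

```python
import time


def wait_for_job(fetch_job, job_id, is_done, interval=2.0, timeout=600.0,
                 sleep=time.sleep):
    """Poll fetch_job(job_id) until is_done(job) is true, then return the job.

    fetch_job: callable that performs GET /voice/dub/jobs/{job_id} and
               returns the decoded JSON (supplied by the caller).
    is_done:   callable deciding whether the job has finished.
    Raises TimeoutError if the job is not done within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if is_done(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)
```

A caller might pass `is_done=lambda job: job.get("status") == "done"`, assuming the job object exposes such a field; the real field name and values should be taken from the job endpoint's schema.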
{
  "job_id": "string"
}

Translate every segment's text into the target language.
| segments required | Array of objects (DubSegment) |
| target_language required | string One of |
{
  "segments": [
    {
      "id": 0,
      "start": 0,
      "end": 0,
      "text": "string",
      "speaker": "SPEAKER_00"
    }
  ],
  "target_language": "yo"
}

{
  "segments": [
    {
      "id": 0,
      "start": 0,
      "end": 0,
      "text": "string",
      "speaker": "SPEAKER_00"
    }
  ],
  "target_language": "string"
}

Render a dubbed video from translated segments and per-speaker voice
assignments. Returns a job_id; poll /voice/dub/jobs/{job_id}. The
finished result is { video_base64, format }.
| file required | string <binary> |
| segments required | string JSON-encoded array of translated DubSegment. |
| voice_map | string Default: "{}" JSON object mapping speaker → voice_id or "preserve". |
| exaggeration | number <float> [ 0 .. 1 ] |
| cfg_weight | number <float> [ 0 .. 1 ] |
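Note that `segments` and `voice_map` are form fields carrying JSON-encoded strings, not nested JSON. The sketch below shows one way to prepare those fields and to decode the finished `{ video_base64, format }` result; the helper names and the example `"mp4"` format value are assumptions for illustration.

```python
import base64
import json


def build_render_fields(segments, voice_map=None, exaggeration=None,
                        cfg_weight=None):
    """Return the text form fields for a render request as strings.

    voice_map maps a speaker label to a voice_id or the literal "preserve".
    """
    fields = {
        "segments": json.dumps(segments),
        "voice_map": json.dumps(voice_map or {}),
    }
    for name, value in (("exaggeration", exaggeration),
                        ("cfg_weight", cfg_weight)):
        if value is not None:
            if not 0 <= value <= 1:
                raise ValueError(f"{name} must be between 0 and 1")
            fields[name] = str(value)
    return fields


def decode_video(result):
    """Decode a finished render result into (video_bytes, format)."""
    return base64.b64decode(result["video_base64"]), result["format"]
```

The returned fields would be sent alongside the video `file` in the same form upload; the decoded bytes can then be written to a file whose extension matches the reported format.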
{
  "job_id": "string"
}

| limit | integer Default: 50 |
| offset | integer Default: 0 |
| feature | string Enum: "tts" "stt" "clone" "dub" Filter by feature. |
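The `limit` and `offset` parameters suggest conventional offset pagination. Here is a minimal sketch of building the query string and walking all pages; the helper names are illustrative, and the stopping rule (a page shorter than `limit` means the end) is an assumption, since the reference does not say how the last page is signalled.

```python
from urllib.parse import urlencode

FEATURES = {"tts", "stt", "clone", "dub"}


def history_query(limit=50, offset=0, feature=None):
    """Build the query string for a history listing request."""
    if feature is not None and feature not in FEATURES:
        raise ValueError(f"feature must be one of {sorted(FEATURES)}")
    params = {"limit": limit, "offset": offset}
    if feature is not None:
        params["feature"] = feature
    return urlencode(params)


def iter_history(fetch_page, limit=50, feature=None):
    """Yield history items page by page.

    fetch_page(limit, offset, feature) is supplied by the caller and must
    return the decoded JSON list for that page.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset, feature)
        yield from page
        if len(page) < limit:  # assumed end-of-data signal
            return
        offset += limit
```

For example, `history_query(limit=10, offset=20, feature="tts")` produces `limit=10&offset=20&feature=tts`.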
[
  {
    "id": "string",
    "feature": "tts",
    "title": "string",
    "voice_name": "string",
    "text": "string",
    "transcript": "string",
    "audio_url": "string",
    "duration_seconds": 0,
    "character_count": 0,
    "created_at": "2019-08-24T14:15:22Z"
  }
]