VocaBusta Voice API (1.0.0)

Lifelike African voices — text-to-speech, transcription, voice cloning and video dubbing. The same engine that powers VocaBusta Studio.

All endpoints live under /voice on the Satryx API. Every call needs a Bearer API key (satryx_live_… / satryx_test_…); generation endpoints also require an active VocaBusta subscription on the account.
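
A minimal sketch of building the authorization header, assuming the key is sent as a standard `Authorization: Bearer` header. The helper name `auth_headers` is illustrative and not part of the API; the prefix check only mirrors the two key prefixes listed above.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers carrying a Bearer API key."""
    # Keys are documented as starting with satryx_live_ or satryx_test_.
    if not api_key.startswith(("satryx_live_", "satryx_test_")):
        raise ValueError("expected a satryx_live_ or satryx_test_ key")
    return {"Authorization": f"Bearer {api_key}"}
```

Rejecting an unexpected prefix on the client side is a convenience for catching configuration mistakes early; the server remains the authority on whether a key is valid.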

Speech

Text-to-speech synthesis

Synthesize speech

Synthesize text to a WAV audio file. The response body is raw audio/wav; synthesis metadata is returned in the X-Vox-Metadata response header as a JSON string.

Authorizations:
apiKey
Request Body schema: application/json
required
text
required
string [ 1 .. 5000 ] characters
voice_id
string
Default: "af_heart"
speed
number <float> [ 0.5 .. 2 ]
Default: 1
language
string or null
exaggeration
number or null [ 0 .. 1 ]

Emphasis and emotion intensity for Chatterbox voices. When unset, the voice default is used.

cfg_weight
number or null [ 0 .. 1 ]

How tightly synthesis follows the reference voice.

stability
number <float> [ 0 .. 1 ]
Default: 0.5

Applies to non-Chatterbox (Kokoro) voices.

similarity
number <float> [ 0 .. 1 ]
Default: 0.75

Applies to non-Chatterbox (Kokoro) voices.

Responses

Request samples

Content type
application/json
{
  "text": "How far? Welcome to VocaBusta.",
  "voice_id": "vocabusta_pcm_female",
  "speed": 1,
  "language": "pcm",
  "exaggeration": 1,
  "cfg_weight": 1,
  "stability": 0.5,
  "similarity": 0.75
}

Response samples

Content type
application/json
{
  "detail": "VocaBusta is not active for this account. Manage it in Billing."
}
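
The synthesis call can be sketched with the standard library alone. Assumptions to note: the host `https://api.example.com` is a placeholder (this page does not state the base URL), the method is assumed to be POST, and the function names are illustrative. The `/voice/tts` path, the field limits, and the `X-Vox-Metadata` header come from this reference.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: placeholder host


def build_tts_request(api_key: str, text: str, voice_id: str = "af_heart",
                      speed: float = 1.0, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a synthesis request."""
    # Mirror the documented limits so bad input fails before any network call.
    if not 1 <= len(text) <= 5000:
        raise ValueError("text must be 1 to 5000 characters")
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be between 0.5 and 2")
    body = json.dumps({"text": text, "voice_id": voice_id, "speed": speed})
    return urllib.request.Request(
        f"{base_url}/voice/tts",
        data=body.encode("utf-8"),
        method="POST",  # assumption: the HTTP method is not shown on this page
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )


def parse_metadata(headers) -> dict:
    """Decode the JSON string carried in the X-Vox-Metadata header."""
    raw = headers.get("X-Vox-Metadata")
    return json.loads(raw) if raw else {}


def synthesize(request: urllib.request.Request, out_path: str) -> dict:
    """Send the request, save the WAV body, and return the metadata."""
    with urllib.request.urlopen(request) as resp:
        audio = resp.read()
        meta = parse_metadata(resp.headers)
    with open(out_path, "wb") as f:
        f.write(audio)
    return meta
```

Keeping request construction separate from sending makes the validation testable offline; `synthesize` is the only function that touches the network.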

Stream speech synthesis

Same request as /voice/tts, but streams WAV audio chunks as they are synthesized for low-latency playback.

Authorizations:
apiKey
Request Body schema: application/json
required
text
required
string [ 1 .. 5000 ] characters
voice_id
string
Default: "af_heart"
speed
number <float> [ 0.5 .. 2 ]
Default: 1
language
string or null
exaggeration
number or null [ 0 .. 1 ]

Emphasis and emotion intensity for Chatterbox voices. When unset, the voice default is used.

cfg_weight
number or null [ 0 .. 1 ]

How tightly synthesis follows the reference voice.

stability
number <float> [ 0 .. 1 ]
Default: 0.5

Applies to non-Chatterbox (Kokoro) voices.

similarity
number <float> [ 0 .. 1 ]
Default: 0.75

Applies to non-Chatterbox (Kokoro) voices.

Responses

Request samples

Content type
application/json
{
  "text": "How far? Welcome to VocaBusta.",
  "voice_id": "vocabusta_pcm_female",
  "speed": 1,
  "language": "pcm",
  "exaggeration": 1,
  "cfg_weight": 1,
  "stability": 0.5,
  "similarity": 0.75
}

Response samples

Content type
application/json
{
  "detail": "VocaBusta is not active for this account. Manage it in Billing."
}
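
Consuming a streamed response might look like the sketch below. The path of the streaming endpoint is not shown on this page, so the example covers only the reading side: it takes any file-like object with a `read(n)` method (an `http.client.HTTPResponse` qualifies) and yields chunks as they arrive. The name `iter_chunks` and the chunk size are illustrative choices.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(resp: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio chunks from a response body until it is exhausted."""
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        yield chunk


# Usage with an in-memory stand-in for a live response body.
fake_body = io.BytesIO(b"RIFFdemo")
received = list(iter_chunks(fake_body, chunk_size=3))
```

Each yielded chunk can be handed to an audio player or appended to a file immediately, which is what gives streaming its latency advantage over waiting for the full WAV.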

Transcription

Speech-to-text

Transcribe audio

Transcribe an uploaded audio file. African languages are routed to the Vocabanga ASR model; other languages fall back to Whisper.

Authorizations:
apiKey
Request Body schema: multipart/form-data
required
file
required
string <binary>

The audio file to transcribe.

language
string

VocaBusta language code (e.g. yo, pcm) or auto.

word_timestamps
boolean
Default: true

Include per-word start/end times.

Responses

Response samples

Content type
application/json
{
  "id": "string",
  "transcript": "string",
  "language": "pcm",
  "duration_seconds": 0,
  "segments": [ ],
  "engine": "string",
  "model": "string"
}

Voices

Voice catalog and cloning

List voices

Return every available voice — VocaBusta African-language voices plus any voices you have cloned. This endpoint is ungated.

Responses

Response samples

Content type
application/json
[
  { }
]

Delete a cloned voice

Delete a cloned voice. Only cloned_… voices can be deleted.

Authorizations:
apiKey
path Parameters
voice_id
required
string
Example: cloned_a1b2c3

Responses

Response samples

Content type
application/json
{
  "status": "deleted",
  "voice_id": "cloned_a1b2c3"
}

Clone a voice

Clone a voice from a short reference clip (~10–30s, one speaker). Cloning is zero-shot — the clone is ready in seconds.

Authorizations:
apiKey
Request Body schema: multipart/form-data
required
file
required
string <binary>

Clean reference clip.

name
required
string

Display name for the voice.

description
string
Default: ""

Responses

Response samples

Content type
application/json
{
  "voice_id": "cloned_a1b2c3",
  "name": "string",
  "description": "string",
  "status": "processing",
  "preview_url": "string",
  "created_at": "2019-08-24T14:15:22Z"
}
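
Before uploading a reference clip, it can help to check its length against the suggested 10 to 30 seconds. The sketch below does this for WAV input using the standard `wave` module. It assumes the clip is an uncompressed WAV file (accepted formats are not listed on this page), and the function names are illustrative.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnframes() / w.getframerate()


def is_suitable_reference(data: bytes, min_s: float = 10.0, max_s: float = 30.0) -> bool:
    """True when the clip length falls inside the recommended window."""
    return min_s <= wav_duration_seconds(data) <= max_s


def make_silence(seconds: int, rate: int = 8000) -> bytes:
    """Build a mono 16-bit silent WAV, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * rate * seconds)
    return buf.getvalue()
```

This checks duration only. Whether the clip contains a single speaker or is clean enough for cloning cannot be determined this way.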

Dubbing

Video analysis, translation and re-voicing

Analyze a video

Analyze a video into a transcript with word timing and speaker diarization. Returns a job_id; poll /voice/dub/jobs/{job_id} until done. The finished result is a DubAnalysis.

Authorizations:
apiKey
Request Body schema: multipart/form-data
required
file
required
string <binary>
language
string

Spoken language hint.

diarize
boolean
Default: true

Responses

Response samples

Content type
application/json
{
  "job_id": "string"
}

Translate dub segments

Translate every segment's text into the target language.

Authorizations:
apiKey
Request Body schema: application/json
required
segments
required
Array of objects (DubSegment)
target_language
required
string

One of en, yo, ig, ha, sw, zu.

Responses

Request samples

Content type
application/json
{
  "segments": [ ],
  "target_language": "yo"
}

Response samples

Content type
application/json
{
  "segments": [ ],
  "target_language": "string"
}
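
A request body for translation could be assembled as in this sketch. The six language codes are the ones listed above. The function name is illustrative, and the `text` key used in the usage example is an assumption, since the DubSegment fields are not expanded on this page.

```python
import json

TARGET_LANGUAGES = ("en", "yo", "ig", "ha", "sw", "zu")


def build_translate_payload(segments: list[dict], target_language: str) -> str:
    """Return the JSON body for a translate request."""
    if target_language not in TARGET_LANGUAGES:
        raise ValueError(f"target_language must be one of {TARGET_LANGUAGES}")
    # ensure_ascii=False keeps diacritics readable instead of \u-escaped.
    return json.dumps(
        {"segments": segments, "target_language": target_language},
        ensure_ascii=False,
    )


# Hypothetical segment shape: only a "text" key is assumed here.
payload = build_translate_payload([{"text": "Welcome"}], "yo")
```

In practice the segments would be taken unchanged from the finished analysis result rather than built by hand.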

Render a dubbed video

Render a dubbed video from translated segments and per-speaker voice assignments. Returns a job_id; poll /voice/dub/jobs/{job_id}. The finished result is { video_base64, format }.

Authorizations:
apiKey
Request Body schema: multipart/form-data
required
file
required
string <binary>
segments
required
string

JSON-encoded array of translated DubSegment.

voice_map
string
Default: "{}"

JSON object mapping speaker → voice_id or "preserve".

exaggeration
number <float> [ 0 .. 1 ]
cfg_weight
number <float> [ 0 .. 1 ]

Responses

Response samples

Content type
application/json
{
  "job_id": "string"
}

Poll a dubbing job

Authorizations:
apiKey
path Parameters
job_id
required
string

Responses

Response samples

Content type
application/json
{
  "status": "queued",
  "progress": 0.45,
  "result": { },
  "error": "string"
}
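
Polling `/voice/dub/jobs/{job_id}` can be sketched as a loop with a timeout. Only the `"queued"` status appears in the sample above, so the terminal values `"done"` and `"failed"` used here are assumptions and should be adjusted to the real status names. The `fetch` callable stands in for whatever HTTP call returns the job JSON as a dict.

```python
import time
from typing import Callable


def poll_job(fetch: Callable[[], dict],
             terminal: tuple[str, ...] = ("done", "failed"),  # assumed status names
             interval: float = 2.0, timeout: float = 600.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() repeatedly until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job.get("status") in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)


# Usage with a scripted sequence in place of real HTTP calls.
states = iter([
    {"status": "queued", "progress": 0.0},
    {"status": "running", "progress": 0.45},
    {"status": "done", "progress": 1.0, "result": {}},
])
final = poll_job(lambda: next(states), sleep=lambda _s: None)
```

Injecting `sleep` keeps the loop testable without real delays. A production client might also back off between attempts and inspect the `error` field when the job fails.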

History

Per-account generation history

List generation history

Authorizations:
apiKey
query Parameters
limit
integer
Default: 50
offset
integer
Default: 0
feature
string
Enum: "tts" "stt" "clone" "dub"

Filter by feature.

Responses

Response samples

Content type
application/json
[
  { }
]

Delete a history item

Authorizations:
apiKey
path Parameters
item_id
required
string

Responses

Response samples

Content type
application/json
{
  "status": "deleted",
  "id": "string"
}