Text to speech
Turn text into natural African-language speech with POST /voice/tts.
Request
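POST https://api.satryx.ai/voice/tts with a JSON body. Authenticate with a bearer token in the Authorization header (Authorization: Bearer $SATRYX_API_KEY in the examples).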
| Field | Type | Default | Notes |
|---|---|---|---|
| text | string | none | Required. 1–5000 characters. |
| voice_id | string | af_heart | A voice from GET /voice/voices, e.g. vocabusta_yo_female. |
| speed | number | 1.0 | Playback speed multiplier, 0.5–2.0. |
| language | string \| null | null | Language hint, e.g. yo, pcm. Usually inferred from the voice. |
| exaggeration | number \| null | null | Chatterbox emphasis/emotion, 0.0–1.0. If unset, the voice's registry default is used. |
| cfg_weight | number \| null | null | How closely synthesis follows the reference voice, 0.0–1.0. If unset, the voice's default is used. |
| stability | number | 0.5 | 0.0–1.0. Applies to non-Chatterbox (Kokoro) voices. |
| similarity | number | 0.75 | 0.0–1.0. Applies to non-Chatterbox (Kokoro) voices. |
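Because the limits are documented, a client can reject out-of-range values before making a request. The following is a minimal sketch, not part of any Satryx SDK: `build_tts_body` is a hypothetical helper, and it assumes that omitting an optional field behaves the same as sending null (the table only says that unset fields fall back to defaults). It also counts characters with Python's `len()`, i.e. Unicode code points, which may not match exactly how the server counts.

```python
from typing import Any, Dict, Optional


def _check_range(name: str, value: Optional[float], lo: float, hi: float) -> None:
    """Raise ValueError if value is set but falls outside [lo, hi]."""
    if value is not None and not (lo <= value <= hi):
        raise ValueError(f"{name} must be in [{lo}, {hi}], got {value}")


def build_tts_body(
    text: str,
    voice_id: Optional[str] = None,
    speed: Optional[float] = None,
    language: Optional[str] = None,
    exaggeration: Optional[float] = None,
    cfg_weight: Optional[float] = None,
    stability: Optional[float] = None,
    similarity: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate documented limits, drop unset fields.

    Omitting a field is assumed to be equivalent to sending null, so the
    server applies its own default (af_heart, 1.0, the voice's registry values).
    """
    if not (1 <= len(text) <= 5000):  # len() counts code points (assumption)
        raise ValueError("text must be 1-5000 characters")
    _check_range("speed", speed, 0.5, 2.0)
    for name, value in (
        ("exaggeration", exaggeration),
        ("cfg_weight", cfg_weight),
        ("stability", stability),
        ("similarity", similarity),
    ):
        _check_range(name, value, 0.0, 1.0)
    body: Dict[str, Any] = {
        "text": text,
        "voice_id": voice_id,
        "speed": speed,
        "language": language,
        "exaggeration": exaggeration,
        "cfg_weight": cfg_weight,
        "stability": stability,
        "similarity": similarity,
    }
    # Keep only the fields the caller actually set.
    return {key: val for key, val in body.items() if val is not None}
```

The returned dict can be passed directly as the `json=` argument of `requests.post`, as in the Python example.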
Response
200 OK with the raw WAV audio as the body (Content-Type: audio/wav).
Synthesis metadata is returned in the X-Vox-Metadata response header as a
JSON string:
```json
{
  "id": "0f3c…",
  "voice_id": "vocabusta_yo_female",
  "voice_name": "Adunni",
  "text": "Ẹ káàbọ̀.",
  "duration_seconds": 1.42,
  "sample_rate": 24000,
  "character_count": 9,
  "created_at": "2026-06-27T10:00:00Z"
}
```
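Because the header value is a JSON string, a single `json.loads` decodes it. The sketch below uses only the standard library; `parse_tts_response` is a hypothetical name, and it assumes the body is a PCM WAV that Python's `wave` module can open. Measuring the clip length from the WAV itself gives a cross-check against `duration_seconds`.

```python
import io
import json
import wave
from typing import Any, Dict, Mapping, Tuple


def parse_tts_response(headers: Mapping[str, str], body: bytes) -> Tuple[Dict[str, Any], float]:
    """Hypothetical helper: return (metadata, duration measured from the WAV).

    requests' res.headers and urllib's HTTPMessage look headers up
    case-insensitively; a plain dict needs the exact spelling below.
    """
    raw = headers.get("X-Vox-Metadata")
    if raw is None:
        raise KeyError("response has no X-Vox-Metadata header")
    metadata = json.loads(raw)
    # Assumes a PCM WAV body; wave cannot read compressed formats.
    with wave.open(io.BytesIO(body), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return metadata, duration
```

With `requests`, call it as `parse_tts_response(res.headers, res.content)`.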
Examples
cURL
```bash
curl https://api.satryx.ai/voice/tts \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Ẹ káàbọ̀ sí VocaBusta.",
    "voice_id": "vocabusta_yo_female",
    "speed": 1.0,
    "exaggeration": 0.6
  }' \
  --output yoruba.wav
```
Python
```python
import os

import requests

res = requests.post(
    "https://api.satryx.ai/voice/tts",
    headers={"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"},
    json={
        "text": "Ndewo, nnọọ na VocaBusta.",
        "voice_id": "vocabusta_ig_male",
        "speed": 1.0,
    },
    timeout=60,  # seconds; raise it for long inputs
)
res.raise_for_status()

# The response body is the raw WAV audio.
with open("igbo.wav", "wb") as f:
    f.write(res.content)
```
Streaming
For low-latency playback, POST /voice/tts/stream takes the same body and
streams WAV audio chunks as they're synthesized (Transfer-Encoding: chunked,
Content-Type: audio/wav). Use it when you're piping audio straight to a player
rather than saving a file.
```bash
curl -N https://api.satryx.ai/voice/tts/stream \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Streaming live...", "voice_id": "vocabusta_en_ng_female"}' \
  --output stream.wav
```
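If you would rather not depend on `requests`, the streaming endpoint can be consumed with the standard library alone. This is a sketch under stated assumptions: the helper names are hypothetical, and it writes chunks to a file as they arrive. Feeding them to a live audio player instead depends on your playback library and is left out.

```python
import json
import urllib.request
from typing import BinaryIO

STREAM_URL = "https://api.satryx.ai/voice/tts/stream"


def make_stream_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for /voice/tts/stream."""
    payload = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        STREAM_URL,
        data=payload,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def copy_chunks(src: BinaryIO, dst: BinaryIO, chunk_size: int = 4096) -> int:
    """Write src to dst piece by piece; return the number of bytes copied.

    http.client removes the chunked transfer encoding, so src.read() yields
    plain WAV bytes. read(n) may wait until n bytes arrive, so a smaller
    chunk_size hands data on sooner.
    """
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
    return copied


def save_stream(text: str, voice_id: str, api_key: str, path: str) -> int:
    """Hypothetical wrapper: stream synthesis straight into a file."""
    req = make_stream_request(text, voice_id, api_key)
    with urllib.request.urlopen(req, timeout=60) as resp, open(path, "wb") as out:
        return copy_chunks(resp, out)
```

Usage: `save_stream("Streaming live...", "vocabusta_en_ng_female", os.environ["SATRYX_API_KEY"], "stream.wav")`.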
Tips
- Expressiveness: raise `exaggeration` for livelier delivery; raise `cfg_weight` to stay closer to the reference timbre. Leave both unset to use each voice's tuned default.
- Tone matters: for Yoruba and Igbo, include the correct diacritics in `text`. The model is tone-aware, and the wrong tone changes the word.
- Chunk long text: for very long passages (each request accepts at most 5000 characters), split on sentence boundaries, synthesize each piece, and concatenate the WAVs client-side so the first audio arrives sooner.
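The chunking tip can be sketched with the standard library. Assumptions: `split_sentences` and `concat_wavs` are hypothetical helpers; the splitter is a naive regex on `.`, `!` and `?` that may need adapting per language; and every chunk is assumed to come back as PCM WAV with the same channel count, sample width and sample rate (the code raises if not).

```python
import io
import re
import wave
from typing import List


def split_sentences(text: str, max_chars: int = 5000) -> List[str]:
    """Greedily pack sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole and would still
    be rejected by the API; split it further yourself if that can happen.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_wavs(parts: List[bytes]) -> bytes:
    """Join WAV clips that share channels, sample width and sample rate."""
    if not parts:
        raise ValueError("no WAV parts to join")
    fmt = None
    frames = []
    for part in parts:
        with wave.open(io.BytesIO(part), "rb") as clip:
            this_fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"format mismatch: {this_fmt} vs {fmt}")
            frames.append(clip.readframes(clip.getnframes()))
    out = io.BytesIO()
    with wave.open(out, "wb") as joined:
        joined.setnchannels(fmt[0])
        joined.setsampwidth(fmt[1])
        joined.setframerate(fmt[2])
        joined.writeframes(b"".join(frames))
    return out.getvalue()
```

In practice you would send each string from `split_sentences` as its own POST /voice/tts request (starting playback of the first as soon as it returns), then pass the list of response bodies to `concat_wavs` to save one file.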
Next
- Voices & languages: the full list of `voice_id` values
- Voice cloning: synthesize in a custom cloned voice