Text to speech

Turn text into natural African-language speech with POST /voice/tts.

Request

POST https://api.satryx.ai/voice/tts with a JSON body.

Field	Type	Default	Notes
`text`	string	—	Required. 1–5000 characters.
`voice_id`	string	`af_heart`	A voice from `GET /voice/voices`, e.g. `vocabusta_yo_female`.
`speed`	number	`1.0`	Playback speed, `0.5`–`2.0`.
`language`	string | null	`null`	Language hint, e.g. `yo`, `pcm`. Usually inferred from the voice.
`exaggeration`	number | null	`null`	Chatterbox emphasis/emotion, `0.0`–`1.0`. Unset uses the voice's registry default.
`cfg_weight`	number | null	`null`	How tightly synthesis follows the reference voice, `0.0`–`1.0`. Unset uses the voice's default.
`stability`	number	`0.5`	`0.0`–`1.0`. Applies to non-Chatterbox (Kokoro) voices.
`similarity`	number	`0.75`	`0.0`–`1.0`. Applies to non-Chatterbox (Kokoro) voices.
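
The ranges in the table can be checked client-side before sending a request. The sketch below is one way to do that; `build_tts_payload` is a hypothetical helper, not part of any SDK, and it assumes the 1–5000 limit counts Python string characters (the API may count differently). Optional fields left as `None` are omitted so the voice's default applies.

```python
from typing import Any, Optional


def build_tts_payload(
    text: str,
    voice_id: str = "af_heart",
    speed: float = 1.0,
    exaggeration: Optional[float] = None,
    cfg_weight: Optional[float] = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate fields against the documented ranges."""
    # Assumption: the character limit is measured like len() on a Python str.
    if not 1 <= len(text) <= 5000:
        raise ValueError("text must be 1-5000 characters")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")

    payload: dict[str, Any] = {"text": text, "voice_id": voice_id, "speed": speed}
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if value is None:
            continue  # omit the field so the voice's default applies
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
        payload[name] = value
    return payload
```

The returned dict can be passed as the `json=` argument of `requests.post`.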

Response

200 OK with the raw WAV audio as the body (Content-Type: audio/wav).

Synthesis metadata is returned in the X-Vox-Metadata response header as a JSON string:

{
  "id": "0f3c…",
  "voice_id": "vocabusta_yo_female",
  "voice_name": "Adunni",
  "text": "Ẹ káàbọ̀.",
  "duration_seconds": 1.42,
  "sample_rate": 24000,
  "character_count": 9,
  "created_at": "2026-06-27T10:00:00Z"
}
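
Because the metadata travels in a header rather than the body, it has to be decoded separately from the audio. A minimal sketch, assuming the header value is plain JSON as shown above; `parse_vox_metadata` is a hypothetical helper name, and the lookup is case-sensitive unless the mapping itself is case-insensitive (as `requests` response headers are).

```python
import json
from typing import Any, Mapping


def parse_vox_metadata(headers: Mapping[str, str]) -> dict[str, Any]:
    """Hypothetical helper: decode X-Vox-Metadata, or return {} if it is absent."""
    raw = headers.get("X-Vox-Metadata")
    if raw is None:
        return {}
    return json.loads(raw)


# Example with a hand-written header mapping (values taken from the sample above).
sample_headers = {
    "Content-Type": "audio/wav",
    "X-Vox-Metadata": '{"voice_id": "vocabusta_yo_female", "duration_seconds": 1.42}',
}
meta = parse_vox_metadata(sample_headers)
print(meta["voice_id"], meta["duration_seconds"])
```

With a `requests` response, the call would be `parse_vox_metadata(res.headers)`.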

Examples

cURL

curl https://api.satryx.ai/voice/tts \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Ẹ káàbọ̀ sí VocaBusta.",
    "voice_id": "vocabusta_yo_female",
    "speed": 1.0,
    "exaggeration": 0.6
  }' \
  --output yoruba.wav

Python

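import os

import requests

# Send the synthesis request; the API key is read from the environment.
res = requests.post(
    "https://api.satryx.ai/voice/tts",
    headers={"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"},
    json={
        "text": "Ndewo, nnọọ na VocaBusta.",
        "voice_id": "vocabusta_ig_male",
        "speed": 1.0,
    },
    timeout=60,  # seconds; avoids hanging forever on a stalled connection
)
res.raise_for_status()  # raise on any 4xx/5xx response

# The response body is the raw WAV audio.
with open("igbo.wav", "wb") as f:
    f.write(res.content)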

Streaming

For low-latency playback, POST /voice/tts/stream takes the same body and streams WAV audio chunks as they're synthesized (Transfer-Encoding: chunked, Content-Type: audio/wav). Use it when you're piping audio straight to a player rather than saving a file.

curl -N https://api.satryx.ai/voice/tts/stream \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Streaming live...", "voice_id": "vocabusta_en_ng_female"}' \
  --output stream.wav
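
The same streaming call can be sketched in Python. This is an illustration under assumptions, not an official client: `write_chunks` and `stream_tts` are hypothetical names, the 4096-byte chunk size and 60-second timeout are arbitrary choices, and `sink` can be a file or a pipe to a player process.

```python
import os
from typing import BinaryIO, Iterable


def write_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write audio chunks to sink as they arrive; return the total bytes written."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            sink.write(chunk)
            total += len(chunk)
    return total


def stream_tts(text: str, voice_id: str, sink: BinaryIO) -> int:
    """Hypothetical wrapper around POST /voice/tts/stream."""
    import requests  # third-party; imported here so write_chunks works without it

    with requests.post(
        "https://api.satryx.ai/voice/tts/stream",
        headers={"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"},
        json={"text": text, "voice_id": voice_id},
        stream=True,  # do not buffer the whole body before returning
        timeout=60,
    ) as res:
        res.raise_for_status()
        return write_chunks(res.iter_content(chunk_size=4096), sink)
```

For example, `with open("stream.wav", "wb") as f: stream_tts("Streaming live...", "vocabusta_en_ng_female", f)` would mirror the cURL call above.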

Tips

Expressiveness: raise exaggeration for livelier delivery; raise cfg_weight to stay closer to the reference timbre. Leave both unset to use each voice's tuned default.
Tone matters: for Yoruba and Igbo, include the correct diacritics in text. The model is tone-aware, and a wrong or missing tone mark changes the word.
Chunk long text: for very long passages (text is capped at 5000 characters per request), split on sentence boundaries, synthesize each piece, and join the audio client-side so the first audio is ready sooner.
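
The chunking tip can be sketched with the standard library alone. Each WAV file carries its own header, so the pieces cannot simply be appended byte for byte; the frames have to be rewritten under one header. The helpers below are hypothetical, assume the API returns PCM WAV that Python's `wave` module can read, and use a simple punctuation-based splitter that will not suit every language or abbreviation.

```python
import io
import re
import wave


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def concat_wavs(wav_blobs: list[bytes]) -> bytes:
    """Join WAV files that share one audio format into a single WAV file."""
    out = io.BytesIO()
    writer = None
    first_fmt = None
    for blob in wav_blobs:
        with wave.open(io.BytesIO(blob), "rb") as reader:
            fmt = reader.getparams()[:3]  # channels, sample width, frame rate
            if writer is None:
                writer = wave.open(out, "wb")
                writer.setparams(reader.getparams())
                first_fmt = fmt
            elif fmt != first_fmt:
                raise ValueError("all WAV chunks must share one audio format")
            writer.writeframes(reader.readframes(reader.getnframes()))
    if writer is not None:
        writer.close()  # patches the frame count in the header
    return out.getvalue()
```

Each string from `split_sentences` would be sent as its own `text` value, and the response bodies passed in order to `concat_wavs`.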

Next

Voices & languages: every voice_id
Voice cloning: synthesize in a custom cloned voice
