Voice cloning
Clone a voice from a short reference clip, then synthesize new speech in that voice. Cloning is zero-shot: there is no training step, and the clone is ready in seconds.
Clone a voice
`POST https://api.satryx.ai/voice/clone`, with the request body sent as `multipart/form-data`.
| Field | Type | Default | Notes |
|---|---|---|---|
| `file` | file | none | Required. A clean reference clip: about 10–30 s of clear speech from a single speaker. |
| `name` | string | none | Required. Display name for the voice. |
| `description` | string | `""` | Optional. Description of the voice. |
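Because clip length matters, it can help to check it before uploading. Below is a minimal sketch using Python's standard `wave` module, assuming the reference clip is an uncompressed PCM WAV file. The 10 and 30 second bounds come from the guideline in the table and are treated here as a soft client-side check, not something the API is known to enforce. `clip_duration_seconds` and `check_reference_clip` are hypothetical helper names.

```python
import wave

# Guideline from the field table: roughly 10-30 s of clear, single-speaker speech.
# Assumption: these bounds are a client-side heuristic; the API may not enforce them.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: str) -> list:
    """Return human-readable warnings; an empty list means the length looks fine."""
    seconds = clip_duration_seconds(path)
    warnings = []
    if seconds < MIN_SECONDS or seconds > MAX_SECONDS:
        warnings.append(
            f"clip is {seconds:.1f}s; the guideline is about "
            f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
        )
    return warnings
```

For example, `check_reference_clip("my-voice.wav")` returns an empty list for a 20-second WAV and one warning for a 5-second one.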
Response
A successful request returns `200 OK` with a JSON body:

```json
{
  "voice_id": "cloned_a1b2c3",
  "name": "Chidi",
  "description": "My narration voice",
  "status": "ready",
  "preview_url": "data:audio/wav;base64,UklGR...",
  "created_at": "2026-06-27T10:00:00Z"
}
```
- `voice_id` always starts with `cloned_`.
- `status` is one of `processing`, `ready`, or `failed`.
- `preview_url` is an instant sample of the cloned voice, encoded as a data URL. It is included when the engine was able to generate one.
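These field rules can be checked in client code. The sketch below uses two hypothetical helpers: `parse_clone_response` validates the documented `voice_id` prefix and `status` values, and `decode_preview` splits a base64 `preview_url` data URL into its MIME type and raw bytes. It assumes `preview_url` uses the `data:<mime>;base64,<payload>` form shown in the example response. Treating `processing` as non-fatal, and leaving any waiting to the caller, is a design choice of this sketch.

```python
import base64

VALID_STATUSES = {"processing", "ready", "failed"}


def parse_clone_response(body: dict) -> str:
    """Check the documented invariants of a /voice/clone response and return voice_id."""
    voice_id = body["voice_id"]
    if not voice_id.startswith("cloned_"):
        raise ValueError(f"unexpected voice_id: {voice_id!r}")
    status = body.get("status")
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    if status == "failed":
        raise RuntimeError(f"cloning failed for {voice_id}")
    return voice_id  # may still be "processing"; the caller decides what to do


def decode_preview(preview_url: str) -> tuple:
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, _, payload = preview_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

Call `parse_clone_response(res.json())` on the parsed response; when `preview_url` is present, write the bytes returned by `decode_preview` to a `.wav` file to listen to the sample.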
Example
```bash
curl https://api.satryx.ai/voice/clone \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -F "file=@my-voice.wav" \
  -F "name=Chidi" \
  -F "description=My narration voice"
```
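```python
import os

import requests

# Upload the reference clip as multipart/form-data; name and description are plain form fields.
with open("my-voice.wav", "rb") as f:
    res = requests.post(
        "https://api.satryx.ai/voice/clone",
        headers={"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"},
        files={"file": f},
        data={"name": "Chidi", "description": "My narration voice"},
    )

res.raise_for_status()  # raise an exception on non-2xx responses
voice_id = res.json()["voice_id"]  # e.g. "cloned_a1b2c3"
```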
Speak in the cloned voice
Pass the `cloned_…` ID directly to `/voice/tts` as `voice_id`:

```bash
curl https://api.satryx.ai/voice/tts \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is my cloned voice.", "voice_id": "cloned_a1b2c3"}' \
  --output cloned.wav
```
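The same request in Python, as a sketch that uses only the standard library (`urllib.request`) so it has no third-party dependencies. It assumes, as the curl `--output cloned.wav` call suggests, that a successful response body is the raw audio file. `build_tts_request` and `synthesize_to_file` are hypothetical names; `urlopen` raises `urllib.error.HTTPError` on non-2xx responses.

```python
import json
import os
import urllib.request

TTS_URL = "https://api.satryx.ai/voice/tts"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /voice/tts request for a cloned voice."""
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text: str, voice_id: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    req = build_tts_request(text, voice_id, os.environ["SATRYX_API_KEY"])
    # Assumption: the response body is the audio file itself, as with curl --output.
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

For example, `synthesize_to_file("This is my cloned voice.", "cloned_a1b2c3", "cloned.wav")` mirrors the curl call. Keeping the request builder separate makes the request easy to inspect without a network call.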
Delete a voice
`DELETE https://api.satryx.ai/voice/voices/{voice_id}` removes a cloned voice. Only `cloned_…` voices can be deleted; premade and VocaBusta catalog voices cannot.

```bash
curl -X DELETE https://api.satryx.ai/voice/voices/cloned_a1b2c3 \
  -H "Authorization: Bearer $SATRYX_API_KEY"
```

The response is `{ "status": "deleted", "voice_id": "cloned_a1b2c3" }`.
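Since only `cloned_…` voices can be deleted, a client can reject other IDs before making a network call. Below is a minimal standard-library sketch; `build_delete_request` and `delete_voice` are hypothetical helpers, and the guard simply mirrors the documented `cloned_` prefix rule.

```python
import json
import os
import urllib.parse
import urllib.request

VOICES_URL = "https://api.satryx.ai/voice/voices/"


def build_delete_request(voice_id: str, api_key: str) -> urllib.request.Request:
    """Build a DELETE request, refusing IDs the API says cannot be deleted."""
    # Only cloned_ voices are deletable; premade and catalog IDs are rejected locally.
    if not voice_id.startswith("cloned_"):
        raise ValueError(f"only cloned_ voices can be deleted, got {voice_id!r}")
    return urllib.request.Request(
        VOICES_URL + urllib.parse.quote(voice_id, safe=""),
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )


def delete_voice(voice_id: str) -> dict:
    """Delete a cloned voice and return the parsed JSON confirmation."""
    req = build_delete_request(voice_id, os.environ["SATRYX_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

`delete_voice("cloned_a1b2c3")` should return the confirmation object described above.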
Best practices
- Quality in, quality out: use a clean, dry recording with no background music or overlapping speakers.
- Consent: only clone voices you own or have explicit permission to clone.
- Tune at synthesis time: adjust `exaggeration` and `cfg_weight` on `/voice/tts` to balance expressiveness against fidelity to the cloned voice.
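One way to tune a clone is to render the same line under a few settings and compare them by ear. This sketch builds one `/voice/tts` JSON body per combination of settings. It assumes `exaggeration` and `cfg_weight` are top-level JSON fields next to `text` and `voice_id`; the numeric values are placeholders, since this page does not list valid ranges or defaults. `tuning_payloads` is a hypothetical helper.

```python
import itertools

# Placeholder values for illustration only; check the /voice/tts reference for valid ranges.
EXAGGERATION_VALUES = (0.3, 0.7)
CFG_WEIGHT_VALUES = (0.3, 0.7)


def tuning_payloads(text: str, voice_id: str) -> list:
    """Return one /voice/tts JSON body per (exaggeration, cfg_weight) combination."""
    # Assumption: both knobs sit at the top level alongside text and voice_id.
    return [
        {"text": text, "voice_id": voice_id, "exaggeration": e, "cfg_weight": c}
        for e, c in itertools.product(EXAGGERATION_VALUES, CFG_WEIGHT_VALUES)
    ]
```

Each body can be passed to the `/voice/tts` curl call with `-d`, writing each result to its own file for side-by-side listening.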