
Voice cloning

Clone a voice from a short reference clip, then synthesize new speech in that voice. Cloning is zero-shot: there is no training step, and the clone is ready in seconds.

Clone a voice

POST https://api.satryx.ai/voice/clone (multipart/form-data)

Field        Type    Default  Notes
file         file    (none)   Required. A clean reference clip (~10–30 s of clear speech, one speaker).
name         string  (none)   Required. Display name for the voice.
description  string  ""       Optional description.
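The clip-length guidance for `file` can be checked locally before uploading. The sketch below uses Python's standard `wave` module and assumes the reference is an uncompressed PCM WAV file. `check_reference_clip` is a hypothetical helper, and treating 10 and 30 seconds as hard bounds is an assumption: the field notes give them only as an approximate recommendation. It does not detect background noise or multiple speakers.

```python
import wave

# Approximate window from the `file` field notes (~10–30 s). Treating these
# as strict bounds is an assumption, not a documented API limit.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def check_reference_clip(path: str) -> float:
    """Return the clip duration in seconds.

    Raises ValueError if the duration falls outside the recommended window.
    Only reads uncompressed PCM WAV files.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    duration = frames / float(rate)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"{path}: {duration:.1f} s of audio; aim for "
            f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s of clear speech"
        )
    return duration
```

For example, `check_reference_clip("my-voice.wav")` returns the duration for a clip in range and raises for one that is too short or too long.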

Response

200 OK — JSON:

```json
{
  "voice_id": "cloned_a1b2c3",
  "name": "Chidi",
  "description": "My narration voice",
  "status": "ready",
  "preview_url": "data:audio/wav;base64,UklGR...",
  "created_at": "2026-06-27T10:00:00Z"
}
```
  • voice_id always starts with cloned_.
  • status is one of processing | ready | failed.
  • preview_url is a data URL containing an instant sample of the cloned voice, provided when the engine could generate one.
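The response fields above can be checked in one place. The following is a minimal sketch: `handle_clone_response` is a hypothetical helper that validates the `cloned_` prefix and the `status` value, and can decode `preview_url` to a file. It assumes a missing or empty `preview_url` simply means no preview was generated (this page doesn't say whether the field is omitted or null), and since waiting on a `processing` clone isn't described here, it only reports the status.

```python
import base64
from typing import Optional, Tuple


def handle_clone_response(payload: dict, preview_path: Optional[str] = None) -> Tuple[str, str]:
    """Check a /voice/clone response body and return (voice_id, status).

    If preview_path is given and preview_url is present, the base64 data URL
    is decoded and the audio bytes are written to that file.
    """
    voice_id = payload["voice_id"]
    if not voice_id.startswith("cloned_"):
        raise ValueError(f"unexpected voice_id: {voice_id!r}")

    status = payload.get("status")
    if status == "failed":
        raise RuntimeError(f"cloning failed for {voice_id}")
    if status not in ("processing", "ready"):
        raise ValueError(f"unknown status: {status!r}")

    preview = payload.get("preview_url")  # may be absent or empty (assumption)
    if preview_path and preview:
        # Expected shape: data:<mime type>;base64,<base64 payload>
        header, _, encoded = preview.partition(",")
        if not (header.startswith("data:") and header.endswith(";base64")):
            raise ValueError("preview_url is not a base64 data URL")
        with open(preview_path, "wb") as out:
            out.write(base64.b64decode(encoded))
    return voice_id, status
```

With the `requests` example, this would be called as `voice_id, status = handle_clone_response(res.json(), "preview.wav")`.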

Example

```shell
curl https://api.satryx.ai/voice/clone \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -F "file=@my-voice.wav" \
  -F "name=Chidi" \
  -F "description=My narration voice"
```

```python
import os

import requests

# Upload the reference clip as multipart/form-data; name and description
# are sent as ordinary form fields alongside the file.
with open("my-voice.wav", "rb") as f:
    res = requests.post(
        "https://api.satryx.ai/voice/clone",
        headers={"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"},
        files={"file": f},
        data={"name": "Chidi", "description": "My narration voice"},
    )
res.raise_for_status()  # raise on non-2xx responses
voice_id = res.json()["voice_id"]  # e.g. "cloned_a1b2c3"
```

Speak in the cloned voice

Pass the cloned_… id straight to /voice/tts as voice_id:

```shell
curl https://api.satryx.ai/voice/tts \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is my cloned voice.", "voice_id": "cloned_a1b2c3"}' \
  --output cloned.wav
```
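The same TTS request can be made from Python using only the standard library (`urllib.request`), so no extra packages are needed. This sketch assumes, as the curl example's `--output cloned.wav` implies, that the response body is the raw audio file. `build_tts_request` and `synthesize` are hypothetical helper names.

```python
import json
import os
import urllib.request

TTS_URL = "https://api.satryx.ai/voice/tts"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST the curl example makes."""
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, voice_id: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    req = build_tts_request(text, voice_id, os.environ["SATRYX_API_KEY"])
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

Usage: `synthesize("This is my cloned voice.", "cloned_a1b2c3", "cloned.wav")`. Splitting request construction from sending makes the request easy to inspect without a network call.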

Delete a voice

DELETE https://api.satryx.ai/voice/voices/{voice_id} — removes a cloned voice. Only cloned_… voices can be deleted; premade and VocaBusta catalog voices cannot.

```shell
curl -X DELETE https://api.satryx.ai/voice/voices/cloned_a1b2c3 \
  -H "Authorization: Bearer $SATRYX_API_KEY"
```

Returns { "status": "deleted", "voice_id": "cloned_a1b2c3" }.
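A Python version of the delete call can mirror the documented rule on the client, so that passing a premade or catalog ID fails before any request is sent. This is a sketch using the standard library. `build_delete_request` and `delete_voice` are hypothetical helpers, and the check on the response body assumes the `{"status": "deleted", ...}` shape shown above.

```python
import json
import urllib.parse
import urllib.request

VOICES_URL = "https://api.satryx.ai/voice/voices"


def build_delete_request(voice_id: str, api_key: str) -> urllib.request.Request:
    """Build the DELETE request, refusing non-cloned ids before any network call."""
    # Mirrors the documented rule: only cloned_… voices can be deleted.
    if not voice_id.startswith("cloned_"):
        raise ValueError(f"only cloned_ voices can be deleted, got {voice_id!r}")
    return urllib.request.Request(
        f"{VOICES_URL}/{urllib.parse.quote(voice_id, safe='')}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )


def delete_voice(voice_id: str, api_key: str) -> dict:
    """Send the DELETE and check for the documented "deleted" status."""
    with urllib.request.urlopen(build_delete_request(voice_id, api_key)) as resp:
        result = json.load(resp)
    if result.get("status") != "deleted":
        raise RuntimeError(f"unexpected delete response: {result}")
    return result
```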

Best practices

  • Quality in, quality out — use a clean, dry recording with no background music or overlapping speakers.
  • Consent — only clone voices you own or have explicit permission to clone.
  • Tune at synthesis time — adjust exaggeration and cfg_weight on /voice/tts to dial expressiveness vs. fidelity for the clone.
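One way to tune a clone is to render the same line at several settings and compare them by ear. The sketch below builds one `/voice/tts` JSON body per combination, assuming `exaggeration` and `cfg_weight` are numeric fields sent alongside `text` and `voice_id`. The grid values are illustrative placeholders, not documented ranges or defaults, and `tuning_payloads` is a hypothetical helper. Each body can be sent exactly like the TTS request earlier on this page.

```python
from itertools import product
from typing import Dict, List

# Placeholder values only: this page does not document valid ranges or
# defaults for exaggeration and cfg_weight.
EXAGGERATION_VALUES = (0.3, 0.5, 0.7)
CFG_WEIGHT_VALUES = (0.3, 0.5)


def tuning_payloads(text: str, voice_id: str) -> List[Dict]:
    """Return one /voice/tts body per (exaggeration, cfg_weight) pair."""
    return [
        {
            "text": text,
            "voice_id": voice_id,
            "exaggeration": exaggeration,
            "cfg_weight": cfg_weight,
        }
        for exaggeration, cfg_weight in product(EXAGGERATION_VALUES, CFG_WEIGHT_VALUES)
    ]
```

Keeping the text fixed while only these two parameters vary isolates their effect on expressiveness vs. fidelity.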