Dubbing
Re-voice a video into a new language. Dubbing is a three-step pipeline:
- Analyze the video → transcript with word timing + speaker diarization.
- Translate the segments into the target language.
- Render a new video with each speaker re-voiced.
Analyze and render are long-running jobs: they return a job_id immediately;
poll GET /voice/dub/jobs/{job_id} until status is done or error.
1. Analyze
POST https://api.satryx.ai/voice/dub/analyze — multipart/form-data.
| Field | Type | Default | Notes |
|---|---|---|---|
| file | file | — | Required. The source video. |
| language | string | auto | Spoken language hint. |
| diarize | boolean | true | Detect and label distinct speakers. |
Returns { "job_id": "…" }. Poll the job; the finished result is a
DubAnalysis:
{
"language": "en",
"duration_seconds": 42.0,
"segments": [
{ "id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00" }
],
"speakers": ["SPEAKER_00", "SPEAKER_01"],
"diarized": true
}
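If you want typed access to the result, the DubAnalysis shape above maps onto two small records. This is a minimal sketch, assuming the fields are exactly those in the sample; the names `Segment`, `DubAnalysis`, `parse_analysis`, and `segments_by_speaker` are illustrative and not part of the API. It also assumes `speaker` may be missing when diarization is off, which the reference does not confirm.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    id: int
    start: float
    end: float
    text: str
    speaker: Optional[str]  # assumption: may be absent when diarize is false


@dataclass
class DubAnalysis:
    language: str
    duration_seconds: float
    segments: list[Segment]
    speakers: list[str]
    diarized: bool


def parse_analysis(raw: dict) -> DubAnalysis:
    """Build a DubAnalysis from a finished analyze job's result dict."""
    segments = [
        # Read only the documented keys so extra fields do not break parsing.
        Segment(s["id"], s["start"], s["end"], s["text"], s.get("speaker"))
        for s in raw["segments"]
    ]
    return DubAnalysis(
        language=raw["language"],
        duration_seconds=raw["duration_seconds"],
        segments=segments,
        speakers=list(raw["speakers"]),
        diarized=raw["diarized"],
    )


def segments_by_speaker(analysis: DubAnalysis) -> dict:
    """Group segments under their speaker label, keeping transcript order."""
    grouped = defaultdict(list)
    for seg in analysis.segments:
        grouped[seg.speaker].append(seg)
    return dict(grouped)
```

Grouping by speaker is handy when deciding which voice to assign to each speaker before rendering.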
2. Translate
POST https://api.satryx.ai/voice/dub/translate — JSON. Translates every
segment's text into the target language (this step is synchronous).
| Field | Type | Notes |
|---|---|---|
| segments | array | The segments from analyze (each keeps start/end/speaker). |
| target_language | string | One of the dubbing targets below. |
{
"segments": [ { "id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00" } ],
"target_language": "yo"
}
Returns { "segments": [...], "target_language": "yo" } with each text
replaced by its translation. Edit the returned text freely before rendering.
Dubbing target languages: en, yo, ig, ha, sw, zu. (Nigerian
Pidgin isn't a translation target yet.)
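The request body and the "edit before rendering" step can be sketched with two small helpers. This is an illustrative sketch only: `DUB_TARGETS` mirrors the target list above, while `build_translate_payload` and `apply_text_edits` are hypothetical helper names, and the client-side language check is a convenience rather than documented API behavior.

```python
import copy

# Target codes copied from the list above.
DUB_TARGETS = {"en", "yo", "ig", "ha", "sw", "zu"}


def build_translate_payload(segments: list, target_language: str) -> dict:
    """Return the JSON body for POST /voice/dub/translate."""
    if target_language not in DUB_TARGETS:
        raise ValueError(
            f"unsupported target {target_language!r}; expected one of {sorted(DUB_TARGETS)}"
        )
    return {"segments": segments, "target_language": target_language}


def apply_text_edits(segments: list, edits: dict) -> list:
    """Return a copy of segments with text replaced for the given segment ids.

    Only "text" changes; id, start, end and speaker are left untouched.
    """
    edited = copy.deepcopy(segments)
    for seg in edited:
        if seg["id"] in edits:
            seg["text"] = edits[seg["id"]]
    return edited
```

For example, `apply_text_edits(tr["segments"], {0: "corrected text"})` would patch segment 0 of a translate response `tr` and leave the rest as returned.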
3. Render
POST https://api.satryx.ai/voice/dub/render — multipart/form-data.
| Field | Type | Default | Notes |
|---|---|---|---|
| file | file | — | Required. The original video again. |
| segments | string (JSON) | — | Required. The translated segments array, JSON-encoded. |
| voice_map | string (JSON) | {} | Map each speaker → a voice_id, or "preserve" to keep that speaker's own voice (clone). |
| exaggeration | number | — | Optional Chatterbox emphasis, 0.0–1.0. |
| cfg_weight | number | — | Optional Chatterbox guidance, 0.0–1.0. |
Returns { "job_id": "…" }. Poll the job; the finished result is
{ "video_base64": "…", "format": "mp4" }.
A voice_map assigns voices per speaker:
{ "SPEAKER_00": "vocabusta_yo_female", "SPEAKER_01": "preserve" }
"preserve" re-voices the speaker in the new language while keeping their own
voice timbre (via cloning); a voice_id swaps them to a catalog or cloned voice.
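Because segments and voice_map travel as JSON-encoded strings inside a multipart form, it can help to build the non-file fields in one place. The sketch below assumes nothing beyond the field table above; `build_voice_map` and `build_render_fields` are hypothetical helpers, and the range and unknown-speaker checks are client-side conveniences, not documented server behavior.

```python
import json


def build_voice_map(speakers, overrides=None, default="preserve"):
    """Assign a voice to every speaker: an override voice_id, else the default."""
    overrides = overrides or {}
    unknown = set(overrides) - set(speakers)
    if unknown:
        raise ValueError(f"voice_map names unknown speakers: {sorted(unknown)}")
    return {spk: overrides.get(spk, default) for spk in speakers}


def build_render_fields(segments, voice_map, exaggeration=None, cfg_weight=None):
    """Return the non-file form fields for POST /voice/dub/render."""
    fields = {
        "segments": json.dumps(segments),    # JSON-encoded string, per the table
        "voice_map": json.dumps(voice_map),  # JSON-encoded string, per the table
    }
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if value is None:
            continue  # optional field: omit it entirely when not set
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        fields[name] = str(value)
    return fields
```

The returned dict can be passed as `data=` alongside `files={"file": f}` in a `requests.post` call.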
4. Poll a job
GET https://api.satryx.ai/voice/dub/jobs/{job_id}:
{ "status": "running", "progress": 0.45, "result": null, "error": null }
status is queued | running | done | error. When done, result holds
the analysis (analyze) or { video_base64, format } (render). When error,
error holds the message.
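A polling loop built on these four statuses usually wants an upper time bound so a stuck job cannot block forever. The following is one possible sketch, not the official client: `wait_for_job`, `DubJobError`, and the default timeout and interval are all assumptions. The HTTP call is injected as `fetch` so the loop works with any client and can be exercised without network access.

```python
import time
from typing import Callable


class DubJobError(RuntimeError):
    """Raised when a job finishes with status "error"."""


def wait_for_job(
    fetch: Callable[[], dict],
    timeout: float = 600.0,   # assumed upper bound, tune for long renders
    interval: float = 2.5,    # seconds between polls
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until the job is done, fails, or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        job = fetch()
        status = job["status"]
        if status == "done":
            return job["result"]
        if status == "error":
            raise DubJobError(job["error"])
        # status is "queued" or "running": keep waiting unless out of time.
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} seconds")
        sleep(interval)
```

With requests, `fetch` could be `lambda: requests.get(f"https://api.satryx.ai/voice/dub/jobs/{job_id}", headers=headers).json()`, where `headers` carries your Authorization header.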
End-to-end (Python)
import os, json, time, base64, requests

BASE = "https://api.satryx.ai"
H = {"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"}

def poll(job_id):
    # Block until the job finishes; return its result or raise on error.
    while True:
        time.sleep(2.5)
        job = requests.get(f"{BASE}/voice/dub/jobs/{job_id}", headers=H).json()
        if job["status"] == "done":
            return job["result"]
        if job["status"] == "error":
            raise RuntimeError(job["error"])

# 1. Analyze
with open("clip.mp4", "rb") as f:
    job = requests.post(f"{BASE}/voice/dub/analyze", headers=H,
                        files={"file": f}, data={"diarize": "true"}).json()
analysis = poll(job["job_id"])

# 2. Translate to Yoruba
tr = requests.post(f"{BASE}/voice/dub/translate", headers=H, json={
    "segments": analysis["segments"], "target_language": "yo",
}).json()

# 3. Render (preserve every speaker's own voice)
voice_map = {spk: "preserve" for spk in analysis["speakers"]}
with open("clip.mp4", "rb") as f:
    job = requests.post(f"{BASE}/voice/dub/render", headers=H, files={"file": f}, data={
        "segments": json.dumps(tr["segments"]),
        "voice_map": json.dumps(voice_map),
    }).json()
result = poll(job["job_id"])

# Decode the base64 video and write it to disk
with open("dubbed.mp4", "wb") as out:
    out.write(base64.b64decode(result["video_base64"]))
Next
- Voices & languages: voice IDs for the voice_map
- Rate limits & errors: handling long renders