
Dubbing

Re-voice a video into a new language. Dubbing is a three-step pipeline:

  1. Analyze the video → transcript with word timing + speaker diarization.
  2. Translate the segments into the target language.
  3. Render a new video with each speaker re-voiced.

Analyze and render are long-running jobs. Each returns a job_id immediately; poll GET /voice/dub/jobs/{job_id} until status is done (or error).

1. Analyze

POST https://api.satryx.ai/voice/dub/analyze, sent as multipart/form-data.

| Field | Type | Default | Notes |
|---|---|---|---|
| file | file | | Required. The source video. |
| language | string | auto | Spoken language hint. |
| diarize | boolean | true | Detect and label distinct speakers. |
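Multipart form fields are sent as strings, so a boolean such as diarize needs an explicit text encoding. The sketch below makes two assumptions. First, it assumes the server accepts lowercase "true"/"false", which is what the end-to-end example on this page sends. Second, it assumes that leaving out language behaves the same as the documented auto default. analyze_fields is a hypothetical helper, not part of any Satryx SDK:

```python
def analyze_fields(language: str = "auto", diarize: bool = True) -> dict:
    """Build the text fields for POST /voice/dub/analyze (hypothetical helper).

    The video itself is sent separately as the multipart "file" part.
    """
    # Assumption: the API parses lowercase "true"/"false" for booleans.
    fields = {"diarize": "true" if diarize else "false"}
    # Assumption: omitting language behaves like the documented default "auto".
    if language != "auto":
        fields["language"] = language
    return fields


print(analyze_fields())                              # {'diarize': 'true'}
print(analyze_fields(language="en", diarize=False))  # {'diarize': 'false', 'language': 'en'}
```

Pass the returned dict as `data=` alongside `files={"file": ...}` in the upload request.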

Returns { "job_id": "…" }. Poll the job; the finished result is a DubAnalysis:

```json
{
  "language": "en",
  "duration_seconds": 42.0,
  "segments": [
    { "id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00" }
  ],
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "diarized": true
}
```
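Every segment carries start, end, and speaker. That means the analysis alone shows how long each speaker talks, which can help you decide at render time whose voice to preserve and whom to re-cast. Below is a plain-Python sketch. The second segment in the sample is invented for illustration. Treating a missing speaker as "UNKNOWN" is an assumption about non-diarized output.

```python
def speaker_talk_time(analysis: dict) -> dict:
    """Sum the seconds each speaker talks, from segment start/end times."""
    # Start every listed speaker at zero so none is missing from the result.
    totals = {spk: 0.0 for spk in analysis.get("speakers", [])}
    for seg in analysis["segments"]:
        # Assumption: a segment may have no "speaker" when diarize is false.
        spk = seg.get("speaker") or "UNKNOWN"
        totals[spk] = totals.get(spk, 0.0) + (seg["end"] - seg["start"])
    return {spk: round(secs, 3) for spk, secs in totals.items()}


analysis = {
    "language": "en",
    "duration_seconds": 42.0,
    "segments": [
        {"id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00"},
        # Hypothetical second segment, for illustration only.
        {"id": 1, "start": 3.5, "end": 6.0, "text": "Hi!", "speaker": "SPEAKER_01"},
    ],
    "speakers": ["SPEAKER_00", "SPEAKER_01"],
    "diarized": True,
}
print(speaker_talk_time(analysis))  # {'SPEAKER_00': 3.2, 'SPEAKER_01': 2.5}
```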

2. Translate

POST https://api.satryx.ai/voice/dub/translate, sent as a JSON body. Translates every segment's text into the target language. This step is synchronous: the translation comes back in the response, with no job to poll.

| Field | Type | Notes |
|---|---|---|
| segments | array | The segments from analyze (each keeps start/end/speaker). |
| target_language | string | One of the dubbing targets below. |
Example request body:

```json
{
  "segments": [ { "id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00" } ],
  "target_language": "yo"
}
```

Returns { "segments": [...], "target_language": "yo" } with each text replaced by its translation. Edit the returned text freely before rendering.
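Each translated segment keeps its start, end, and speaker, and you can edit its text before rendering. Here is a sketch of a safe way to do that: apply edits by segment id, then confirm that only the text changed. Both helpers are hypothetical. The placeholder strings stand in for real translations.

```python
def apply_edits(segments: list, edits: dict) -> list:
    """Return copies of the segments with text replaced for the given segment ids."""
    return [{**seg, "text": edits.get(seg["id"], seg["text"])} for seg in segments]


def check_alignment(original: list, translated: list) -> None:
    """Raise ValueError if anything other than text differs between the two lists."""
    if len(original) != len(translated):
        raise ValueError(f"segment count changed: {len(original)} -> {len(translated)}")
    for before, after in zip(original, translated):
        for key in ("id", "start", "end", "speaker"):
            if before.get(key) != after.get(key):
                raise ValueError(f"segment {before.get('id')}: {key!r} changed")


original = [{"id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00"}]
# Placeholder text stands in for the real translation returned by the API.
translated = [{**original[0], "text": "(Yoruba translation)"}]

edited = apply_edits(translated, {0: "(edited Yoruba line)"})
check_alignment(original, edited)  # passes: only the text changed
print(edited[0]["text"])           # (edited Yoruba line)
```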

Dubbing target languages: en (English), yo (Yoruba), ig (Igbo), ha (Hausa), sw (Swahili), zu (Zulu). Nigerian Pidgin is not yet supported as a translation target.
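Only the six codes above are accepted, so checking the target locally fails fast and saves a request. The sketch below builds the translate body from analyze segments. DUB_TARGETS mirrors the list above and would need updating if targets are added. translate_body is a hypothetical helper.

```python
DUB_TARGETS = {"en", "yo", "ig", "ha", "sw", "zu"}  # as listed in these docs


def translate_body(segments: list, target_language: str) -> dict:
    """Build the JSON body for POST /voice/dub/translate (hypothetical helper)."""
    if target_language not in DUB_TARGETS:
        raise ValueError(f"unsupported dubbing target: {target_language!r}")
    # Shallow-copy each segment so later edits don't mutate the analysis result.
    return {"segments": [dict(seg) for seg in segments],
            "target_language": target_language}


segments = [{"id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00"}]
body = translate_body(segments, "yo")
print(body["target_language"])  # yo
```

Send it with `requests.post(..., json=body)`.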

3. Render

POST https://api.satryx.ai/voice/dub/render, sent as multipart/form-data.

| Field | Type | Default | Notes |
|---|---|---|---|
| file | file | | Required. The original video again. |
| segments | string (JSON) | | Required. The translated segments array, JSON-encoded. |
| voice_map | string (JSON) | {} | Maps each speaker to a voice_id, or "preserve" to keep that speaker's own voice (cloned). |
| exaggeration | number | | Optional. Chatterbox emphasis, 0.0 to 1.0. |
| cfg_weight | number | | Optional. Chatterbox guidance, 0.0 to 1.0. |
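In this form, segments and voice_map are JSON-encoded strings, not nested form fields, and that is easy to get wrong. Here is a sketch of assembling the render fields. render_fields is hypothetical. Omitting the Chatterbox controls when they are unset, and range-checking them on the client, are design choices based on the table above.

```python
import json
from typing import Optional


def render_fields(segments: list, voice_map: Optional[dict] = None,
                  exaggeration: Optional[float] = None,
                  cfg_weight: Optional[float] = None) -> dict:
    """Build the text fields for POST /voice/dub/render (hypothetical helper)."""
    fields = {
        "segments": json.dumps(segments),          # required; JSON-encoded string
        "voice_map": json.dumps(voice_map or {}),  # documented default: {}
    }
    # Optional Chatterbox controls: send only when set, within the documented 0.0-1.0 range.
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        fields[name] = str(value)
    return fields


segments = [{"id": 0, "start": 0.0, "end": 3.2, "text": "(Yoruba translation)", "speaker": "SPEAKER_00"}]
fields = render_fields(segments, {"SPEAKER_00": "preserve"}, exaggeration=0.5)
print(sorted(fields))  # ['exaggeration', 'segments', 'voice_map']
```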

Returns { "job_id": "…" }. Poll the job; the finished result is { "video_base64": "…", "format": "mp4" }.
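The whole video comes back inline as base64, so the response is roughly a third larger than the file itself and sits in memory until it is decoded. Below is a small sketch of writing it to disk. It assumes format is always a plain extension such as mp4. save_render_result is a hypothetical helper.

```python
import base64
from pathlib import Path


def save_render_result(result: dict, path: str) -> Path:
    """Decode a finished render result and write the video to disk (hypothetical helper)."""
    # Use the reported container format as the file extension.
    out = Path(path).with_suffix("." + result.get("format", "mp4"))
    # validate=True rejects characters outside the base64 alphabet instead of skipping them.
    out.write_bytes(base64.b64decode(result["video_base64"], validate=True))
    return out
```

For example, `save_render_result(result, "dubbed")` writes `dubbed.mp4` when format is mp4.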

A voice_map assigns voices per speaker:

```json
{ "SPEAKER_00": "vocabusta_yo_female", "SPEAKER_01": "preserve" }
```

"preserve" re-voices the speaker in the new language while keeping their own voice timbre (via cloning); a voice_id swaps them to a catalog or cloned voice.

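These docs don't say how render treats a speaker who is missing from voice_map, so mapping every diarized speaker explicitly avoids depending on that behavior. The sketch below defaults everyone to "preserve" and applies per-speaker overrides. build_voice_map is hypothetical, and vocabusta_yo_female is the voice_id from the example above.

```python
from typing import Optional


def build_voice_map(speakers: list, overrides: Optional[dict] = None,
                    default: str = "preserve") -> dict:
    """Assign every speaker a voice: `default` unless an override names them."""
    overrides = overrides or {}
    # Catch typos such as "SPEAKER_0" before they reach the render job.
    unknown = set(overrides) - set(speakers)
    if unknown:
        raise ValueError(f"voice_map names unknown speakers: {sorted(unknown)}")
    return {spk: overrides.get(spk, default) for spk in speakers}


voice_map = build_voice_map(["SPEAKER_00", "SPEAKER_01"], {"SPEAKER_00": "vocabusta_yo_female"})
print(voice_map)  # {'SPEAKER_00': 'vocabusta_yo_female', 'SPEAKER_01': 'preserve'}
```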
4. Poll a job

GET https://api.satryx.ai/voice/dub/jobs/{job_id} returns the job's current state:

```json
{ "status": "running", "progress": 0.45, "result": null, "error": null }
```

status is one of queued, running, done, or error. When it is done, result holds the DubAnalysis (for an analyze job) or { video_base64, format } (for a render job). When it is error, error holds the message.
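Jobs can take a while, so a client loop is safer with a deadline and gradual backoff than with polling at a fixed rate indefinitely. In this sketch, fetch is any zero-argument callable that returns the job JSON, such as a lambda wrapping requests.get on the jobs URL. Injecting it also makes the loop testable without a network. The 1.5x backoff factor, 15-second cap, and 600-second default timeout are arbitrary choices, not API guidance. wait_for_job is a hypothetical helper.

```python
import time
from typing import Callable


def wait_for_job(fetch: Callable[[], dict], timeout: float = 600.0,
                 interval: float = 2.0, max_interval: float = 15.0,
                 sleep: Callable[[float], None] = time.sleep):
    """Poll `fetch` until the job is done (return result) or error (raise)."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        status = job["status"]  # queued | running | done | error
        if status == "done":
            return job["result"]
        if status == "error":
            raise RuntimeError(job.get("error") or "dubbing job failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} s")
        sleep(interval)
        interval = min(interval * 1.5, max_interval)  # back off gradually


# Simulated job states so the loop can run without a network.
states = iter([
    {"status": "queued", "result": None, "error": None},
    {"status": "running", "progress": 0.45, "result": None, "error": None},
    {"status": "done", "result": {"format": "mp4"}, "error": None},
])
print(wait_for_job(lambda: next(states), sleep=lambda s: None))  # {'format': 'mp4'}
```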

End-to-end (Python)

```python
import os, json, time, base64, requests

BASE = "https://api.satryx.ai"
H = {"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"}

def poll(job_id, interval=2.5, timeout=900):
    """Poll a dubbing job until it finishes; return its result or raise on error/timeout."""
    # timeout (seconds) is an arbitrary client-side limit, not an API value
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(interval)
        r = requests.get(f"{BASE}/voice/dub/jobs/{job_id}", headers=H, timeout=30)
        r.raise_for_status()
        job = r.json()
        if job["status"] == "done":
            return job["result"]
        if job["status"] == "error":
            raise RuntimeError(job["error"])
    raise TimeoutError(f"dub job {job_id} did not finish within {timeout} s")

# 1. Analyze
with open("clip.mp4", "rb") as f:
    r = requests.post(f"{BASE}/voice/dub/analyze", headers=H,
                      files={"file": f}, data={"diarize": "true"})
r.raise_for_status()
analysis = poll(r.json()["job_id"])

# 2. Translate to Yoruba (synchronous: no job to poll)
r = requests.post(f"{BASE}/voice/dub/translate", headers=H, json={
    "segments": analysis["segments"], "target_language": "yo",
})
r.raise_for_status()
tr = r.json()

# 3. Render (preserve every speaker's own voice)
voice_map = {spk: "preserve" for spk in analysis["speakers"]}
with open("clip.mp4", "rb") as f:
    r = requests.post(f"{BASE}/voice/dub/render", headers=H, files={"file": f}, data={
        "segments": json.dumps(tr["segments"]),
        "voice_map": json.dumps(voice_map),
    })
r.raise_for_status()
result = poll(r.json()["job_id"])

# Decode the base64 video and write it to disk
with open("dubbed.mp4", "wb") as out:
    out.write(base64.b64decode(result["video_base64"]))
```
