Dubbing

Re-voice a video into a new language. Dubbing is a three-step pipeline:

1. Analyze the video to get a transcript with word timing and speaker diarization.
2. Translate the segments into the target language.
3. Render a new video with each speaker re-voiced.

Analyze and render are long-running jobs: each returns a job_id immediately. Poll GET /voice/dub/jobs/{job_id} until status is done (or error, in which case the job has failed).

1. Analyze

POST https://api.satryx.ai/voice/dub/analyze — multipart/form-data.

Field	Type	Default	Notes
`file`	file	—	Required. The source video.
`language`	string	auto	Spoken language hint.
`diarize`	boolean	`true`	Detect and label distinct speakers.

Returns { "job_id": "…" }. Poll the job; the finished result is a DubAnalysis:

{
  "language": "en",
  "duration_seconds": 42.0,
  "segments": [
    { "id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00" }
  ],
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "diarized": true
}
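
As a rough sketch of working with a DubAnalysis, the snippet below totals speaking time per speaker, which can help when deciding which speakers need a voice assigned. It relies only on the fields shown above and assumes every segment carries a speaker label. The helper name speaking_time and the second sample segment are illustrative, not part of the API.

```python
from collections import defaultdict

def speaking_time(analysis):
    """Return {speaker: total seconds spoken}, using only documented fields."""
    totals = defaultdict(float)
    for seg in analysis["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)

# Sample shaped like the DubAnalysis above (the second segment is made up).
analysis = {
    "language": "en",
    "duration_seconds": 42.0,
    "segments": [
        {"id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00"},
        {"id": 1, "start": 3.5, "end": 6.0, "text": "Thanks for joining.", "speaker": "SPEAKER_01"},
    ],
    "speakers": ["SPEAKER_00", "SPEAKER_01"],
    "diarized": True,
}
print(speaking_time(analysis))
```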

2. Translate

POST https://api.satryx.ai/voice/dub/translate — JSON. Translates every segment's text into the target language (this step is synchronous).

Field	Type	Notes
`segments`	array	The `segments` from analyze (each keeps `start`/`end`/`speaker`).
`target_language`	string	One of the dubbing targets below.

{
  "segments": [ { "id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00" } ],
  "target_language": "yo"
}

Returns { "segments": [...], "target_language": "yo" } with each text replaced by its translation. Edit the returned text freely before rendering.
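
Since the translated text can be edited before rendering, one possible way to apply manual corrections is to overwrite text by segment id while leaving timing and speaker untouched. This is a minimal sketch: apply_edits is a hypothetical helper, and the strings are placeholders rather than real API output.

```python
def apply_edits(segments, edits):
    """Return a copy of segments with text replaced for the ids listed in edits.

    edits maps segment id -> replacement text; start/end/speaker are preserved.
    """
    unknown = set(edits) - {seg["id"] for seg in segments}
    if unknown:
        raise KeyError(f"no segment with id(s): {sorted(unknown)}")
    return [
        {**seg, "text": edits[seg["id"]]} if seg["id"] in edits else dict(seg)
        for seg in segments
    ]

# Placeholder standing in for the segments returned by the translate step.
translated = [
    {"id": 0, "start": 0.0, "end": 3.2, "text": "<machine translation>", "speaker": "SPEAKER_00"},
]
edited = apply_edits(translated, {0: "<hand-corrected translation>"})
```

Returning a copy keeps the original translation available in case an edit needs to be reverted.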

Dubbing target languages: en, yo, ig, ha, sw, zu. (Nigerian Pidgin isn't a translation target yet.)
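
A client can check the target against this list before sending the request. The sketch below builds the translate body and rejects codes outside the list. The names DUB_TARGETS and translate_payload are assumptions made for illustration, and how the server itself responds to an unsupported code is not covered here.

```python
DUB_TARGETS = {"en", "yo", "ig", "ha", "sw", "zu"}  # copied from the list above

def translate_payload(segments, target_language):
    """Build the JSON body for /voice/dub/translate, rejecting unlisted targets."""
    if target_language not in DUB_TARGETS:
        raise ValueError(
            f"unsupported target {target_language!r}; expected one of {sorted(DUB_TARGETS)}"
        )
    return {"segments": segments, "target_language": target_language}

payload = translate_payload(
    [{"id": 0, "start": 0.0, "end": 3.2, "text": "Hello everyone.", "speaker": "SPEAKER_00"}],
    "yo",
)
```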

3. Render

POST https://api.satryx.ai/voice/dub/render — multipart/form-data.

Field	Type	Default	Notes
`file`	file	—	Required. The original video again.
`segments`	string (JSON)	—	Required. The translated segments array, JSON-encoded.
`voice_map`	string (JSON)	`{}`	Map each speaker → a `voice_id`, or `"preserve"` to keep that speaker's own voice (clone).
`exaggeration`	number	—	Optional Chatterbox emphasis, `0.0`–`1.0`.
`cfg_weight`	number	—	Optional Chatterbox guidance, `0.0`–`1.0`.

Returns { "job_id": "…" }. Poll the job; the finished result is { "video_base64": "…", "format": "mp4" }.

A voice_map assigns voices per speaker:

{ "SPEAKER_00": "vocabusta_yo_female", "SPEAKER_01": "preserve" }

"preserve" re-voices the speaker in the new language while keeping their own voice timbre (via cloning); a voice_id swaps them to a catalog or cloned voice.
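
One way to assemble the non-file form fields for a render request is sketched below. The helper render_fields is hypothetical. Falling back to "preserve" for speakers without an explicit voice is a choice made in this sketch, not documented server behaviour, and the range check simply mirrors the 0.0–1.0 bounds in the table above.

```python
import json

def render_fields(segments, speakers, voices=None, exaggeration=None, cfg_weight=None):
    """Build the non-file form fields for /voice/dub/render.

    Speakers missing from `voices` fall back to "preserve" (a choice made here,
    not documented server behaviour).
    """
    voices = voices or {}
    voice_map = {spk: voices.get(spk, "preserve") for spk in speakers}
    fields = {"segments": json.dumps(segments), "voice_map": json.dumps(voice_map)}
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if value is None:
            continue  # optional field: leave it out of the form entirely
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        fields[name] = str(value)
    return fields

fields = render_fields(
    segments=[{"id": 0, "start": 0.0, "end": 3.2, "text": "<translated text>", "speaker": "SPEAKER_00"}],
    speakers=["SPEAKER_00", "SPEAKER_01"],
    voices={"SPEAKER_00": "vocabusta_yo_female"},
    exaggeration=0.5,
)
```

The resulting dict can be passed as data= alongside files={"file": f} in a requests.post call.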

4. Poll a job

GET https://api.satryx.ai/voice/dub/jobs/{job_id}:

{ "status": "running", "progress": 0.45, "result": null, "error": null }

status is queued | running | done | error. When done, result holds the analysis (analyze) or { video_base64, format } (render). When error, error holds the message.
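
The status handling described above can be sketched as a small pure function that interprets a single poll response, separate from any networking. The name job_outcome and the "pending" label are illustrative. Raising on an unrecognised status is a defensive assumption, not documented behaviour.

```python
def job_outcome(job):
    """Interpret one poll response.

    Returns ("pending", progress) while queued or running, ("done", result) on
    success, and raises RuntimeError carrying the server's message on error.
    """
    status = job["status"]
    if status in ("queued", "running"):
        return "pending", job.get("progress")
    if status == "done":
        return "done", job["result"]
    if status == "error":
        raise RuntimeError(job["error"])
    raise ValueError(f"unexpected job status: {status!r}")

state, value = job_outcome({"status": "running", "progress": 0.45, "result": None, "error": None})
print(state, value)  # pending 0.45
```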

End-to-end (Python)

import os, json, time, base64, requests

BASE = "https://api.satryx.ai"
H = {"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"}

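def poll(job_id, interval=2.5, timeout=1800):
    """Block until the job finishes; return its result, or raise on error/timeout."""
    # The 30-minute default timeout is an arbitrary client-side choice, not an API limit.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(interval)
        resp = requests.get(f"{BASE}/voice/dub/jobs/{job_id}", headers=H)
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        job = resp.json()
        if job["status"] == "done":
            return job["result"]
        if job["status"] == "error":
            raise RuntimeError(job["error"])
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")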

# 1. Analyze
with open("clip.mp4", "rb") as f:
    job = requests.post(f"{BASE}/voice/dub/analyze", headers=H,
                        files={"file": f}, data={"diarize": "true"}).json()
analysis = poll(job["job_id"])

# 2. Translate to Yoruba
tr = requests.post(f"{BASE}/voice/dub/translate", headers=H, json={
    "segments": analysis["segments"], "target_language": "yo",
}).json()

# 3. Render (preserve every speaker's own voice)
voice_map = {spk: "preserve" for spk in analysis["speakers"]}
with open("clip.mp4", "rb") as f:
    job = requests.post(f"{BASE}/voice/dub/render", headers=H, files={"file": f}, data={
        "segments": json.dumps(tr["segments"]),
        "voice_map": json.dumps(voice_map),
    }).json()
result = poll(job["job_id"])

# Decode the base64 payload and write the dubbed video to disk.
with open("dubbed.mp4", "wb") as out:
    out.write(base64.b64decode(result["video_base64"]))

Next

Voices & languages: voice IDs for the voice_map
Rate limits & errors: handling long renders
