
Speech to text

Transcribe audio into text with POST /voice/stt. African languages are routed to the Vocabanga ASR model; everything else falls back to Whisper.

Request

POST https://api.satryx.ai/voice/stt with a multipart/form-data body.

| Field | Type | Default | Notes |
|---|---|---|---|
| file | file | none | Required. The audio file to transcribe (wav, mp3, m4a, etc.). |
| language | string | auto | A language code such as yo, pcm, or ha, or auto to detect the language automatically. |
| word_timestamps | boolean | true | Include per-word start and end times in each segment. |
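Multipart form fields travel as plain strings, so the boolean has to be sent as text. The sketch below builds the non-file fields with the documented defaults. It assumes lowercase "true"/"false" are accepted (the Python example on this page sends "true"); build_stt_fields is a hypothetical helper, not part of any Satryx SDK.

```python
# Hypothetical helper: build the non-file form fields for POST /voice/stt.
# Multipart fields are strings, so the boolean is serialized as "true"/"false"
# (lowercase is an assumption based on the Python example on this page).


def build_stt_fields(language: str = "auto", word_timestamps: bool = True) -> dict[str, str]:
    """Return the form fields, falling back to the documented defaults."""
    if not language:
        raise ValueError("language must be a non-empty code such as 'yo' or 'auto'")
    return {
        "language": language,  # passed through unchanged; the server picks the engine
        "word_timestamps": "true" if word_timestamps else "false",
    }


print(build_stt_fields("pcm"))  # {'language': 'pcm', 'word_timestamps': 'true'}
```

The returned dict can be passed directly as the data argument of requests.post, alongside files={"file": f}.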

Pass the original VocaBusta language code (e.g. ig, pcm, yo) unchanged. The engine decides internally whether to use Vocabanga or Whisper, so don't pre-map the code yourself.

Response

A successful request returns 200 OK with a JSON body:

{
  "id": "8a1d…",
  "transcript": "How you dey? I dey fine.",
  "language": "pcm",
  "duration_seconds": 3.1,
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.1,
      "text": "How you dey? I dey fine.",
      "words": [
        { "word": "How", "start": 0.0, "end": 0.21 },
        { "word": "you", "start": 0.21, "end": 0.38 }
      ]
    }
  ],
  "engine": "vocabanga",
  "model": "vocabanga-asr"
}
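Per-word timings sit inside each segment, so reading them means walking two levels of nesting. The sketch below flattens them using the sample response above as input (its word list is truncated, as in the docs). word_timings is a hypothetical helper. Treating a missing "words" key as empty is our assumption about what happens when word_timestamps is false; it is not documented here.

```python
from typing import Any

# The sample response above, as a Python dict (word list truncated as in the docs).
response: dict[str, Any] = {
    "id": "8a1d…",
    "transcript": "How you dey? I dey fine.",
    "language": "pcm",
    "duration_seconds": 3.1,
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 3.1,
            "text": "How you dey? I dey fine.",
            "words": [
                {"word": "How", "start": 0.0, "end": 0.21},
                {"word": "you", "start": 0.21, "end": 0.38},
            ],
        }
    ],
    "engine": "vocabanga",
    "model": "vocabanga-asr",
}


def word_timings(resp: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Flatten every segment's words into (word, start, end) tuples.

    A segment without a "words" key (assumed when word_timestamps=false)
    contributes nothing.
    """
    out = []
    for segment in resp.get("segments", []):
        for w in segment.get("words", []):
            out.append((w["word"], w["start"], w["end"]))
    return out


print(f"{response['engine']} ({response['model']}), language={response['language']}")
for word, start, end in word_timings(response):
    print(f"{start:6.2f}-{end:6.2f}s  {word}")
```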

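The engine field tells you which model produced the transcript: vocabanga (our African ASR) or whisper. The model field names the specific model used (vocabanga-asr in the example above).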

Examples

cURL

curl https://api.satryx.ai/voice/stt \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -F "file=@interview.m4a" \
  -F "language=pcm" \
  -F "word_timestamps=true"

Python

import os

import requests

# Upload the audio as multipart/form-data; the other fields go in `data`.
with open("interview.m4a", "rb") as f:
    res = requests.post(
        "https://api.satryx.ai/voice/stt",
        headers={"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"},
        files={"file": f},
        data={"language": "pcm", "word_timestamps": "true"},
    )

res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(res.json()["transcript"])

Supported ASR languages

The transcription engine is tuned for these codes; others fall back to Whisper auto-detect.

| Code | Language |
|---|---|
| auto | Auto-detect |
| en | English |
| en_ng | Nigerian English |
| pcm | Nigerian Pidgin |
| yo | Yoruba |
| ig | Igbo (beta) |
| ha | Hausa |
| sw | Swahili |
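If a client wants to warn users before uploading, the table can be mirrored as a lookup. The sketch below only reports whether a code is in the tuned list; it does not map or rewrite codes, since the server does its own routing. TUNED_CODES and describe_language are hypothetical names, and the list must be kept in sync with this page by hand.

```python
# Codes copied from the "Supported ASR languages" table above.
TUNED_CODES = {
    "auto": "Auto-detect",
    "en": "English",
    "en_ng": "Nigerian English",
    "pcm": "Nigerian Pidgin",
    "yo": "Yoruba",
    "ig": "Igbo (beta)",
    "ha": "Hausa",
    "sw": "Swahili",
}


def describe_language(code: str) -> str:
    """Report whether a code is tuned or will fall back to Whisper auto-detect."""
    name = TUNED_CODES.get(code)
    if name is None:
        return f"{code}: not in the tuned list, falls back to Whisper auto-detect"
    return f"{code}: {name}"


for code in ("yo", "ig", "fr"):
    print(describe_language(code))
```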

See Voices & languages for the full picture.

Next

  • Dubbing: transcription, diarization, and re-voicing for video