Speech to text

Transcribe audio into text with `POST /voice/stt`. African languages are routed to the Vocabanga ASR model; everything else falls back to Whisper.

Request

POST https://api.satryx.ai/voice/stt — multipart/form-data.

Field	Type	Default	Notes
`file`	file	—	Required. The audio file to transcribe (wav, mp3, m4a, etc.).
`language`	string	`auto`	A language code such as `yo`, `pcm`, or `ha`; use `auto` to detect the language.
`word_timestamps`	boolean	`true`	Include per-word start/end times in the segments.

Pass the original VocaBusta language code (e.g. `ig`, `pcm`, `yo`). The engine decides internally whether to use Vocabanga or Whisper, so don't pre-map the code yourself.
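As a rough illustration of the fields above, the non-file form fields could be assembled with a small helper like the one below. The helper name `build_stt_form` is hypothetical and not part of any SDK. Sending `false` as the string `"false"` is an assumption that mirrors the `word_timestamps=true` form value used in the cURL example.

```python
from typing import Dict


def build_stt_form(language: str = "auto", word_timestamps: bool = True) -> Dict[str, str]:
    """Build the non-file form fields for POST /voice/stt (hypothetical helper)."""
    code = language.strip()
    if not code:
        raise ValueError("language must be a non-empty code or 'auto'")
    return {
        # Pass the code through unchanged: the engine picks Vocabanga or Whisper itself.
        "language": code,
        # Multipart form values are strings. "false" is an assumed spelling.
        "word_timestamps": "true" if word_timestamps else "false",
    }


print(build_stt_form("pcm"))
```

The returned dictionary can be passed as the `data=` argument of `requests.post`, alongside `files={"file": f}`.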

Response

200 OK — JSON:

{
  "id": "8a1d…",
  "transcript": "How you dey? I dey fine.",
  "language": "pcm",
  "duration_seconds": 3.1,
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.1,
      "text": "How you dey? I dey fine.",
      "words": [
        { "word": "How", "start": 0.0, "end": 0.21 },
        { "word": "you", "start": 0.21, "end": 0.38 }
      ]
    }
  ],
  "engine": "vocabanga",
  "model": "vocabanga-asr"
}

`engine` tells you which model produced the transcript: `vocabanga` (our African ASR) or `whisper`.
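A minimal sketch of reading per-word timings out of a response shaped like the sample above. The function name `iter_words` is hypothetical. The code assumes `words` may be missing from a segment (for example when `word_timestamps` is off), which the source does not confirm, so it falls back to an empty list.

```python
from typing import Dict, Iterator, Tuple

# Trimmed copy of the sample response, keeping only the fields used here.
sample = {
    "language": "pcm",
    "engine": "vocabanga",
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 3.1,
            "text": "How you dey? I dey fine.",
            "words": [
                {"word": "How", "start": 0.0, "end": 0.21},
                {"word": "you", "start": 0.21, "end": 0.38},
            ],
        }
    ],
}


def iter_words(response: Dict) -> Iterator[Tuple[str, float, float]]:
    """Yield (word, start, end) for every word in every segment."""
    for segment in response.get("segments", []):
        # Assumption: "words" can be absent, so default to an empty list.
        for w in segment.get("words", []):
            yield w["word"], w["start"], w["end"]


for word, start, end in iter_words(sample):
    print(f"{start:.2f}-{end:.2f} {word}")
```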

Examples

cURL

curl https://api.satryx.ai/voice/stt \
  -H "Authorization: Bearer $SATRYX_API_KEY" \
  -F "file=@interview.m4a" \
  -F "language=pcm" \
  -F "word_timestamps=true"

Python

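import os

import requests

# Upload the audio as multipart/form-data; the API key is read from the environment.
with open("interview.m4a", "rb") as f:
    res = requests.post(
        "https://api.satryx.ai/voice/stt",
        headers={"Authorization": f"Bearer {os.environ['SATRYX_API_KEY']}"},
        files={"file": f},
        data={"language": "pcm", "word_timestamps": "true"},
        timeout=60,  # seconds; an illustrative value, adjust for long audio
    )
res.raise_for_status()  # raise on 4xx/5xx responses
print(res.json()["transcript"])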

Supported ASR languages

The transcription engine is tuned for these codes; others fall back to Whisper auto-detect.

Code	Language
`auto`	Auto-detect
`en`	English
`en_ng`	Nigerian English
`pcm`	Nigerian Pidgin
`yo`	Yoruba
`ig`	Igbo (beta)
`ha`	Hausa
`sw`	Swahili
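If a client wants to know ahead of time whether a code is in the tuned set, the table above can be mirrored locally, as in this sketch. The names `TUNED_LANGUAGES` and `tuned_language_name` are hypothetical, and the mapping is a hand-copied snapshot of the table that may go stale. The code is still sent to the API unchanged either way.

```python
from typing import Dict, Optional

# Hand-copied from the table above; not fetched from the API.
TUNED_LANGUAGES: Dict[str, str] = {
    "auto": "Auto-detect",
    "en": "English",
    "en_ng": "Nigerian English",
    "pcm": "Nigerian Pidgin",
    "yo": "Yoruba",
    "ig": "Igbo (beta)",
    "ha": "Hausa",
    "sw": "Swahili",
}


def tuned_language_name(code: str) -> Optional[str]:
    """Return the language name if the code is in the tuned set, else None."""
    return TUNED_LANGUAGES.get(code)


for code in ("pcm", "fr"):
    name = tuned_language_name(code)
    if name is None:
        print(f"{code}: not in the tuned list, expect the Whisper fallback")
    else:
        print(f"{code}: {name}")
```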

See Voices & languages for the full picture.

Dubbing — transcription + diarization + re-voicing for video
