Fast, high quality speech transcription with many available backends, word-level timestamps, speaker diarization, and translation capabilities.
INPUTS
file
Audio file
word_level_timestamps
Whether to return word-level timestamps. Defaults to True.
speaker_diarization
Whether to perform speaker diarization. Defaults to False.
speed_boost
Only applicable to the `stable-ts`, `whisper-timestamped`, and `whisperx` backends. When set to True, `stable-ts` uses whisper-v3-turbo, while `whisper-timestamped` and `whisperx` use the base model. Defaults to False.
backend
The model backend to use. Choose from "stable-ts", "whisper-timestamped", "whisperx", "whisper-zero", and "groq-whisper". See README for more information.
source_language
Language of the audio. Defaults to auto-detect if not specified. See README for supported language codes.
target_language
Language code to translate the transcript into (no translation if left blank). See README for supported language codes.
min_speakers
Minimum number of speakers to detect for diarization. Defaults to auto-detect when set to -1.
max_speakers
Maximum number of speakers to detect for diarization. Defaults to auto-detect when set to -1.
min_silence_length
Minimum length of silence in seconds to use for splitting audio for parallel processing. Defaults to 0.4.
min_segment_length
Minimum length of audio segment in seconds to use for splitting audio for parallel processing. If set to -1, we pick a value based on your settings.
chunks
Manually specifies the start and end times of each chunk when splitting audio for parallel processing. If set to "", silence detection is used to split the audio. Otherwise, provide one chunk per line as comma-separated start and end seconds, e.g. '0,10' and '10,20' on separate lines (see the example after this parameter list).
denoise_audio
Whether to apply denoising to the audio to get rid of background noise before transcription. Defaults to False.
use_vad
Whether to use Silero VAD for splitting audio into segments. Defaults to False. More accurate than ffmpeg silence detection.
use_pyannote_segmentation
Whether to use Pyannote segmentation for splitting audio into segments. Defaults to False.
vad_threshold
The threshold for VAD. Defaults to 0.2.
pyannote_segmentation_threshold
The threshold for Pyannote segmentation when merging segments, i.e. the gap in seconds between words below which segments are merged. Defaults to 0.8.
initial_prompt
A prompt to correct misspellings and style. Defaults to "".
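
As a reference for how these inputs fit together, here is a minimal sketch of a call from Python. It assumes the Sieve Python SDK (sievedata) is installed and authenticated, that sieve.function.get accepts this function's slug, and that inputs can be passed as keyword arguments matching the names above; the audio path is a placeholder.

```python
import sieve

# Placeholder local file; sieve.File is assumed to also accept a url= argument.
audio = sieve.File(path="meeting.mp3")

transcriber = sieve.function.get("sieve/speech_transcriber")

result = transcriber.run(
    file=audio,
    word_level_timestamps=True,
    speaker_diarization=True,
    backend="stable-ts",
    # Manual chunking: one "start,end" pair (in seconds) per line.
    # Leave as "" to fall back to silence detection.
    chunks="0,10\n10,20",
    min_speakers=-1,  # -1 = auto-detect
    max_speakers=-1,  # -1 = auto-detect
)

# The exact shape of the transcript output depends on the chosen settings.
print(result)
```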
README

Speech Transcription

This app can precisely transcribe audio data, with additional options for auto-translation.

IMPORTANT (August 30, 2024): This function will change significantly over the next three weeks. If you have integrated it into your production code, please pin the version by passing it as part of the function name to either the API or the SDK. An example function slug with a version specified is shown below.

sieve/speech_transcriber:86b4f1f
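
For instance, a hedged sketch of pinning the version from the Python SDK, assuming sieve.function.get accepts a versioned slug in the org/name:version form shown above:

```python
import sieve

# Pinning the version hash keeps production behavior stable
# even if the latest version of the function changes.
transcriber = sieve.function.get("sieve/speech_transcriber:86b4f1f")

audio = sieve.File(path="call.wav")  # placeholder path
print(transcriber.run(file=audio))
```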

For pricing, see the Pricing section below.

For detailed notes, see the Notes section below.

Key Features

  • Word-level Timestamps: Provides precise timestamps for each word in the transcript (not available in groq-whisper).
  • Speaker Diarization: Identifies and labels different speakers in the audio.
  • Speed Boost Option: Accelerates transcription with a slight trade-off in accuracy.
  • Model Backend Options: Choose from various backend models to balance cost, quality, and speed.
  • Auto Translation: Dynamically translates transcriptions into multiple languages.

Pricing

Note (August 30, 2024): The pricing will change in the coming weeks. You can check the price of your job via the usage table or via API.

Pricing for this function is compute based.

As an estimate, transcription with word-level timestamps enabled on the stable-ts backend costs us roughly $0.15 per hour of audio, extrapolated from this example.

Notes

Picking the right settings

  • backend Options:
    • whisperx: A fast transcription option.
    • stable-ts: Offers more accuracy, especially in timestamps.
    • whisper-timestamped: Similar to stable-ts, focuses on accurate timestamps.
    • whisper-zero: The highest quality option available, but also the slowest.
    • groq-whisper: The fastest and cheapest option available, optimized using Groq (costs ~$0.111 / hour of audio).
  • Enabling speaker_diarization returns speaker IDs for each word in the transcript. This is useful if you want to know who said what.
  • Enabling speed_boost uses smaller, faster models for the backends that support it. This is useful if you want results faster and don't mind sacrificing some accuracy (see the sketch below).
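
To illustrate these trade-offs, here is a sketch that submits the same audio to two backends, assuming the SDK's push method returns a future-like job with a result() method (as in the earlier examples, names outside the parameter list above are assumptions):

```python
import sieve

audio = sieve.File(path="interview.mp3")  # placeholder path
transcriber = sieve.function.get("sieve/speech_transcriber")

# Fast, cheap draft pass: groq-whisper does not support word-level timestamps.
draft_job = transcriber.push(file=audio, backend="groq-whisper",
                             word_level_timestamps=False)

# Higher-quality pass: slowest backend, with diarization to attribute speakers.
final_job = transcriber.push(file=audio, backend="whisper-zero",
                             speaker_diarization=True)

print(draft_job.result())  # blocks until each asynchronous job finishes
print(final_job.result())
```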

Languages

We support 99 languages in total. If you already know the language of the original audio, enter its language code in the source_language parameter; if you leave it blank, we will detect the language automatically. The full list of supported language codes is below (some codes appear more than once under alternate names).

  • en (English)
  • zh (Chinese)
  • de (German)
  • es (Spanish)
  • ru (Russian)
  • ko (Korean)
  • fr (French)
  • ja (Japanese)
  • pt (Portuguese)
  • tr (Turkish)
  • pl (Polish)
  • ca (Catalan)
  • nl (Dutch)
  • ar (Arabic)
  • sv (Swedish)
  • it (Italian)
  • id (Indonesian)
  • hi (Hindi)
  • fi (Finnish)
  • vi (Vietnamese)
  • he (Hebrew)
  • uk (Ukrainian)
  • el (Greek)
  • ms (Malay)
  • cs (Czech)
  • ro (Romanian)
  • da (Danish)
  • hu (Hungarian)
  • ta (Tamil)
  • no (Norwegian)
  • th (Thai)
  • ur (Urdu)
  • hr (Croatian)
  • bg (Bulgarian)
  • lt (Lithuanian)
  • la (Latin)
  • mi (Maori)
  • ml (Malayalam)
  • cy (Welsh)
  • sk (Slovak)
  • te (Telugu)
  • fa (Persian)
  • lv (Latvian)
  • bn (Bengali)
  • sr (Serbian)
  • az (Azerbaijani)
  • sl (Slovenian)
  • kn (Kannada)
  • et (Estonian)
  • mk (Macedonian)
  • br (Breton)
  • eu (Basque)
  • is (Icelandic)
  • hy (Armenian)
  • ne (Nepali)
  • mn (Mongolian)
  • bs (Bosnian)
  • kk (Kazakh)
  • sq (Albanian)
  • sw (Swahili)
  • gl (Galician)
  • mr (Marathi)
  • pa (Punjabi)
  • si (Sinhala)
  • km (Khmer)
  • sn (Shona)
  • yo (Yoruba)
  • so (Somali)
  • af (Afrikaans)
  • oc (Occitan)
  • ka (Georgian)
  • be (Belarusian)
  • tg (Tajik)
  • sd (Sindhi)
  • gu (Gujarati)
  • am (Amharic)
  • yi (Yiddish)
  • lo (Lao)
  • uz (Uzbek)
  • fo (Faroese)
  • ps (Pashto)
  • tk (Turkmen)
  • nn (Nynorsk)
  • mt (Maltese)
  • sa (Sanskrit)
  • lb (Luxembourgish)
  • my (Myanmar)
  • bo (Tibetan)
  • tl (Tagalog)
  • mg (Malagasy)
  • as (Assamese)
  • tt (Tatar)
  • haw (Hawaiian)
  • ln (Lingala)
  • ha (Hausa)
  • ba (Bashkir)
  • jw (Javanese)
  • su (Sundanese)
  • yue (Cantonese)
  • my (Burmese)
  • ca (Valencian)
  • nl (Flemish)
  • ht (Haitian)
  • lb (Letzeburgesch)
  • ps (Pushto)
  • pa (Panjabi)
  • ro (Moldavian)
  • si (Sinhalese)
  • es (Castilian)
  • zh (Mandarin)
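
For example, a minimal sketch of transcription plus translation using the same assumed SDK as above, with codes taken from the list: source_language for known-Spanish audio and target_language for an English transcript.

```python
import sieve

audio = sieve.File(path="podcast_es.mp3")  # placeholder path
transcriber = sieve.function.get("sieve/speech_transcriber")

result = transcriber.run(
    file=audio,
    source_language="es",  # skip auto-detection when the language is known
    target_language="en",  # leave blank ("") to disable translation
)
print(result)
```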