🗣️ WTF Transcription
World Transcription Format: a vendor-neutral analysis shape for speech-to-text output.
What it is
When to use it
Spec surface
{
"analysis": [
{
"type": "transcript",
"dialog": 0,
"vendor": "openai-whisper",
"product": "whisper-large-v3",
"encoding": "json",
"schema": "https://datatracker.ietf.org/doc/draft-howe-vcon-wtf/",
"body": "{\"transcript\":{\"text\":\"Hello, I need help with my account.\",\"language\":\"en\",\"duration\":3.2,\"confidence\":0.95},\"segments\":[{\"id\":0,\"start\":0.0,\"end\":3.2,\"text\":\"Hello, I need help with my account.\",\"confidence\":0.95}],\"metadata\":{\"created_at\":\"2026-05-18T10:00:00Z\",\"provider\":\"whisper\",\"model\":\"whisper-large-v3\"}}"
}
]
}
The WTF document shape
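In the example above, the WTF document travels as a JSON string in the analysis entry's `body`, and it has three parts: `transcript` (full text, language, duration, overall confidence), `segments` (timed pieces of text, each with its own confidence), and `metadata` (creation time, provider, model). The sketch below decodes that string and runs a few basic checks. The function name `load_wtf_body` is hypothetical, and treating all three sections as required is an assumption made for illustration; the draft linked in `schema` is the authoritative source for which fields are mandatory.

```python
import json

# The analysis entry from the example above; body is a JSON string, as shown there.
analysis_entry = {
    "type": "transcript",
    "dialog": 0,
    "encoding": "json",
    "body": json.dumps({
        "transcript": {"text": "Hello, I need help with my account.",
                       "language": "en", "duration": 3.2, "confidence": 0.95},
        "segments": [{"id": 0, "start": 0.0, "end": 3.2,
                      "text": "Hello, I need help with my account.", "confidence": 0.95}],
        "metadata": {"created_at": "2026-05-18T10:00:00Z",
                     "provider": "whisper", "model": "whisper-large-v3"},
    }),
}

# Assumption: all three top-level sections are treated as required in this sketch.
REQUIRED_SECTIONS = ("transcript", "segments", "metadata")


def load_wtf_body(entry: dict) -> dict:
    """Hypothetical helper: decode and sanity-check the body of a WTF transcript entry."""
    if entry.get("encoding") != "json":
        raise ValueError("expected encoding 'json'")
    doc = json.loads(entry["body"])  # body is a JSON-encoded string
    missing = [key for key in REQUIRED_SECTIONS if key not in doc]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    # Each segment should end no earlier than it starts.
    for seg in doc["segments"]:
        if seg["end"] < seg["start"]:
            raise ValueError(f"segment {seg['id']} ends before it starts")
    return doc


doc = load_wtf_body(analysis_entry)
print(doc["transcript"]["text"])
print(len(doc["segments"]))
```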
Python helper
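A helper in this spirit might wrap a speech-to-text result as an analysis entry matching the example above. The sketch below is illustrative only: `make_wtf_analysis` and `WTF_SCHEMA` are hypothetical names, and the way it derives the top-level `transcript` fields is an assumption (full text is the segment texts joined with spaces, duration is the latest segment end, and confidence is the unweighted mean of the segment confidences). A real provider may report these values directly.

```python
import json
from datetime import datetime, timezone

# Schema URL copied from the example entry above.
WTF_SCHEMA = "https://datatracker.ietf.org/doc/draft-howe-vcon-wtf/"


def make_wtf_analysis(dialog, segments, language, vendor, product, provider):
    """Hypothetical helper: wrap (start, end, text, confidence) tuples as a WTF analysis entry."""
    seg_objs = [
        {"id": i, "start": start, "end": end, "text": text, "confidence": conf}
        for i, (start, end, text, conf) in enumerate(segments)
    ]
    # Assumption: derive transcript-level fields from the segments.
    text = " ".join(s["text"] for s in seg_objs)
    duration = max((s["end"] for s in seg_objs), default=0.0)
    confidence = sum(s["confidence"] for s in seg_objs) / len(seg_objs) if seg_objs else 0.0
    body = {
        "transcript": {"text": text, "language": language,
                       "duration": duration, "confidence": confidence},
        "segments": seg_objs,
        "metadata": {
            "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "provider": provider,
            "model": product,
        },
    }
    return {
        "type": "transcript",
        "dialog": dialog,
        "vendor": vendor,
        "product": product,
        "encoding": "json",
        "schema": WTF_SCHEMA,
        # As in the example entry, the WTF document is stored as a JSON string.
        "body": json.dumps(body),
    }


vcon = {"analysis": []}
entry = make_wtf_analysis(
    dialog=0,
    segments=[(0.0, 3.2, "Hello, I need help with my account.", 0.95)],
    language="en",
    vendor="openai-whisper",
    product="whisper-large-v3",
    provider="whisper",
)
vcon["analysis"].append(entry)
```

Keeping the vendor-specific details in `vendor`, `product`, and `metadata` while the `transcript` and `segments` layout stays fixed is what lets downstream code read transcripts without knowing which engine produced them.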
See also