Realtime ASR
Realtime speech-to-text availability and integration notes.
Realtime ASR is for live captions, voice input, and meeting-style transcription. On the overseas Open API, realtime ASR availability depends on the account and on the current rollout stage. Treat the current schema returned by GET /api/openapi.json, together with your account configuration, as the source of truth.
Request
Realtime ASR uses a WebSocket-style streaming model when enabled. The connection authenticates with the normal bearer token:
Authorization: Bearer YOUR_API_TOKEN
Send audio in the encoding and chunk size required by the enabled model. If your product records audio in the browser, route it through your backend or an approved token exchange flow rather than embedding the API token in client code.
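As a rough illustration of the request side, the sketch below builds the bearer header and splits raw audio into fixed-size chunks on the backend. The sample rate, sample width, and 100 ms chunk length are placeholder assumptions, not values from this API; the enabled model defines the real encoding and chunk size. The helper names auth_headers and iter_chunks are hypothetical.

```python
from typing import Dict, Iterator

# Assumed audio format for illustration only. The real values come from
# the enabled model and the current OpenAPI schema.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM
CHUNK_MS = 100


def auth_headers(api_token: str) -> Dict[str, str]:
    """Build the bearer header. Keep the token on the backend only."""
    return {"Authorization": f"Bearer {api_token}"}


def iter_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM, each chunk_ms long at most."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

Each chunk would then be written to the streaming connection in order; the last chunk may be shorter than the others.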
Response
Responses are event streams with partial text, final text, timing metadata, and terminal errors. Treat partial results as temporary UI state and persist final segments only after the service marks them final.
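One way to keep partial and final text separate is a small state object that only appends to the persisted transcript when an event is marked final. The event shape used here (a dict with "type", "text", and "message" keys) is an assumption for illustration; the actual field names come from the schema for your account.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranscriptState:
    final_segments: List[str] = field(default_factory=list)  # safe to persist
    partial: str = ""              # temporary UI state, never persisted
    error: Optional[str] = None    # set once a terminal error arrives


def apply_event(state: TranscriptState, event: dict) -> TranscriptState:
    """Fold one stream event into the state (hypothetical event fields)."""
    kind = event.get("type")
    if kind == "partial":
        # Replace, do not append: a newer partial supersedes the old one.
        state.partial = event.get("text", "")
    elif kind == "final":
        state.final_segments.append(event.get("text", ""))
        state.partial = ""
    elif kind == "error":
        state.error = event.get("message", "unknown error")
        state.partial = ""  # discard text the service never finalized
    return state
```

The UI can render the final segments followed by the current partial, while storage only ever sees final_segments.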
Billing And Credits
Realtime ASR may charge by session duration or by processed audio duration. Enforce a maximum session length in your product, and call Profile to check the account's credits before starting expensive live sessions.
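A session cap can be enforced with a small timer that the streaming loop checks before sending each chunk. The sketch below is one possible shape; SessionBudget and max_affordable_seconds are hypothetical helpers, and the credits-per-minute rate is an assumed input, since this page does not state the pricing model or the fields that Profile returns.

```python
import time
from typing import Callable


class SessionBudget:
    """Tracks elapsed time for one live session against a hard cap."""

    def __init__(self, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_seconds = max_seconds
        self._clock = clock
        self._started = clock()

    def elapsed(self) -> float:
        return self._clock() - self._started

    def expired(self) -> bool:
        return self.elapsed() >= self._max_seconds


def max_affordable_seconds(credits: float, credits_per_minute: float) -> float:
    """Upper bound on session length for a given balance (assumed rate)."""
    if credits_per_minute <= 0:
        raise ValueError("credits_per_minute must be positive")
    return credits / credits_per_minute * 60.0
```

A product might take the smaller of its own cap and the affordable duration as max_seconds, then close the session as soon as expired() returns True.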
Errors
If realtime ASR is unavailable for the account, use the HTTP Speech to Text endpoint for file transcription instead. On a streaming error, close the connection, persist the last final transcript, and let the user restart the session explicitly rather than reconnecting automatically.
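The fallback and error-handling rules above can be sketched as two small functions. The names choose_mode and stop_on_stream_error are hypothetical, and the close and persist callables stand in for whatever connection and storage layer your product uses.

```python
from typing import Callable, List


def choose_mode(realtime_enabled: bool) -> str:
    """Pick realtime streaming when enabled, else file transcription over HTTP."""
    return "realtime" if realtime_enabled else "file"


def stop_on_stream_error(close: Callable[[], None],
                         persist: Callable[[str], None],
                         final_segments: List[str]) -> str:
    """Close the stream, save finalized text, and stop without reconnecting."""
    try:
        close()
    finally:
        # Persist even if closing fails, so finalized text is never lost.
        persist(" ".join(final_segments))
    return "stopped"  # the caller waits for an explicit user restart
```

Returning a "stopped" state instead of retrying keeps billing predictable, because a new session only starts when the user asks for one.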