Audio & Video Transcriber
Drop in an audio or video file and get an accurate transcript plus ready-to-use SRT and VTT subtitles — 100% in your browser, with nothing uploaded.
Supports MP3, WAV, M4A, MP4, MOV and more.
The SRT and VTT subtitle files use the time-codes produced by the speech model, so you can drop them straight into YouTube, Premiere, DaVinci Resolve or CapCut. If needed, edit the transcript text before downloading the .txt file.
This transcriber turns speech in an audio or video file into text using an AI speech-recognition model (OpenAI Whisper) that runs entirely inside your browser, so your file is never uploaded to a server. Drop in an MP3, WAV, M4A, MP4 or similar file, and it returns a clean transcript plus time-coded SRT and VTT subtitle files you can use on YouTube or in Premiere, CapCut or another video editor. It works across dozens of languages, has no length limits or signup, and is free. The AI model downloads once on first use and is then cached, and WebGPU acceleration is used automatically when your browser and GPU support it.
Frequently Asked Questions
Is this audio-to-text transcriber really free and private?
Yes — it is completely free with no signup, and it is private by design: the speech model runs inside your browser, so your audio or video file never leaves your device and is never uploaded to any server.
Does it upload my file to a server?
No. The transcription happens 100% on your own device using an in-browser AI model. Nothing is sent anywhere, which is why it is safe for sensitive recordings like meetings, interviews or voice notes.
Can I get SRT or VTT subtitles, not just text?
Yes. Along with the plain-text transcript you get time-coded SRT and VTT subtitle files to download, ready to drop into YouTube, Premiere Pro, DaVinci Resolve, CapCut or any subtitle workflow.
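For readers curious how time-coded segments become subtitle files, here is a minimal TypeScript sketch. It is not this page's actual code: the `Segment` shape and the function names are hypothetical, and it assumes the speech model returns start and end times in seconds. The two formats differ mainly in the millisecond separator (a comma in SRT, a period in WebVTT), in SRT's numbered cues, and in the `WEBVTT` header line.

```typescript
// Hypothetical segment shape: start/end in seconds plus the recognized text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as HH:MM:SS<sep>mmm. SRT uses "," and WebVTT uses ".".
function formatTimestamp(seconds: number, sep: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const secs = totalSec % 60;
  const mins = Math.floor(totalSec / 60) % 60;
  const hours = Math.floor(totalSec / 3600);
  // Left-pad a number with zeros to a fixed width.
  const pad = (n: number, width: number): string => {
    let out = String(n);
    while (out.length < width) out = "0" + out;
    return out;
  };
  return pad(hours, 2) + ":" + pad(mins, 2) + ":" + pad(secs, 2) + sep + pad(ms, 3);
}

// SRT: numbered cues separated by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      formatTimestamp(seg.start, ",") + " --> " + formatTimestamp(seg.end, ",") + "\n" +
      seg.text.trim() + "\n")
    .join("\n");
}

// WebVTT: a "WEBVTT" header, a blank line, then unnumbered cues.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    formatTimestamp(seg.start, ".") + " --> " + formatTimestamp(seg.end, ".") + "\n" +
    seg.text.trim() + "\n");
  return ["WEBVTT\n"].concat(cues).join("\n");
}
```

Calling `toSrt(segments)` or `toVtt(segments)` returns the file contents as a string, which could then be offered as a download.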
What audio and video formats and languages are supported?
Most common formats work — MP3, WAV, M4A, OGG, FLAC, plus MP4 and other videos (the audio track is read automatically). The Whisper model understands dozens of languages and transcribes them in their own script.
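Whisper models expect 16 kHz mono audio, so the decoded audio track has to be downmixed and resampled before transcription. The TypeScript sketch below shows one simple way to do that on raw samples. It is an illustration only, not necessarily how this page does it: the function names are hypothetical, and the linear interpolation used here applies no low-pass filter, so a real implementation might instead let the browser's Web Audio API handle resampling.

```typescript
// Average all channels into a single mono track.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    const ch = channels[c];
    for (let i = 0; i < length; i++) {
      mono[i] += ch[i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation between neighbouring input samples.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (fromRate === toRate || input.length === 0) return new Float32Array(input);
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}
```

For a stereo 48 kHz file, `resampleLinear(downmixToMono([left, right]), 48000, 16000)` would yield the 16 kHz mono samples a Whisper model takes as input.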
How accurate is the transcription?
It uses OpenAI Whisper, a widely used open-source speech-recognition model, so clear speech transcribes very accurately. Background noise, heavy accents or overlapping speakers can lower accuracy, but you can quickly edit the transcript before exporting.
Why does it download something the first time?
The AI speech model (a few tens of MB) downloads once on first use and is then cached by your browser, so later transcriptions start without another download. A browser and device with WebGPU support run it noticeably faster.
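As a rough illustration of how a page can decide whether GPU acceleration is available, the TypeScript sketch below checks for the WebGPU API and asks it for an adapter. The `Backend` type, the `NavigatorLike` interface and both function names are hypothetical, and treating "wasm" as the CPU fallback is an assumption; this page's actual selection logic is not documented here.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal structural type so the sketch compiles without WebGPU type definitions.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Cheap synchronous check: is the WebGPU entry point exposed at all?
function hasWebGpuApi(nav: NavigatorLike | undefined): boolean {
  return !!nav && nav.gpu !== undefined;
}

// Fuller check: requestAdapter() resolves to null when no suitable GPU is found,
// so the API being present does not guarantee acceleration.
async function pickBackend(nav: NavigatorLike | undefined): Promise<Backend> {
  if (!nav || !nav.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm"; // assumed fallback: run on the CPU instead
  }
}
```

In a browser this might be called as `pickBackend(navigator as NavigatorLike)`, with the result used to choose how the model is loaded.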