Transcribe audio and video to text instantly
Fast and accurate AI-powered transcriptions. Upload your file and get results in seconds.
Up to 2 GB per file. Formats: MP3, MPEG, M4A, AAC, WAV, OGG, OPUS, WMA, MP4, MOV, WMV.
Three simple steps
From your file to ready-to-use text. No hassle.
Upload your file
Audio or video in any popular format
We transcribe with AI
We process in seconds with maximum accuracy
Download your text
In the format you need: TXT, SRT, VTT, JSON
Tools to transcribe without limits
Everything you need in one place. Powerful, fast and designed to fit your workflow.
Identify who speaks
Automatically distinguishes voices, groups each turn by speaker, and lets you rename speakers with one click to personalize the full transcription.
- Up to 10 speakers per recording
- Avatar with initial and unique color per person
- Rename any speaker with one click
(00:12)Okay, let's get started. Ana, how's the onboarding design going?
(00:19)Good, I have the three flows in Figma ready for review.
(00:26)I'll share the link this afternoon with marked comments.
(00:34)Perfect. I'm blocked on the Stripe integration.
(00:41)The subscription.updated webhook isn't coming through.
(00:48)Let's talk to infra after the standup.
Export in any format
Download your transcription in the standard formats you already use — ready for YouTube, Premiere, technical integrations or archiving.
- SRT and VTT for video subtitles
- Plain TXT and structured JSON
- Download the optimized audio in MP3 too
Professional translation editor
Translate your transcription into more than 20 languages preserving speakers and timestamps, with a synchronized two-column view like a professional editor.
- More than 20 target languages
- Synchronized columns — original and translation
- Download translation in SRT, VTT, TXT or JSON
Edit with full history
Correct the transcription as plain text, which keeps it compatible with every export format. Every change is archived in a history you can return to at any time.
- Plain text, compatible with SRT, VTT, TXT and JSON
- Full history of every edit
- Restore any version with one click
Clickable timestamps
Each line synchronized with the audio. Click any timestamp to jump to the exact point.
- Precise synchronization
- Instant navigation
- Active line highlight
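As a rough illustration of how clickable timestamps and the active line highlight could work, here is a minimal Python sketch. It assumes timestamps are displayed as labels like "(00:12)", as in the sample transcript on this page; the function names are hypothetical and do not describe transcriptfy's actual implementation.

```python
import bisect


def timestamp_to_seconds(label: str) -> int:
    """Convert a label such as "(00:12)" or "1:02:03" to whole seconds."""
    parts = label.strip().strip("()").split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {label!r}")
    seconds = 0
    for part in parts:
        # Each field is worth 60 times the one to its right.
        seconds = seconds * 60 + int(part)
    return seconds


def active_line_index(starts: list[int], position: float) -> int:
    """Return the index of the last line whose start time is <= position.

    `starts` must be sorted in ascending order. Positions before the
    first line fall back to index 0.
    """
    return max(bisect.bisect_right(starts, position) - 1, 0)


# Start times taken from the sample transcript labels.
labels = ["(00:12)", "(00:19)", "(00:26)", "(00:34)", "(00:41)", "(00:48)"]
starts = [timestamp_to_seconds(label) for label in labels]
```

Clicking a timestamp would seek the player to `timestamp_to_seconds(label)`, and during playback `active_line_index(starts, position)` picks the line to highlight.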
Have questions?
Answers to the most common questions about transcriptfy. If you don't find yours here, the full FAQ page is linked at the bottom.
transcriptfy converts your audio and video files to text using artificial intelligence. You upload the recording, we process it in seconds and return the text with timestamps and speaker identification, ready to export in standard formats (TXT, SRT, VTT, JSON). It's designed for journalists, podcasters, researchers, lawyers, students and anyone who spends too much time manually writing down what someone said.
We accept the most common formats: MP3, WAV, M4A, AAC, OGG, OPUS, WMA and FLAC for audio; MP4, MOV, MKV, WebM, AVI and WMV for video. If you upload a video, we automatically extract the audio track — no need to convert it first.
Transcription time depends on the length of the file and the options you enable, but in most cases 30 minutes of audio is transcribed in 1 to 3 minutes. Options such as speaker recognition or subsequent translation add some time. Before you click "Transcribe", we show you a speed estimate based on the file and the chosen options.
File size limits depend on whether you have an active subscription. Without a subscription (guest or free account), you can upload up to 2 GB per file and 1 file per batch. With any active package, you can upload up to 5 GB per file and 3 simultaneous files. If your recording is larger, split it into segments or contact us.
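The limits above can be expressed as a small pre-upload check. This is only a sketch: the names `UploadLimits` and `check_batch` are hypothetical, and it assumes 1 GB means 1024³ bytes, which this page does not specify.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # assumption: binary gigabytes


@dataclass(frozen=True)
class UploadLimits:
    max_bytes_per_file: int
    max_files_per_batch: int


# Values taken from the limits described above.
FREE = UploadLimits(max_bytes_per_file=2 * GB, max_files_per_batch=1)
SUBSCRIBED = UploadLimits(max_bytes_per_file=5 * GB, max_files_per_batch=3)


def check_batch(file_sizes: list[int], has_subscription: bool) -> list[str]:
    """Return a list of problems; an empty list means the batch is allowed."""
    limits = SUBSCRIBED if has_subscription else FREE
    problems = []
    if len(file_sizes) > limits.max_files_per_batch:
        problems.append(
            f"too many files: {len(file_sizes)} > {limits.max_files_per_batch}"
        )
    for index, size in enumerate(file_sizes):
        if size > limits.max_bytes_per_file:
            problems.append(f"file {index} exceeds the per-file size limit")
    return problems
```

For example, `check_batch([3 * GB], has_subscription=False)` reports one problem, while the same file passes with an active package.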
Yes. With the "Recognize speakers" option enabled, we automatically label who is speaking in each turn. It works well for up to about 10 different speakers. Afterwards you can rename each one ("Speaker 1" → "Maria Torres"), and the change is applied across the entire transcription, translation and summary.
More than 99 languages, including Spanish, English, French, German, Portuguese, Italian, Mandarin Chinese, Japanese, Arabic and all major European and Asian languages. By default we detect the language automatically with over 95% accuracy, but you can select it manually if you know it, which improves quality for short or noisy audio.
Yes. Each transcription includes an editing tab where you correct the text word by word while maintaining segments and speakers. Changes are archived in a revision history you can return to at any time — so you can experiment without fear of losing the previous version.
Yes, into more than 20 target languages. We translate segment by segment respecting timestamps and speakers, with a two-column view (original on the left, translation on the right), synchronized scrolling and hover-mirror that highlights the equivalent segment in the other column. You can have several active translations at the same time for the same file — for example Spanish → English and Spanish → French.
Yes. We export in SRT and VTT — the standard formats compatible with YouTube, Premiere, Final Cut, web players and virtually all video editors. You can also download in TXT (plain text), JSON (full structure with timestamps, speakers and metadata) or the original audio in one click.
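To show what the SRT export looks like in practice, here is a minimal Python sketch that turns timed segments into SRT text. The segment fields (`start`, `end`, `speaker`, `text`) are an assumed shape, not transcriptfy's documented JSON schema, and the end times in the example are made up for illustration.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Build SRT text: numbered cues separated by a blank line."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        # Prefix the speaker name when the segment carries one.
        if seg.get("speaker"):
            text = f"{seg['speaker']}: {seg['text']}"
        else:
            text = seg["text"]
        time_range = f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}"
        blocks.append(f"{number}\n{time_range}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical segments; the end times are illustrative assumptions.
segments = [
    {"start": 12, "end": 19, "speaker": "Speaker 1", "text": "Okay, let's get started."},
    {"start": 19, "end": 26, "speaker": None, "text": "Good, I have the three flows ready."},
]
print(to_srt(segments))
```

A VTT export would follow the same structure, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.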
Your files travel encrypted to Cloudflare R2, with access via temporary signed URLs. The upload from your browser goes directly to storage, without passing through intermediate servers where they could be exposed. We do not use your content to train AI models or share it with third parties beyond the processing needed to generate the transcription, translation or summary you requested.
Yes. In guest mode you can transcribe a 30-second sample per file and see the result before deciding. If you like it, when you register the sample is automatically linked to your account and you process the full file — without losing what you had already started.
We work with minute packages: you choose the one that best fits your monthly volume and pay a per-minute price that decreases according to the package. The available packages, the per-minute price for each and the included features are explained on the pricing page.