OpenAI Whisper Transcription Guide
Whisper is OpenAI's speech recognition model, with support for 99 languages and strong accuracy on clear audio. Whether you're transcribing interviews, meetings, podcasts, or videos, this guide covers the three main ways to use it: the OpenAI API, a local installation, and apps built on Whisper.
What is Whisper?
Whisper is an automatic speech recognition (ASR) model trained on 680,000 hours of multilingual audio. It offers:
- 99 language support
- Translation to English
- Punctuation and formatting
- Speaker-aware transcription
- Noise robustness
Ways to Use Whisper
1. OpenAI API (Easiest)
- No setup required
- Pay per minute
- Best for occasional use
- 25MB file limit
2. Local Installation (Free)
- Requires Python setup
- Free after setup
- Best for high volume
- No file size limits
3. Apps Using Whisper (Most Convenient)
- User-friendly interfaces
- Various pricing models
- Additional features
Method 1: OpenAI API
Setup:
bash
pip install openai
Basic Usage:
python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable
audio_file = open("meeting.mp3", "rb")

transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
)
print(transcript.text)
With Options:
python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="srt",  # or vtt, json, text
    language="en",  # ISO code
    prompt="Technical meeting about AI",  # Context helps accuracy
)
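When `response_format` is `srt` or `vtt`, the result is subtitle text rather than an object with a `.text` attribute, so it can be written straight to disk. The following is a minimal sketch, assuming the `openai` v1 Python client and an `OPENAI_API_KEY` set in the environment. The helper names `subtitle_path` and `transcribe_to_subtitles` are illustrative and not part of the library.

```python
from pathlib import Path

SUBTITLE_FORMATS = {"srt", "vtt"}  # subtitle formats named in the options above


def subtitle_path(audio_path: str, fmt: str = "srt") -> Path:
    """Return the audio path with its extension swapped for the subtitle format."""
    if fmt not in SUBTITLE_FORMATS:
        raise ValueError(f"unsupported subtitle format: {fmt!r}")
    return Path(audio_path).with_suffix(f".{fmt}")


def transcribe_to_subtitles(audio_path: str, fmt: str = "srt") -> Path:
    """Transcribe audio via the API and save the subtitles next to the audio file."""
    # Imported lazily so subtitle_path works even without the openai package.
    from openai import OpenAI

    out_path = subtitle_path(audio_path, fmt)  # validates fmt before any upload
    client = OpenAI()
    with open(audio_path, "rb") as audio_file:
        # Assumption: for srt/vtt the client returns the subtitle text as a string.
        subtitles = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format=fmt,
        )
    out_path.write_text(str(subtitles), encoding="utf-8")
    return out_path
```

Calling `transcribe_to_subtitles("meeting.mp3")` would then write `meeting.srt` alongside the audio file.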
Pricing:
Method 2: Local Installation
Requirements:
Install:
bash
pip install openai-whisper
Install FFmpeg:
bash
brew install ffmpeg  # macOS (Homebrew)
apt install ffmpeg  # Debian/Ubuntu (may require sudo)
Basic Usage:
bash
whisper audio.mp3 --model medium
Model Sizes:
| Model | Size (parameters) | RAM | Speed | Accuracy |
|-------|-------------------|-----|-------|----------|
| tiny | 39M | ~1GB | Fastest | Basic |
| base | 74M | ~1GB | Fast | Good |
| small | 244M | ~2GB | Medium | Better |
| medium | 769M | ~5GB | Slow | Great |
| large | 1550M | ~10GB | Slowest | Best |
Recommended: Start with medium for a good balance of speed and accuracy.
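If memory is the constraint, the table can be turned into a simple lookup. This is a rough sketch that picks the largest model fitting a given RAM budget. The figures are the approximate values from the table, and `pick_model` is a hypothetical helper, so leave some headroom in practice.

```python
# Approximate RAM needs (GB) copied from the table above, smallest model first.
MODEL_RAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(available_ram_gb: float) -> str:
    """Return the largest model whose approximate RAM need fits the budget."""
    best = None
    for name, ram_gb in MODEL_RAM_GB:
        if ram_gb <= available_ram_gb:
            best = name  # later entries are larger, so keep overwriting
    if best is None:
        raise ValueError("not enough memory for even the tiny model")
    return best


print(pick_model(8))  # prints "medium"
```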
Advanced Options:
bash
whisper interview.mp3 \
  --model medium \
  --language English \
  --output_format srt \
  --output_dir ./transcripts \
  --task transcribe  # or translate
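For batches of files, the same command can be driven from Python. The sketch below only assembles the flags shown above and runs the `whisper` CLI through `subprocess`. The function names `build_whisper_command` and `transcribe_folder` are illustrative, and it assumes `whisper` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_whisper_command(audio_path, model="medium", language="English",
                          output_format="srt", output_dir="./transcripts",
                          task="transcribe"):
    """Build the argument list for the whisper CLI using the flags shown above."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    return [
        "whisper", str(audio_path),
        "--model", model,
        "--language", language,
        "--output_format", output_format,
        "--output_dir", output_dir,
        "--task", task,
    ]


def transcribe_folder(folder, pattern="*.mp3", **options):
    """Run whisper on every matching file in a folder, one file at a time."""
    for audio_path in sorted(Path(folder).glob(pattern)):
        # check=True raises CalledProcessError if whisper exits with an error.
        subprocess.run(build_whisper_command(audio_path, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.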
Method 3: Apps Using Whisper
MacWhisper (Mac)
Whisper Transcription (iOS)
TurboScribe
Descript
Otter.ai
Tips for Best Results
1. Audio Quality
2. Use Prompts
Provide context to improve accuracy:
python
prompt="Meeting about machine learning, discussing neural networks, transformers, and GPT models. Speakers: John (CEO), Sarah (CTO)."
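When prompts are reused across recordings, it can help to assemble them from parts. The helper below is a hypothetical sketch; the layout it produces is one reasonable choice, not a format Whisper requires.

```python
def build_prompt(topic, terms=(), speakers=None):
    """Assemble a short context prompt from a topic, key terms, and speaker names."""
    parts = [f"{topic.rstrip('.')}."]
    if terms:
        parts.append("Key terms: " + ", ".join(terms) + ".")
    if speakers:
        names = ", ".join(f"{name} ({role})" for name, role in speakers.items())
        parts.append(f"Speakers: {names}.")
    return " ".join(parts)


prompt = build_prompt(
    "Meeting about machine learning",
    terms=["neural networks", "transformers", "GPT"],
    speakers={"John": "CEO", "Sarah": "CTO"},
)
print(prompt)
```

The resulting string can be passed as the `prompt` argument in the API call.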
3. Choose Right Model
4. Post-Processing
Even Whisper makes mistakes. Review:
Common Use Cases
Meeting Notes:
Podcast Transcription:
Video Subtitles:
bash
whisper video.mp4 --model medium --output_format srt
This creates video.srt for subtitles.
Lecture Notes:
Handling Long Files
For files over 25MB (API limit):
Option 1: Split Audio
bash
ffmpeg -i long_file.mp3 -f segment -segment_time 600 -c copy output%03d.mp3
This splits the file into 10-minute chunks.
Option 2: Local Whisper
No file size limits with local installation.
Option 3: Use Apps
Most apps handle long files automatically.
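For the split-audio option, the chunks still have to be transcribed in order and stitched back together. The sketch below shows one way to do that. It assumes the 25MB limit means 25 * 1024 * 1024 bytes, and it takes the transcription step as a callable (`transcribe_fn`, a hypothetical name) so that either the API or local Whisper can be plugged in.

```python
# Assumption: the 25MB API limit is read here as 25 * 1024 * 1024 bytes.
API_LIMIT_BYTES = 25 * 1024 * 1024


def needs_splitting(size_bytes: int, limit: int = API_LIMIT_BYTES) -> bool:
    """True when a file is too large to send to the API in one request."""
    return size_bytes > limit


def transcribe_chunks(chunk_paths, transcribe_fn) -> str:
    """Transcribe chunks in filename order and join the text with spaces.

    transcribe_fn is any callable that takes a path and returns text, for
    example a thin wrapper around an API or local Whisper call.
    """
    # Zero-padded names such as output000.mp3 sort in playback order.
    texts = [transcribe_fn(path).strip() for path in sorted(chunk_paths)]
    return " ".join(text for text in texts if text)
```

With the ffmpeg command above, the chunk list could come from `Path(".").glob("output*.mp3")`. Words that straddle a chunk boundary may be cut off, so the joins are worth a quick review.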
Language Translation
Whisper can translate any supported language to English:
bash
whisper spanish_audio.mp3 --task translate
Or via API:
python
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file,
)
Accuracy Comparison
In testing, Whisper's accuracy on clean audio:
Factors affecting accuracy:
Workflow Example
Journalist Interview:
whisper interview.mp3 --model large
Combining Whisper with AI editing can cut transcription time by 80% compared to manual transcription.