Whisper AI: Complete Transcription Guide

How to use OpenAI Whisper for accurate audio transcription. Local setup, API usage, and best apps.

OpenAI Whisper Transcription Guide

Whisper is OpenAI's speech recognition model that delivers remarkable accuracy across 99 languages. Whether you're transcribing interviews, meetings, podcasts, or videos, this guide covers everything you need to know.

What is Whisper?

Whisper is an automatic speech recognition (ASR) model trained on 680,000 hours of multilingual audio. It offers:

  • Support for 99 languages
  • Translation to English
  • Punctuation and formatting
  • Timestamped segments (speaker labels require a separate diarization tool)
  • Noise robustness

    Ways to Use Whisper

    1. OpenAI API (Easiest)

  • No setup required
  • Pay per minute
  • Best for occasional use
  • 25MB file limit

    2. Local Installation (Free)

  • Requires Python setup
  • Free after setup
  • Best for high volume
  • No file size limits

    3. Apps Using Whisper (Most Convenient)

  • User-friendly interfaces
  • Various pricing models
  • Additional features

    Method 1: OpenAI API

    Setup:

    bash
    pip install openai
    

    Basic Usage:

    python
    from openai import OpenAI
    client = OpenAI()

    # Open the audio in binary mode and send it to the transcription endpoint
    audio_file = open("meeting.mp3", "rb")
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
    print(transcript.text)

    With Options:

    python
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt",  # or vtt, json, text
        language="en",  # ISO code
        prompt="Technical meeting about AI"  # Context helps accuracy
    )
    

    Pricing:

  • $0.006 per minute
  • Example: 1-hour video = $0.36
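
The per-minute rate makes it easy to estimate cost before uploading anything. The following is a minimal sketch, assuming the $0.006 rate quoted in this guide still applies and that billing scales linearly with duration. `estimate_cost` is an illustrative helper name, not part of the OpenAI SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate quoted in this guide (assumption: check OpenAI's pricing page for the current value).
PRICE_PER_MINUTE = Decimal("0.006")


def estimate_cost(duration_seconds: float) -> Decimal:
    """Return the estimated API cost in USD, rounded to the nearest cent."""
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    cost = minutes * PRICE_PER_MINUTE
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(estimate_cost(3600))     # 1-hour recording
    print(estimate_cost(45 * 60))  # 45-minute meeting
```

Using `Decimal` instead of floats avoids rounding surprises when many small amounts are added up.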

    Method 2: Local Installation

    Requirements:

  • Python 3.8+
  • FFmpeg
  • 1-10GB RAM (depending on model)

    Install:

    bash
    pip install openai-whisper
    

    Install FFmpeg:

  • Mac: brew install ffmpeg
  • Windows: Download from ffmpeg.org
  • Linux (Debian/Ubuntu): sudo apt install ffmpeg

    Basic Usage:

    bash
    whisper audio.mp3 --model medium
    

    Model Sizes:

    | Model  | Parameters | RAM   | Speed   | Accuracy |
    |--------|------------|-------|---------|----------|
    | tiny   | 39M        | ~1GB  | Fastest | Basic    |
    | base   | 74M        | ~1GB  | Fast    | Good     |
    | small  | 244M       | ~2GB  | Medium  | Better   |
    | medium | 769M       | ~5GB  | Slow    | Great    |
    | large  | 1550M      | ~10GB | Slowest | Best     |

    Recommended: Start with medium for good balance.
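
If you are unsure which model your machine can handle, the RAM column in the Model Sizes table can be turned into a simple lookup. This is a rough sketch: the figures are the approximate ones from the table, and `pick_model` is a hypothetical helper, not something the whisper package provides.

```python
# Approximate RAM needs in GB, copied from the Model Sizes table in this guide.
MODEL_RAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def pick_model(available_ram_gb: float) -> str:
    """Return the largest model whose approximate RAM need fits the budget."""
    # Dicts keep insertion order, so the last fitting entry is the largest model.
    fitting = [name for name, ram in MODEL_RAM_GB.items() if ram <= available_ram_gb]
    if not fitting:
        raise ValueError("Not enough memory for any Whisper model")
    return fitting[-1]


if __name__ == "__main__":
    print(pick_model(8))   # a laptop with about 8 GB free
    print(pick_model(16))  # a workstation with about 16 GB free
```

Treat the result as a starting point. Actual memory use also depends on whether the model runs on CPU or GPU.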

    Advanced Options:

    bash
    whisper interview.mp3 \
      --model medium \
      --language English \
      --output_format srt \
      --output_dir ./transcripts \
      --task transcribe  # or translate
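
To run the same command over many files, it can help to wrap it in a small script. The sketch below only assembles the flags shown in this section and hands them to `subprocess`. The function names are illustrative, and it assumes the `whisper` command is on your PATH.

```python
import subprocess
from typing import List


def build_whisper_command(
    audio_path: str,
    model: str = "medium",
    language: str = "English",
    output_format: str = "srt",
    output_dir: str = "./transcripts",
    task: str = "transcribe",
) -> List[str]:
    """Build the argument list for the whisper CLI call shown in this section."""
    return [
        "whisper", audio_path,
        "--model", model,
        "--language", language,
        "--output_format", output_format,
        "--output_dir", output_dir,
        "--task", task,
    ]


def transcribe_file(audio_path: str, **options: str) -> None:
    """Run the whisper CLI; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_whisper_command(audio_path, **options), check=True)
```

Passing the arguments as a list rather than a single shell string means file names with spaces need no extra quoting.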
    

    Method 3: Apps Using Whisper

    MacWhisper (Mac)

  • Beautiful Mac app
  • Drag-and-drop
  • Free tier available
  • Pro: $29 one-time

    Whisper Transcription (iOS)

  • Mobile transcription
  • On-device processing
  • Privacy-focused

    TurboScribe

  • Web-based
  • Unlimited in pro tier
  • Speaker labels
  • From $10/month

    Descript

  • Full audio/video editor
  • Uses Whisper + custom models
  • From $15/month

    Otter.ai

  • Real-time transcription
  • Meeting integration
  • Free tier available

    Tips for Best Results

    1. Audio Quality

  • Use a good microphone
  • Minimize background noise
  • Record in a quiet environment
  • A clean 16 kHz or higher recording is enough (Whisper resamples input to 16 kHz, so higher sample rates add little)

    2. Use Prompts

    Provide context to improve accuracy:

    python
    prompt="Meeting about machine learning, discussing neural networks, transformers, and GPT models. Speakers: John (CEO), Sarah (CTO)."
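
If you transcribe recurring meetings, the context string can be assembled from a few fields instead of being retyped each time. This is only a sketch: `build_prompt` and its parameters are made up for illustration, and the wording of the prompt is a suggestion rather than a format Whisper requires.

```python
from typing import Dict, List


def build_prompt(topic: str, terms: List[str], speakers: Dict[str, str]) -> str:
    """Assemble a short context prompt from a topic, key terms, and speaker roles."""
    parts = [f"{topic}."]
    if terms:
        parts.append("Key terms: " + ", ".join(terms) + ".")
    if speakers:
        names = ", ".join(f"{name} ({role})" for name, role in speakers.items())
        parts.append(f"Speakers: {names}.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "Meeting about machine learning",
        ["neural networks", "transformers", "GPT"],
        {"John": "CEO", "Sarah": "CTO"},
    ))
```

Keeping the prompt short is sensible, since the model only considers a limited amount of prompt text.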
    

    3. Choose the Right Model

  • Quick notes → tiny/base
  • Important content → medium
  • Critical accuracy → large

    4. Post-Processing

    Even Whisper makes mistakes. Review:

  • Proper nouns and names
  • Technical terminology
  • Numbers and dates
  • Homophone errors

    Common Use Cases

    Meeting Notes:

  • Record meeting (Zoom, Teams, etc.)
  • Export audio
  • Run through Whisper
  • Feed to ChatGPT: "Summarize these meeting notes and extract action items"

    Podcast Transcription:

  • Download episode
  • Transcribe with Whisper (large model)
  • Edit transcript
  • Publish as show notes

    Video Subtitles:

    bash
    whisper video.mp4 --model medium --output_format srt
    
    This creates video.srt for subtitles.
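
An SRT file is a numbered list of cues, each with a start and end timestamp in `HH:MM:SS,mmm` form. If you correct the text first and want to write the subtitles yourself, a sketch like the following could work. It assumes segments shaped like the `segments` list in the result of the whisper Python package's `transcribe()` call (dicts with `start` and `end` in seconds, plus `text`). The helper names are illustrative.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line
    return "\n".join(blocks)
```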

    Lecture Notes:

  • Record lecture
  • Transcribe
  • Use ChatGPT to create summary and study notes

    Handling Long Files

    For files over 25MB (API limit):

    Option 1: Split Audio

    bash
    ffmpeg -i long_file.mp3 -f segment -segment_time 600 -c copy output%03d.mp3
    
    This splits into 10-minute chunks.
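
Whether 10 minutes is a safe chunk length depends on the bitrate of the audio. The arithmetic can be sketched as follows, assuming constant-bitrate audio and treating the 25MB limit conservatively as 25,000,000 bytes. Both are assumptions, and the function names are illustrative.

```python
# Assumption: the API limit is taken conservatively as 25,000,000 bytes.
API_LIMIT_BYTES = 25 * 1000 * 1000


def chunk_size_bytes(segment_seconds: int, bitrate_kbps: int) -> int:
    """Approximate size of one chunk of constant-bitrate audio."""
    return segment_seconds * bitrate_kbps * 1000 // 8


def max_segment_seconds(bitrate_kbps: int, safety: float = 0.9) -> int:
    """Longest chunk in seconds that stays under the limit with a safety margin."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return int(API_LIMIT_BYTES * safety / bytes_per_second)


if __name__ == "__main__":
    # A 10-minute chunk of 128 kbps MP3 is about 9.6 MB, well under the limit
    print(chunk_size_bytes(600, 128))
    # Longest safe -segment_time value for 128 kbps audio
    print(max_segment_seconds(128))
```

The result of `max_segment_seconds` can be passed as the `-segment_time` value in the ffmpeg command.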

    Option 2: Local Whisper

    No file size limits with local installation.

    Option 3: Use Apps

    Most apps handle long files automatically.

    Language Translation

    Whisper can translate any supported language to English:

    bash
    whisper spanish_audio.mp3 --task translate
    

    Or via API:

    python
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )
    

    Accuracy Comparison

    In testing, Whisper's approximate accuracy (clean audio unless noted):

  • English: 95-98%
  • Major languages: 90-95%
  • With noise: 80-90%
  • Heavy accents: 85-95%

    Factors affecting accuracy:

  • Audio quality
  • Background noise
  • Multiple speakers
  • Speaking speed
  • Technical vocabulary
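
This guide does not say how the accuracy figures were measured. A common metric for speech recognition is word error rate (WER), with accuracy taken as roughly 1 minus WER. To check Whisper on your own recordings, you could compare its output against a hand-corrected transcript with a sketch like this one. It uses plain lowercase, whitespace-split words, which is a simplifying assumption; punctuation is not normalized.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # previous[j] = edit distance between the reference words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, a hypothesis with one wrong word and one missing word against a six-word reference gives a WER of 2/6, or about 67% accuracy by this measure.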

Workflow Example

Journalist Interview:

  • Record with quality mic
  • Transcribe: whisper interview.mp3 --model large
  • Review and correct in text editor
  • Feed to Claude: "Edit this transcript for clarity and pull key quotes"
  • Final review and fact-check

The combination of Whisper + AI editing can cut transcription time by 80% compared to manual transcription.
