WhisperX
Audio & Music
Rating ⭐ 4.2
Pricing Free

Enhanced Whisper transcription library that adds word-level timestamps, speaker diarization, and fast batched inference.

About

WhisperX is an open-source Python library that extends OpenAI's Whisper automatic speech recognition (ASR) model with several practical improvements. Batched inference lets it transcribe up to 70 times faster than real time. Forced alignment with a wav2vec2 model produces accurate word-level timestamps rather than only utterance-level ones. Speaker diarization through pyannote-audio labels who is speaking in each part of the recording, which suits meeting recordings and interviews. Voice activity detection (VAD) reduces hallucinated text on silent segments. The library runs on GPU or CPU, including on macOS, and can be installed with pip. It is free and open source, though the diarization models require a free Hugging Face access token.
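
To illustrate how word-level timestamps and speaker diarization fit together, the sketch below gives each word the speaker whose turn overlaps it the most. It is a simplified, standard-library illustration of the idea, not WhisperX's actual code: the `Word` and `Turn` classes, the `assign_speakers` function, and the sample timings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Word]:
    """Return copies of the words, each labeled with the most-overlapping speaker.

    Words that overlap no turn keep speaker=None.
    """
    labeled = []
    for w in words:
        best_speaker = None
        best_overlap = 0.0
        for t in turns:
            ov = overlap(w.start, w.end, t.start, t.end)
            if ov > best_overlap:
                best_speaker, best_overlap = t.speaker, ov
        labeled.append(Word(w.text, w.start, w.end, best_speaker))
    return labeled


# Made-up sample data: three aligned words and two speaker turns.
words = [Word("hello", 0.10, 0.45), Word("there", 0.50, 0.80), Word("hi", 1.20, 1.40)]
turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.0, 2.0)]

labeled = assign_speakers(words, turns)
for w in labeled:
    print(f"{w.start:5.2f}-{w.end:5.2f} {w.speaker or 'UNKNOWN'}: {w.text}")
```

Picking the speaker by maximum overlap keeps the sketch short and handles words that straddle a turn boundary. A real pipeline may treat overlapping speech and unlabeled gaps differently.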
