Index-TTS2
Open-source text-to-speech model with zero-shot voice cloning, emotion control, and precise control over synthesis duration.
About
Index-TTS2 is an autoregressive text-to-speech system from the index-tts project that focuses on natural, expressive speech synthesis. It supports zero-shot voice cloning, meaning it can reproduce a speaker's voice from a short audio reference without additional training. A key differentiator is its duration control, which lets you specify how long a synthesized segment should be while preserving natural prosody. Emotional expression can be guided through reference audio clips, numerical vectors, or plain text descriptions, and the model separates emotion from speaker identity so each can be adjusted independently. Index-TTS2 is free and open source, intended for researchers and developers building speech applications.
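To make the separation of inputs concrete, here is a minimal sketch of how a synthesis request along these lines could be modeled. It is not the Index-TTS2 API: the class names `EmotionSpec` and `SynthesisRequest`, every field name, and the choice of seconds as the duration unit are hypothetical and chosen only for illustration. The sketch keeps the speaker reference, the emotion source, and the target duration as independent fields, and it assumes that a request uses exactly one of the three emotion sources at a time.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EmotionSpec:
    """Hypothetical container for emotion guidance (not the library's API)."""
    reference_audio: Optional[str] = None      # path to an emotional reference clip
    vector: Optional[Sequence[float]] = None   # numerical emotion weights
    description: Optional[str] = None          # plain-text description of the emotion

    def source(self) -> str:
        """Return the name of the single emotion source that is set."""
        chosen = [
            name
            for name, value in (
                ("reference_audio", self.reference_audio),
                ("vector", self.vector),
                ("description", self.description),
            )
            if value is not None
        ]
        if len(chosen) != 1:
            raise ValueError("set exactly one emotion source")
        return chosen[0]


@dataclass(frozen=True)
class SynthesisRequest:
    """Hypothetical request: speaker, emotion, and duration are independent."""
    text: str
    speaker_reference: str                    # short clip that defines speaker identity
    emotion: Optional[EmotionSpec] = None     # adjusted separately from the speaker
    target_seconds: Optional[float] = None    # None means no duration constraint

    def validate(self) -> None:
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.target_seconds is not None and self.target_seconds <= 0:
            raise ValueError("target_seconds must be positive")
        if self.emotion is not None:
            self.emotion.source()  # raises if zero or several sources are set


# Same speaker clip, emotion given as text, output constrained to 3.5 seconds.
request = SynthesisRequest(
    text="The results are in, and we won.",
    speaker_reference="speaker.wav",
    emotion=EmotionSpec(description="excited, slightly out of breath"),
    target_seconds=3.5,
)
request.validate()
```

Swapping `EmotionSpec(description=...)` for `EmotionSpec(reference_audio=...)` or `EmotionSpec(vector=...)` changes only the emotion guidance and leaves `speaker_reference` untouched, which mirrors the independence of emotion and speaker identity described above.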