Video Narration

Upload a video and the AI analyzes its key frames, writes a narration script based on what it sees, then speaks that script in the voice and style you choose. No manual scripting, no recording setup.


How It Works

Upload a video (screen recording, tutorial, demo, etc.)
Choose a voice, narration style, and language
AI analyzes the video, writes a script, and adds voiceover automatically

Tips

Screen recordings and tutorials work best
Keep videos under 5 minutes for fastest results
Processing takes 2-5 minutes depending on video length
The AI describes what it sees happening on screen

Also Try

Subtitles

Auto-generate captions in 90+ languages

Video Ad

Create professional video ads from images

Lip Sync

Sync lip movements to any audio

Why Choose Video Narration

Frame-Level Vision

Florence-2 reads key frames and identifies UI elements, on-screen text, buttons, menus, and visual transitions. The narration script reflects what is actually visible, not a generic summary.

10+ Kokoro Voices

Pick from over ten Kokoro TTS voices, including American male, American female, and British options. Each voice renders at a quality level suited to published tutorials, demos, and training content.

Script to Merged Video

The pipeline ends with a complete MP4 file. Frame extraction, script generation, voice synthesis, and audio merge all happen server-side. You upload one file and download one file.

Five Narration Styles

Tutorial, Professional, Casual, Energetic, and Documentary are distinct prompt modes, not cosmetic labels. Each produces a different sentence structure, pacing, and tone in the final script.

Perfect For

Screen Recordings
Tutorials
Product Demos
App Walkthroughs
Training Videos
Social Media Content
Documentation

AI-Powered Video Analysis & Narration

Florence-2 Vision + Kokoro TTS

Video Narration runs a four-stage pipeline: frame analysis, script writing, voice synthesis, and audio merge. Key frames are pulled from your video at regular intervals. Florence-2 analyzes each frame and produces a structured description of what it sees, including text, UI elements, and visual changes. An LLM then chains these descriptions into a single narration script that reads like a presenter talking through your content.
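To make "key frames at regular intervals" concrete, here is a minimal sketch of how sampling timestamps could be chosen. The function name and the two-second default interval are assumptions for illustration; the actual interval used by the tool is not stated.

```python
def keyframe_timestamps(duration_s: float, interval_s: float = 2.0) -> list:
    """Return sampling times in seconds, starting at 0 and staying before the end.

    interval_s is a hypothetical default; the real service may sample differently.
    """
    if duration_s <= 0 or interval_s <= 0:
        raise ValueError("duration_s and interval_s must be positive")
    timestamps = []
    i = 0
    # Multiply by the index instead of accumulating, to avoid float drift.
    while i * interval_s < duration_s:
        timestamps.append(round(i * interval_s, 3))
        i += 1
    return timestamps


if __name__ == "__main__":
    print(keyframe_timestamps(10, 2.0))
```

Each timestamp would then correspond to one frame handed to the vision model, so a shorter interval means more frames and a longer processing time.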

Kokoro TTS converts the script to audio using the voice and style you selected. The resulting audio track is timed and merged with your original video file, preserving your visuals exactly while adding the spoken layer. The final output is a standard MP4.
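The merge step can be pictured with a short sketch. It assumes ffmpeg as the merge tool, which this page does not confirm, and the function name and file names are hypothetical. The video stream is copied without re-encoding, which is one way to keep the visuals unchanged while replacing the audio.

```python
from pathlib import Path


def build_merge_command(video: Path, narration: Path, output: Path) -> list:
    """Build an ffmpeg argument list that pairs the original video with new audio.

    Assumption: ffmpeg is installed; the real service may use a different tool.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(video),        # input 0: the original video
        "-i", str(narration),    # input 1: the synthesized narration
        "-map", "0:v:0",         # take the video stream from input 0
        "-map", "1:a:0",         # take the audio stream from input 1
        "-c:v", "copy",          # copy video as-is, no re-encoding
        "-c:a", "aac",           # encode audio as AAC for MP4 playback
        str(output),
    ]


if __name__ == "__main__":
    cmd = build_merge_command(Path("demo.mp4"), Path("voice.wav"), Path("narrated.mp4"))
    print(" ".join(cmd))
```

The list can be passed to subprocess.run to execute the merge. Copying the video stream only works when the source codec is allowed in an MP4 container; other sources would need re-encoding.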

Frequently Asked Questions

The tool extracts frames from your video, runs each frame through Florence-2 to identify what is visible on screen, feeds those descriptions to an LLM to write a narration script, synthesizes that script with Kokoro TTS using your chosen voice and style, then merges the audio back into your original video.

The tool works best on videos where visual content carries the story: screen recordings, software tutorials, product demos, and app walkthroughs. Florence-2 is particularly strong at reading on-screen text, identifying buttons and menus, and tracking UI state changes between frames.

Processing usually takes 2 to 5 minutes. A one-minute video typically processes in about 2 minutes. A five-minute video may take up to 5 minutes. Frame count and resolution both affect speed. A progress indicator shows where the job is in the pipeline.

Uploads can be MP4, MOV, WEBM, AVI, or MKV files up to 50MB. Higher-resolution sources give Florence-2 more detail to work with, which improves script accuracy. Output is always MP4.
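If you want to check a file before uploading, the format and size limits above can be sketched as a small validator. The function name is hypothetical, and reading "50MB" as 50 * 1024 * 1024 bytes is an assumption; the service may count megabytes differently.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi", ".mkv"}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: "50MB" means 50 MiB


def validate_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 50MB limit")
    return problems


if __name__ == "__main__":
    print(validate_upload("walkthrough.mov", 12_000_000))
    print(validate_upload("clip.gif", 1_000))
```

This only checks the file name and size. It does not inspect the actual container or codec, which the server would still need to verify.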

You can choose the voice. Kokoro TTS provides over ten voices including American male, American female, and British options. Pair a voice with a narration style to tune both the sound and the script structure for your specific content.

There are five narration styles, and each changes how the LLM writes the script. Tutorial produces numbered steps and action cues. Professional keeps language formal and direct. Casual writes in first person with a conversational rhythm. Energetic uses short sentences and emphasis suited for promotional clips. Documentary narrates in third person with an informational tone.
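One way to picture styles as prompt modes is a lookup from style name to a writing instruction that is placed ahead of the frame descriptions. This is a sketch only: the instruction wording is paraphrased from the descriptions above, and the function name and prompt layout are assumptions, not the tool's actual prompts.

```python
# Hypothetical instruction text, paraphrased from the style descriptions.
STYLE_INSTRUCTIONS = {
    "tutorial": "Write numbered steps with clear action cues.",
    "professional": "Keep the language formal and direct.",
    "casual": "Write in first person with a conversational rhythm.",
    "energetic": "Use short sentences and emphasis suited to promotional clips.",
    "documentary": "Narrate in third person with an informational tone.",
}


def build_script_prompt(style: str, frame_descriptions: list) -> str:
    """Combine a style instruction with per-frame descriptions into one prompt."""
    key = style.strip().lower()
    if key not in STYLE_INSTRUCTIONS:
        raise ValueError(f"unknown narration style: {style!r}")
    header = f"Narration style: {key}. {STYLE_INSTRUCTIONS[key]}"
    frames = [f"Frame {i}: {text}" for i, text in enumerate(frame_descriptions, start=1)]
    return "\n".join([header, "Frame descriptions:", *frames])


if __name__ == "__main__":
    print(build_script_prompt("Tutorial", ["Login page with email field", "Dashboard loads"]))
```

Because the style changes the instruction rather than the voice, the same frames can yield differently structured scripts.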

The narration is based on what is actually on screen. Florence-2 is a vision model trained on detailed image-text pairs. It identifies on-screen text, UI controls, layout regions, and visual transitions. The LLM then converts those structured descriptions into spoken narration, so the script matches what viewers actually see.

Narration output is available in English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, and Chinese. Florence-2 analyzes the visual content independently of the language of any on-screen text, so the source video language does not affect frame analysis.

Background music is optional. You can pick a built-in track from categories like corporate, ambient, and upbeat, or upload your own audio file. A volume control lets you set the music level low enough to stay behind the voiceover without competing with it.
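The effect of the volume control can be illustrated with a simple sample-level mix. This is a sketch under assumptions: audio is represented as lists of floats between -1.0 and 1.0, the music loops if it is shorter than the voice, and the 0.10 default gain and the function name are illustrative rather than the tool's actual behavior.

```python
def mix_with_music(voice: list, music: list, music_volume: float = 0.10) -> list:
    """Mix a voice track with quieter background music, sample by sample."""
    if not 0.0 <= music_volume <= 1.0:
        raise ValueError("music_volume must be between 0.0 and 1.0")
    mixed = []
    for i, voice_sample in enumerate(voice):
        # Loop the music if it is shorter than the narration; silence if absent.
        music_sample = music[i % len(music)] if music else 0.0
        sample = voice_sample + music_volume * music_sample
        # Clamp to the valid range so loud passages do not overflow.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


if __name__ == "__main__":
    print(mix_with_music([0.5, -0.5, 0.25], [1.0, -1.0], music_volume=0.10))
```

A low gain keeps the music well below the voice, which is the point of setting the level low in the first place.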

The optional Custom Instructions text is passed directly to the LLM when it builds the narration script. Use it to specify a target audience, instruct the AI to emphasize a particular feature, add a closing call-to-action, or skip sections you do not want narrated. The more specific your instruction, the more precisely the script adapts.
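As a rough illustration of passing instructions through to the LLM, the sketch below appends the user's text to a base prompt. The function name, the section label, and the 500-character cap are all assumptions; the page does not describe how the text is actually inserted or limited.

```python
from typing import Optional


def add_custom_instructions(base_prompt: str, custom: Optional[str], max_chars: int = 500) -> str:
    """Append optional user instructions to the script prompt.

    max_chars is a hypothetical safety cap, not a documented limit.
    """
    text = (custom or "").strip()
    if not text:
        # The field is optional, so an empty value leaves the prompt unchanged.
        return base_prompt
    text = text[:max_chars]
    return f"{base_prompt}\n\nAdditional instructions from the user:\n{text}"


if __name__ == "__main__":
    prompt = add_custom_instructions(
        "Write a narration script for the frames below.",
        "Audience: first-time users. End with a call to start a free trial.",
    )
    print(prompt)
```

Specific instructions such as a named audience or a closing line give the model something concrete to follow, which matches the advice above.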

Video Narration vs Other Methods

Feature	Luxoret AI	Manual / Traditional	Other Tools
Cost per Use	$0.15	$200-$1000+ per project	$0.20-$0.50 per video
Speed	Minutes, not hours	Hours of manual editing	Varies by complexity
Skill Required	None; AI handles it	Video editing expertise	Moderate learning curve
Software	Browser-based, nothing to install	Expensive editing suite	Desktop app required
Quality	AI-enhanced, professional	Depends on editor skill	Template-dependent
Revisions	Instant re-processing	Re-edit from scratch	Limited by plan
