How to Use HeyGen to Translate a Video Into Another Language

April 5, 2026 · Editorial Team · 6 min read · heygen ai-video video-translation

If you've ever watched a dubbed foreign film and found yourself distracted by the mouth movements not matching the audio, you understand the problem HeyGen is trying to fix. The platform's video translation feature doesn't just swap audio tracks; it adjusts the speaker's lip movements to match the translated language. The result is close enough to pass a casual viewing test in most cases.

I tested it on a 4-minute English presenter video translated into Spanish and French, and the output was usable for social distribution. For broadcast or anything with close facial scrutiny, you'd want to review carefully, but for YouTube, LinkedIn, and course platforms it holds up well.

Getting started with Video Translation

HeyGen's translation feature is called Video Translate and lives in its own section of the dashboard, separate from the avatar creation tools. Navigate there and click New Translation.

Upload your source video. Accepted formats: MP4, MOV, WebM. The video should:

Have clear, well-lit face footage of the speaker. Translation quality degrades significantly on dark or blurry face footage.
Have clean, intelligible audio. Background music mixed under dialogue is usually fine, but heavy audio compression or echo makes transcription less accurate, which gives the translation a worse starting point.
Be 30 minutes or shorter per upload on standard plans.
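
If you upload regularly, a small pre-flight check can catch format and length problems before you wait on an upload. The sketch below is not part of HeyGen. It only encodes the format list and the 30-minute limit quoted above, and it assumes you already know the clip's duration (from your editor or a tool such as ffprobe). The function name is made up for illustration.

```python
from pathlib import Path

# Limits copied from the requirements above; adjust if your plan differs.
ACCEPTED_EXTENSIONS = {".mp4", ".mov", ".webm"}
MAX_DURATION_SECONDS = 30 * 60  # 30 minutes per upload on standard plans


def preflight_problems(filename: str, duration_seconds: float) -> list[str]:
    """Return human-readable problems; an empty list means OK to upload."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if duration_seconds > MAX_DURATION_SECONDS:
        minutes = duration_seconds / 60
        problems.append(f"too long: {minutes:.1f} min exceeds the 30 min limit")
    return problems


print(preflight_problems("keynote.MOV", 240))   # []
print(preflight_problems("keynote.avi", 2400))  # two problems reported
```

Face lighting and audio clarity still need a human look; this only covers the two requirements that are mechanical to check.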

After upload, HeyGen transcribes the source audio automatically. Review the transcription before proceeding because the translation quality downstream depends heavily on the transcription accuracy. If a technical term or proper noun was transcribed incorrectly, the translation will carry that error forward.

Setting your target language and voice

Once the transcription is confirmed, select your target language from the dropdown. HeyGen supports over 40 languages as of early 2026. The best-supported languages, with their voice counts and lip sync quality, are:

Language	Voice count	Lip sync quality
Spanish (Latin American)	12+	High
French	10+	High
German	8+	High
Portuguese (Brazil)	8+	High
Japanese	6+	Medium
Hindi	6+	Medium
Arabic	5+	Medium
Korean	5+	Medium

Lip sync quality differences come from the quantity of training data. Languages with more data produce tighter mouth movement matching. For languages in the medium category, the audio is correct but the lip sync is looser.

After picking the target language, choose a voice. You have three options:

HeyGen stock voices: pre-built voices in the target language. Reliable, fast to generate.
Voice clone from source: HeyGen extracts the vocal characteristics of the speaker in your uploaded video and generates the translated audio in a voice that matches their tone and pace.
Custom voice: if you've previously created a voice clone in HeyGen for the speaker, you can select it here.

For marketing content where brand voice consistency matters, the "clone from source" option is worth trying. It makes translated versions feel like the same person speaking rather than a generic voice actor.

Voice cloning quality and limitations

The voice clone built from a source video works best when the source audio is clean and runs at least 60 seconds. Shorter clips don't give the model enough data to capture speaking style accurately.

Honest limitations: the cloned voice captures tone and approximate timbre but doesn't perfectly reproduce accent or regional vocal characteristics. A speaker with a strong regional accent may come out sounding more neutral in the clone. This is usually acceptable for training or informational content, but noticeable to speakers of the target language who know what the natural accent should sound like.

For translated videos going to audiences where regional authenticity matters (marketing in a specific country, not just the language), consider using a stock voice from the target region rather than cloning. HeyGen labels voices with region variants: "Spanish (Mexico)," "Spanish (Spain)," "Spanish (Argentina)" are separate options with distinct accents.
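
The voice guidance so far can be condensed into a small decision helper. This is a sketch of the reasoning in this article, not anything HeyGen exposes: the function name, the returned labels, and the priority order (regional accent first, then an existing custom clone, then cloning from source) are my assumptions.

```python
MIN_CLONE_SOURCE_SECONDS = 60  # below this, clones capture speaking style poorly


def suggest_voice_option(
    source_audio_seconds: float,
    regional_accent_matters: bool,
    has_custom_clone: bool = False,
) -> str:
    """Suggest a voice option; the priority order is this sketch's assumption."""
    if regional_accent_matters:
        # Clones tend to neutralize accents, so prefer a labeled regional voice.
        return "stock voice (regional variant)"
    if has_custom_clone:
        return "custom voice"
    if source_audio_seconds >= MIN_CLONE_SOURCE_SECONDS:
        return "clone from source"
    return "stock voice"


print(suggest_voice_option(240, regional_accent_matters=False))  # clone from source
print(suggest_voice_option(45, regional_accent_matters=False))   # stock voice
print(suggest_voice_option(240, regional_accent_matters=True))   # stock voice (regional variant)
```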

Lip sync settings and output

Once language and voice are set, check the output settings panel, then click Generate Translation. The model runs in three stages: translation, voice synthesis, and lip sync rendering. Generation time for a 4-minute video typically runs 8 to 15 minutes.

The output settings panel offers:

Lip sync intensity: low, medium, high. Medium works for most content. High matches the mouth movement more aggressively to the target-language phonemes, which is useful for close-up talking-head content but can look slightly artificial on medium shots.
Background audio preservation: toggle to keep any background music or sound effects from the original. Leave this on unless the source video has audio elements that don't translate contextually.
Subtitle generation: optionally auto-generate subtitles in the target language. These can be burned in or exported as a separate SRT file.
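
If several people produce translations, it helps to write the chosen settings down per video so the results stay consistent. The sketch below is a plain record of the three settings above with basic validation. The class and field names are hypothetical and do not correspond to any HeyGen configuration format.

```python
from dataclasses import dataclass

LIP_SYNC_LEVELS = ("low", "medium", "high")
SUBTITLE_MODES = ("none", "burned_in", "srt")


@dataclass(frozen=True)
class OutputSettings:
    """A team-side note of the settings picked in the panel (hypothetical names)."""

    lip_sync_intensity: str = "medium"      # medium works for most content
    preserve_background_audio: bool = True  # leave on in most cases
    subtitles: str = "none"

    def __post_init__(self) -> None:
        if self.lip_sync_intensity not in LIP_SYNC_LEVELS:
            raise ValueError(f"lip_sync_intensity must be one of {LIP_SYNC_LEVELS}")
        if self.subtitles not in SUBTITLE_MODES:
            raise ValueError(f"subtitles must be one of {SUBTITLE_MODES}")


# Close-up talking head with a separate subtitle file.
close_up = OutputSettings(lip_sync_intensity="high", subtitles="srt")
print(close_up)
```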

After generation, preview the full video before downloading. Scrub through the footage with attention to the moments where the translation is significantly longer or shorter than the original audio. Translation length mismatches are where lip sync struggles most, and these spots are worth checking manually.

Common quality issues and how to handle them

Mouth area blurring: HeyGen applies a slight softening around the mouth region during lip sync rerendering. This is most visible in high-resolution close-up footage. Reducing lip sync intensity from high to medium reduces this effect.

Translation length mismatch: some languages are naturally more verbose than others (German and Russian tend to expand; Japanese and Chinese can contract). When the translated audio runs significantly longer than the original, either the audio is compressed slightly or the speaker appears to speak faster. Review these segments specifically.
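
One way to decide where to scrub is to compare segment durations before and after translation and flag the outliers. The sketch below assumes you have those durations yourself, for example from the cue timings of a source-language and a target-language subtitle file. The `Segment` type and the 25% tolerance are illustrative choices, not HeyGen values.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start_seconds: float       # where the segment begins in the video
    original_seconds: float    # duration of the source-language speech
    translated_seconds: float  # duration of the translated speech


def segments_to_review(segments, tolerance=0.25):
    """Return (start, ratio) for segments whose length changed by more than tolerance."""
    flagged = []
    for segment in segments:
        if segment.original_seconds <= 0:
            continue  # skip empty segments rather than divide by zero
        ratio = segment.translated_seconds / segment.original_seconds
        if abs(ratio - 1.0) > tolerance:
            flagged.append((segment.start_seconds, round(ratio, 2)))
    return flagged


segments = [
    Segment(0.0, 5.0, 5.2),   # about the same length
    Segment(12.5, 4.0, 5.6),  # translation expands
    Segment(30.0, 6.0, 4.2),  # translation contracts
]
print(segments_to_review(segments))  # [(12.5, 1.4), (30.0, 0.7)]
```

The flagged start times are the spots to preview first.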

Proper nouns and brand names: names don't always survive translation intact. "HeyGen" might become something phonetically approximate in a language with different phonology. Review all product names, company names, and technical terms in the transcription and translation before generating.
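
A quick text check can point you at names that went missing. The sketch below assumes you can copy the source transcript and the translated text out as plain strings. The function name and the sample sentences are made up for illustration.

```python
def dropped_terms(source_text: str, translated_text: str, protected_terms: list[str]) -> list[str]:
    """Return protected terms that appear in the source but not in the translation."""
    source = source_text.casefold()
    translated = translated_text.casefold()
    return [
        term
        for term in protected_terms
        if term.casefold() in source and term.casefold() not in translated
    ]


source = "Welcome to HeyGen Academy, hosted by Acme Cloud."
translated = "Bienvenido a la Academia Heigen, presentada por Acme Cloud."
print(dropped_terms(source, translated, ["HeyGen", "Acme Cloud"]))  # ['HeyGen']
```

Treat a hit as a prompt to look, not as an error: languages written in another script may legitimately transliterate a name, and a plain substring match cannot tell the difference.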

Transcript errors on accented speech: if the source speaker has a strong accent, auto-transcription accuracy drops. You can manually edit the transcription in HeyGen's interface before translation. This 10-minute correction step prevents much larger problems downstream.

Batch translation for multi-language content

On HeyGen's Creator and Business plans, you can submit the same source video for translation into multiple languages simultaneously. Select all target languages from a multi-select list, choose voice settings for each, and HeyGen processes them in parallel.

This is the practical path for content that needs to go out in five or six languages at once, like product launch announcements or quarterly updates for distributed teams. The alternative, processing translations one at a time, multiplies the wall time by the number of languages, which for longer source videos can mean three to four hours. Batch processing brings that down to roughly the time of a single translation.
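
The arithmetic is easy to sketch. The estimate below takes the 8 to 15 minutes observed for a 4-minute video and makes two assumptions that HeyGen does not promise: generation time scales linearly with video length, and batch jobs run fully in parallel.

```python
# Observed range: a 4-minute video took 8 to 15 minutes to generate.
MINUTES_PER_VIDEO_MINUTE = (8 / 4, 15 / 4)  # (2.0, 3.75)


def estimate_wall_time(video_minutes: float, language_count: int, parallel: bool):
    """Return a (low, high) wall-time estimate in minutes under the stated assumptions."""
    low_rate, high_rate = MINUTES_PER_VIDEO_MINUTE
    # One job's worth of waiting if parallel, otherwise one per language.
    jobs_in_series = 1 if parallel else language_count
    return (
        video_minutes * low_rate * jobs_in_series,
        video_minutes * high_rate * jobs_in_series,
    )


print(estimate_wall_time(4, 6, parallel=False))  # (48.0, 90.0)
print(estimate_wall_time(4, 6, parallel=True))   # (8.0, 15.0)
```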

Exporting and distribution

Download the translated video as MP4. If you generated subtitles, you can download the SRT file separately. HeyGen also provides a shareable link per translation, which is convenient for quick review with stakeholders before final distribution.

Video resolution matches the source upload. If you uploaded 1080p footage, you get 1080p translated output. HeyGen does not upscale or downscale during translation processing.

For regular content workflows, HeyGen offers API access on Business plans, which lets you programmatically submit videos for translation and retrieve results. That is useful if you're building an automated content localization pipeline. For occasional use, the manual interface covers everything you need.
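
I have not reproduced HeyGen's actual API here, so the sketch below shows only the shape of a submit-and-poll pipeline. The `submit` and `get_status` callables are placeholders you would implement against the official API reference, and the status strings "done" and "failed" are hypothetical.

```python
import time
from typing import Callable


def translate_and_wait(
    submit: Callable[[str, str], str],  # (video_path, language) -> job id
    get_status: Callable[[str], str],   # job id -> status string
    video_path: str,
    language: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit one translation job and poll until it finishes; returns the job id."""
    job_id = submit(video_path, language)
    waited = 0.0
    while waited <= timeout_seconds:
        status = get_status(job_id)
        if status == "done":  # hypothetical status value
            return job_id
        if status == "failed":  # hypothetical status value
            raise RuntimeError(f"translation job {job_id} failed")
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"translation job {job_id} did not finish in time")


# Dry run with fake callables standing in for real API calls.
fake_statuses = iter(["processing", "processing", "done"])
job = translate_and_wait(
    submit=lambda path, lang: "job-1",
    get_status=lambda job_id: next(fake_statuses),
    video_path="keynote.mp4",
    language="es",
    sleep=lambda seconds: None,  # skip real waiting in the dry run
)
print(job)  # job-1
```

Passing the API calls in as functions keeps the polling logic testable without credentials and makes it easy to swap in the real client later.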