Sync Lips to Audio with AI Avatars

Lip Sync takes a portrait photo (or short video) and an audio track, then animates the subject’s face — including mouth, expressions, and subtle head movements — to match the audio. The result is a realistic talking avatar video you can use for presentations, social content, explainers, or creative projects.

Navigate to Lip Sync

Go to Tools > Video > Lip Sync in the Artnex sidebar.

How to create a lip-sync video

Upload a portrait image

Click Add Image in the control bar to upload a photo of the person you want to animate. For best results, see the reference image guidelines below.

Image files must be under 25 MB. JPEG and PNG are both supported.

Upload an audio file

Click Add Audio to upload the audio track the avatar should speak. Any common audio format is accepted (MP3, WAV, M4A, etc.).

Audio files must be under 25 MB. The InfiniteTalk Fast model supports audio tracks up to 10 minutes long.

Choose a model

Click the model selector button in the control bar to open the model dialog. Choose a model based on your quality, speed, and budget requirements.

Write an optional scene prompt

The prompt field is optional. You can use it to guide the visual style or scene context — for example: “Professional studio setting, natural lighting, calm expression.”

Select resolution

Use the resolution button to choose between 480p and 720p output.

Generate

Click Create to start generation. The credit cost for the selected model is shown on the button. Generation typically takes 1–5 minutes depending on the model and audio length.

Download your video

When complete, the lip-sync video appears in your My Creations gallery. Click the Download icon to save it.
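The upload limits from the steps above (image under 25 MB in JPEG or PNG, audio under 25 MB) can be checked locally before you open the tool. The following is a minimal sketch, not part of Artnex: the function name `check_upload`, the extension list, and the reading of "25 MB" as 25 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes; Artnex may count differently.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

# Assumption: these extensions stand in for "JPEG and PNG".
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def check_upload(filename: str, size_bytes: int, kind: str) -> list[str]:
    """Return a list of problems for a file; an empty list means none were found.

    `kind` is "image" or "audio". This is a hypothetical local helper;
    it does not call any Artnex service.
    """
    problems = []
    if size_bytes >= MAX_UPLOAD_BYTES:
        problems.append(f"{filename}: file must be under 25 MB")
    if kind == "image" and Path(filename).suffix.lower() not in IMAGE_EXTENSIONS:
        problems.append(f"{filename}: image must be JPEG or PNG")
    return problems


if __name__ == "__main__":
    # Example with made-up file names and sizes.
    for issue in check_upload("portrait.gif", 30 * 1024 * 1024, "image"):
        print(issue)
```

For a real file, the size can be read with `Path(filename).stat().st_size` and passed in as `size_bytes`.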

Available models

Avatar OmniHuman

Animates a portrait photo into a lifelike avatar video with natural motion. Powered by ByteDance.

15,000 credits per video

Avatar OmniHuman 1.5

Enhanced OmniHuman generation with improved facial expressions and smoother motion.

18,000 credits per video

InfiniteTalk Fast

Fast audio-driven talking avatar generation. Supports audio tracks up to 10 minutes long.

19,000 credits per video

InfiniteTalk

High-quality infinite lip-sync. Produces more natural mouth movements than the fast variant.

20,000 credits per video

Kling v1 Avatar Standard

Standard-quality Kling AI avatar lip-sync for general use.

21,000 credits per video

Kling v2 Avatar Standard

Next-generation Kling avatar with improved expression fidelity at standard quality.

25,000 credits per video

HunyuanAvatar

High-fidelity audio-driven avatar with emotion control. Produces expressive, cinematic results.

25,000 credits per video

Kling v1 Avatar Pro

Professional-quality Kling avatar for detailed, high-resolution results.

50,000 credits per video

Kling v2 Avatar Pro

Kling’s best avatar model. Highest-fidelity lip-sync with superior expression realism.

55,000 credits per video
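To compare the models above by cost, the per-video credit prices can be put in a small lookup table. This is an illustrative sketch only: the credit values are copied from this page, while the names `MODEL_CREDITS`, `models_within_budget`, and `videos_affordable` are hypothetical helpers and not an Artnex API.

```python
# Credits per video, copied from the model list on this page.
MODEL_CREDITS = {
    "Avatar OmniHuman": 15_000,
    "Avatar OmniHuman 1.5": 18_000,
    "InfiniteTalk Fast": 19_000,
    "InfiniteTalk": 20_000,
    "Kling v1 Avatar Standard": 21_000,
    "Kling v2 Avatar Standard": 25_000,
    "HunyuanAvatar": 25_000,
    "Kling v1 Avatar Pro": 50_000,
    "Kling v2 Avatar Pro": 55_000,
}


def models_within_budget(budget: int) -> list[str]:
    """Return the models costing at most `budget` credits per video, cheapest first."""
    by_cost = sorted(MODEL_CREDITS.items(), key=lambda item: item[1])
    return [name for name, cost in by_cost if cost <= budget]


def videos_affordable(model: str, balance: int) -> int:
    """Return how many whole videos `balance` credits cover for `model`."""
    return balance // MODEL_CREDITS[model]


if __name__ == "__main__":
    print(models_within_budget(20_000))
    print(videos_affordable("Kling v2 Avatar Pro", 120_000))
```

Cost is only one of the three criteria named in the model step; quality and speed still have to be weighed from the descriptions above.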

Reference image guidelines

The quality of your output depends heavily on the source image you provide. Follow these guidelines to get the best results:

Do

Use a front-facing portrait with the face clearly visible
Choose a photo with even, natural lighting and no harsh shadows
Use an image where only one person is in the frame
Make sure the face is sharp and in focus
Use a neutral expression or a slight smile for the most natural animation

Avoid

Heavily rotated or side-profile faces
Images with sunglasses, masks, or face coverings
Small faces in a large scene — crop the image to centre the face
Blurry, grainy, or low-resolution photos
Multiple people in the same frame

A headshot-style photo with the face filling most of the frame produces the most convincing animation. Portrait mode shots from a smartphone work well.

Only animate faces with the explicit permission of the person depicted. Review Artnex’s acceptable use policy before generating lip-sync videos of real individuals.
