Lip Sync takes a portrait photo (or short video) and an audio track, then animates the subject’s face, including the mouth, expressions, and subtle head movements, to match the audio. The result is a realistic talking avatar video you can use for presentations, social content, explainers, or creative projects.
Navigate to Lip Sync
Go to Tools > Video > Lip Sync in the Artnex sidebar.
How to create a lip-sync video
Upload a portrait image
Click Add Image in the control bar to upload a photo of the person you want to animate. For best results, see the reference image guidelines below.
Image files must be under 25 MB. JPEG and PNG are both supported.
Upload an audio file
Click Add Audio to upload the audio track the avatar should speak. Any common audio format is accepted (MP3, WAV, M4A, etc.).
Audio files must be under 25 MB. The InfiniteTalk Fast model supports audio tracks up to 10 minutes long.
Choose a model
Click the model selector button in the control bar to open the model dialog. Choose a model based on your quality, speed, and budget requirements.
Write an optional scene prompt
Use the prompt field to guide the visual style or scene context, for example: “Professional studio setting, natural lighting, calm expression.” You can leave it empty.
Generate
Click Create to start generation. The credit cost for the selected model is shown on the button. Generation typically takes 1–5 minutes depending on the model and audio length.
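The file limits from the upload steps above can be checked locally before you open the upload dialog. The following is a minimal sketch, not part of Artnex: the function name `check_upload` and the constants are hypothetical, and it assumes "25 MB" means 25,000,000 bytes (the stricter reading, since the guide does not define the unit). Audio is checked for size only, because the list of accepted audio formats is open-ended.

```python
from pathlib import Path

# Assumption: "25 MB" is read as 25,000,000 bytes; the guide does not say
# whether it means decimal megabytes or mebibytes.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000

# The guide names JPEG and PNG as supported image formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def check_upload(path, kind):
    """Return a list of problems for a file; an empty list means it passes.

    kind is "image" or "audio" (hypothetical labels used only in this sketch).
    """
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist"]

    problems = []
    # Files "must be under 25 MB", so a file exactly at the limit fails.
    if p.stat().st_size >= MAX_UPLOAD_BYTES:
        problems.append(f"{p.name} is not under 25 MB")
    # Only images get an extension check; audio formats are not enumerated.
    if kind == "image" and p.suffix.lower() not in IMAGE_EXTENSIONS:
        problems.append(f"{p.name} is not a JPEG or PNG file")
    return problems
```

Calling `check_upload("portrait.png", "image")` and `check_upload("speech.mp3", "audio")` before uploading gives an early warning. The size check on upload remains the authoritative one.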
Available models
Avatar OmniHuman
Animates a portrait photo into a lifelike avatar video with natural motion. Powered by ByteDance.
15,000 credits per video
Avatar OmniHuman 1.5
Enhanced OmniHuman generation with improved facial expressions and smoother motion.
18,000 credits per video
InfiniteTalk Fast
Fast audio-driven talking avatar generation. Supports audio tracks up to 10 minutes long.
19,000 credits per video
InfiniteTalk
High-quality infinite lip-sync. Produces more natural mouth movements than the fast variant.
20,000 credits per video
Kling v1 Avatar Standard
Standard quality Kling AI avatar lip-sync for general use.
21,000 credits per video
Kling v2 Avatar Standard
Next-generation Kling avatar with improved expression fidelity at standard quality.
25,000 credits per video
HunyuanAvatar
High-fidelity audio-driven avatar with emotion control. Produces expressive, cinematic results.
25,000 credits per video
Kling v1 Avatar Pro
Professional-quality Kling avatar for detailed, high-resolution results.
50,000 credits per video
Kling v2 Avatar Pro
Kling’s best avatar model. Highest fidelity lip-sync with superior expression realism.
55,000 credits per video
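To compare models against a credit budget, the per-video costs listed above can be put in a small lookup table. This is an illustrative sketch only: the dictionary and the `affordable_models` helper are hypothetical names, not an Artnex API, and the figures are copied from the list above, so they need updating if pricing changes.

```python
# Credits per video, copied from the model list above.
CREDITS_PER_VIDEO = {
    "Avatar OmniHuman": 15_000,
    "Avatar OmniHuman 1.5": 18_000,
    "InfiniteTalk Fast": 19_000,
    "InfiniteTalk": 20_000,
    "Kling v1 Avatar Standard": 21_000,
    "Kling v2 Avatar Standard": 25_000,
    "HunyuanAvatar": 25_000,
    "Kling v1 Avatar Pro": 50_000,
    "Kling v2 Avatar Pro": 55_000,
}


def affordable_models(budget, videos=1):
    """Return models whose total cost for `videos` videos fits `budget`,
    cheapest first."""
    fitting = [
        name
        for name, cost in CREDITS_PER_VIDEO.items()
        if cost * videos <= budget
    ]
    return sorted(fitting, key=CREDITS_PER_VIDEO.get)


# Two videos on a 40,000-credit budget: up to 20,000 credits each.
print(affordable_models(40_000, videos=2))
# ['Avatar OmniHuman', 'Avatar OmniHuman 1.5', 'InfiniteTalk Fast', 'InfiniteTalk']
```

Cost is only one axis. The descriptions above still decide between, for example, InfiniteTalk Fast for long audio and a Kling Pro model for maximum fidelity.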
Reference image guidelines
The quality of your output depends heavily on the source image you provide. Follow these guidelines to get the best results:
Do
- Use a front-facing portrait with the face clearly visible
- Choose a photo with even, natural lighting and no harsh shadows
- Use an image where only one person is in the frame
- Make sure the face is sharp and in focus
- Use a neutral expression or a slight smile for the most natural animation
Avoid
- Heavily rotated or side-profile faces
- Images with sunglasses, masks, or face coverings
- Small faces in a large scene — crop the image to centre the face
- Blurry, grainy, or low-resolution photos
- Multiple people in the same frame