Lip Sync takes a portrait photo (or short video) and an audio track, then animates the subject’s face, including the mouth, expressions, and subtle head movements, to match the audio. The result is a realistic talking avatar video you can use for presentations, social content, explainers, or creative projects.

To open the tool, go to Tools > Video > Lip Sync in the Artnex sidebar.

How to create a lip-sync video

1. Upload a portrait image

Click Add Image in the control bar to upload a photo of the person you want to animate. For best results, see the reference image guidelines below.

Image files must be under 25 MB. JPEG and PNG are both supported.

2. Upload an audio file

Click Add Audio to upload the audio track the avatar should speak. Any common audio format is accepted (MP3, WAV, M4A, etc.).

Audio files must be under 25 MB. The InfiniteTalk Fast model supports audio tracks up to 10 minutes long.

3. Choose a model

Click the model selector button in the control bar to open the model dialog. Choose a model based on your quality, speed, and budget requirements.

4. Write an optional scene prompt

The prompt field is optional. You can use it to guide the visual style or scene context, for example: “Professional studio setting, natural lighting, calm expression.”

5. Select resolution

Use the resolution button to choose between 480p and 720p output.

6. Generate

Click Create to start generation. The credit cost for the selected model is shown on the button. Generation typically takes 1–5 minutes depending on the model and audio length.

7. Download your video

When generation is complete, the lip-sync video appears in your My Creations gallery. Click the Download icon to save it.
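
If you prepare files in bulk, it can help to check them against the upload limits from steps 1 and 2 before opening the tool. The following is a minimal local sketch, not part of Artnex: the function names are hypothetical, and reading "25 MB" as 25,000,000 bytes is an assumption chosen to be conservative. Audio is checked for size only, because the steps above do not give a closed list of audio formats.

```python
from pathlib import Path

# Limits taken from the steps above. Reading "MB" as 1,000,000 bytes is an
# assumption; it is the stricter of the two common interpretations.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # JPEG and PNG are supported


def check_image(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems with a portrait image (empty if none found)."""
    problems = []
    if Path(filename).suffix.lower() not in IMAGE_EXTENSIONS:
        problems.append("image should be JPEG or PNG")
    if size_bytes >= MAX_UPLOAD_BYTES:
        problems.append("image must be under 25 MB")
    return problems


def check_audio(size_bytes: int) -> list[str]:
    """Return a list of problems with an audio track (empty if none found)."""
    problems = []
    if size_bytes >= MAX_UPLOAD_BYTES:
        problems.append("audio must be under 25 MB")
    return problems


def check_files(image_path: str, audio_path: str) -> list[str]:
    """Check two files on disk before uploading them by hand."""
    image, audio = Path(image_path), Path(audio_path)
    return check_image(image.name, image.stat().st_size) + check_audio(
        audio.stat().st_size
    )
```

Calling `check_files("portrait.png", "voiceover.mp3")` with your own paths returns an empty list when both files pass these checks. The sketch looks only at the file extension and size, so it cannot tell whether the image meets the reference image guidelines.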

Available models

Avatar OmniHuman

Animates a portrait photo into a lifelike avatar video with natural motion. Powered by ByteDance.
15,000 credits per video

Avatar OmniHuman 1.5

Enhanced OmniHuman generation with improved facial expressions and smoother motion.
18,000 credits per video

InfiniteTalk Fast

Fast audio-driven talking avatar generation. Supports audio tracks up to 10 minutes long.
19,000 credits per video

InfiniteTalk

High-quality infinite lip-sync. Produces more natural mouth movements than the fast variant.
20,000 credits per video

Kling v1 Avatar Standard

Standard-quality Kling AI avatar lip-sync for general use.
21,000 credits per video

Kling v2 Avatar Standard

Next-generation Kling avatar with improved expression fidelity at standard quality.
25,000 credits per video

HunyuanAvatar

High-fidelity audio-driven avatar with emotion control. Produces expressive, cinematic results.
25,000 credits per video

Kling v1 Avatar Pro

Professional-quality Kling avatar for detailed, high-resolution results.
50,000 credits per video

Kling v2 Avatar Pro

Kling’s best avatar model. Highest-fidelity lip-sync with superior expression realism.
55,000 credits per video
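
To compare models against a credit budget, the prices above can be put into a small lookup table. This is only an illustrative sketch: the credit figures are copied from the list above, while the function names are hypothetical and the calculation assumes the per-video cost is flat, which matches how the list states it but is not confirmed for every resolution or audio length.

```python
# Credits per video, copied from the model list above.
CREDITS_PER_VIDEO = {
    "Avatar OmniHuman": 15_000,
    "Avatar OmniHuman 1.5": 18_000,
    "InfiniteTalk Fast": 19_000,
    "InfiniteTalk": 20_000,
    "Kling v1 Avatar Standard": 21_000,
    "Kling v2 Avatar Standard": 25_000,
    "HunyuanAvatar": 25_000,
    "Kling v1 Avatar Pro": 50_000,
    "Kling v2 Avatar Pro": 55_000,
}


def total_cost(model: str, videos: int) -> int:
    """Credits needed to generate `videos` videos with one model."""
    return CREDITS_PER_VIDEO[model] * videos


def affordable_models(balance: int) -> list[str]:
    """Models whose single-video cost fits within `balance` credits."""
    return [name for name, cost in CREDITS_PER_VIDEO.items() if cost <= balance]
```

For example, `total_cost("InfiniteTalk", 3)` gives 60,000 credits, and `affordable_models(19_000)` lists the three models priced at or below 19,000 credits. The Create button remains the authoritative place to see the cost of a given generation.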

Reference image guidelines

The quality of your output depends heavily on the source image you provide. Follow these guidelines to get the best results:

Do

  • Use a front-facing portrait with the face clearly visible
  • Choose a photo with even, natural lighting and no harsh shadows
  • Use an image where only one person is in the frame
  • Make sure the face is sharp and in focus
  • Use a neutral expression or a slight smile for the most natural animation

Avoid

  • Heavily rotated or side-profile faces
  • Images with sunglasses, masks, or face coverings
  • Small faces in a large scene (crop the image to centre the face instead)
  • Blurry, grainy, or low-resolution photos
  • Multiple people in the same frame

A headshot-style photo with the face filling most of the frame produces the most convincing animation. Portrait mode shots from a smartphone work well.

Only animate faces with the explicit permission of the person depicted. Review Artnex’s acceptable use policy before generating lip-sync videos of real individuals.