Text-to-Speech converts your written content into natural-sounding audio in seconds. Powered by ElevenLabs and Kling TTS, you can generate voiceovers for videos, podcasts, accessibility content, and more, without recording equipment or a voice actor.
Navigate to Text to Speech
Go to Tools > Audio > VoiceZen > Text to Speech.
Generate speech
Enter your text
Type or paste the text you want to convert into the input field. You can also click the Sparkles icon to automatically enhance your text, or the Pencil icon to open the full prompt editor.
Choose a model
Select your TTS engine from the model dropdown:
- ElevenLabs — 20 voices across American, British, and Australian accents. Supports 28 languages. Best for natural-sounding, expressive speech.
- Kling TTS — 30+ voices with Chinese, British, and American accents. Best for multilingual and regional character voices.
Switching models resets your voice selection to the first available voice in that model’s library.
Select a voice
Click the voice button (showing the current voice name) to open the Voice Selector. Browse voices by gender and accent, then click one to select it.
ElevenLabs voices include:
| Voice | Gender | Accent |
|---|---|---|
| Aria | Female | American |
| Sarah | Female | American |
| Charlotte | Female | British |
| Alice | Female | British |
| Lily | Female | British |
| Roger | Male | American |
| George | Male | British |
| Daniel | Male | British |
| Charlie | Male | Australian |
| River | Non-binary | American |
And 10+ more voices are available in the selector.
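If you want to shortlist voices before opening the Voice Selector, the gender and accent filters can be sketched in a few lines of Python. This is only an illustration built from the ten voices in the table. The `Voice` class and `filter_voices` function are hypothetical names and are not part of any published API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str
    gender: str
    accent: str


# The ten ElevenLabs voices listed in the table (the selector has more).
ELEVENLABS_VOICES = [
    Voice("Aria", "Female", "American"),
    Voice("Sarah", "Female", "American"),
    Voice("Charlotte", "Female", "British"),
    Voice("Alice", "Female", "British"),
    Voice("Lily", "Female", "British"),
    Voice("Roger", "Male", "American"),
    Voice("George", "Male", "British"),
    Voice("Daniel", "Male", "British"),
    Voice("Charlie", "Male", "Australian"),
    Voice("River", "Non-binary", "American"),
]


def filter_voices(voices, gender=None, accent=None):
    """Return voices matching gender and/or accent; None means no filter."""
    return [
        v
        for v in voices
        if (gender is None or v.gender == gender)
        and (accent is None or v.accent == accent)
    ]


# Example: shortlist the British female voices.
british_female = filter_voices(ELEVENLABS_VOICES, gender="Female", accent="British")
print([v.name for v in british_female])
```

Passing only one of the two filters narrows by that attribute alone, which mirrors browsing by gender or by accent in the selector.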
Choose a language (ElevenLabs only)
When using the ElevenLabs model, select your target language from the language dropdown. Supported languages include English, Spanish, French, German, Italian, Portuguese, Hindi, Arabic, Japanese, Korean, Chinese, and 17 more.
Setting the language to anything other than English automatically switches to the multilingual model for accurate pronunciation.
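The language rule above amounts to a single check, sketched here in Python. The function name and the `"default"` and `"multilingual"` labels are placeholders chosen for this example, and the sketch does not assume what the internal model identifiers actually are.

```python
def resolve_elevenlabs_variant(language: str) -> str:
    """Return a placeholder label for the model variant a language would use.

    Mirrors the documented behavior: any language other than English
    switches to the multilingual model. The labels are illustrative only.
    """
    if language.strip().lower() == "english":
        return "default"
    return "multilingual"


for lang in ("English", "Spanish", "Japanese"):
    print(lang, "->", resolve_elevenlabs_variant(lang))
```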
Adjust advanced settings (optional)
Click Settings to fine-tune the output:
ElevenLabs settings:
- Similarity (0–2) — Higher values match the voice’s natural characteristics more closely.
- Stability (0–1) — Higher values produce more consistent, predictable delivery. Lower values introduce variation.
- Speaker Boost — Toggle on to enhance voice clarity and presence.
- Speed (0.8×–2×) — Controls playback speed of the generated audio.
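If you keep your preferred settings in a script or notes file, a small check against the ranges listed above can catch out-of-range values before you enter them. The sketch below is a minimal illustration. The `ElevenLabsSettings` class, its field names, and `clamp_settings` are hypothetical, and only the numeric ranges come from this page.

```python
from dataclasses import dataclass, replace

# Ranges as listed in the settings panel description.
RANGES = {
    "similarity": (0.0, 2.0),
    "stability": (0.0, 1.0),
    "speed": (0.8, 2.0),
}


@dataclass(frozen=True)
class ElevenLabsSettings:
    similarity: float
    stability: float
    speed: float
    speaker_boost: bool


def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def clamp_settings(settings: ElevenLabsSettings) -> ElevenLabsSettings:
    """Return a copy with every numeric field pulled into its documented range."""
    clamped = {
        name: clamp(getattr(settings, name), low, high)
        for name, (low, high) in RANGES.items()
    }
    return replace(settings, **clamped)


requested = ElevenLabsSettings(similarity=2.5, stability=0.4, speed=0.5, speaker_boost=True)
print(clamp_settings(requested))
```

Clamping is one design choice. Raising an error on out-of-range input would be equally reasonable if you prefer to be told about the mistake.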
Credit costs
| Model | Cost per generation |
|---|---|
| ElevenLabs | 10 credits |
| Kling TTS | 100 credits |
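To budget credits before a batch of generations, the table translates into simple arithmetic. The Python sketch below uses the per-generation costs listed above. The function names are made up for this example, and it assumes every generation costs the flat amount shown regardless of text length.

```python
# Per-generation costs from the table above.
CREDITS_PER_GENERATION = {
    "ElevenLabs": 10,
    "Kling TTS": 100,
}


def total_credits(model: str, generations: int) -> int:
    """Credits needed for a number of generations with the given model."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return CREDITS_PER_GENERATION[model] * generations


def affordable_generations(model: str, balance: int) -> int:
    """How many full generations a credit balance covers."""
    if balance < 0:
        raise ValueError("balance must be non-negative")
    return balance // CREDITS_PER_GENERATION[model]


print(total_credits("ElevenLabs", 5))            # 5 * 10 = 50
print(total_credits("Kling TTS", 5))             # 5 * 100 = 500
print(affordable_generations("Kling TTS", 1000)) # 1000 // 100 = 10
```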
Use cases
Video voiceovers
Add professional narration to explainer videos, tutorials, and social content without recording a voiceover.
Podcasts and audio content
Convert blog posts or scripts into podcast episodes, or produce audio versions of written articles.
Accessibility
Generate audio versions of written content so users with visual impairments or reading difficulties can consume it.
Marketing and ads
Produce ad spots, product demos, and promotional audio for campaigns without hiring voice talent.