Text-to-Speech converts your written content into natural-sounding audio in seconds. It is powered by ElevenLabs and Kling TTS, and lets you generate voiceovers for videos, podcasts, accessibility content, and more without recording equipment or a voice actor. To open it, go to Tools > Audio > VoiceZen > Text to Speech.

Generate speech

1. Enter your text

Type or paste the text you want to convert into the input field. You can also click the Sparkles icon to automatically enhance your text, or the Pencil icon to open the full prompt editor.
2. Choose a model

Select your TTS engine from the model dropdown:
  • ElevenLabs — 20 voices across American, British, and Australian accents. Supports 28 languages. Best for natural-sounding, expressive speech.
  • Kling TTS — 30+ voices with Chinese, British, and American accents. Best for multilingual and regional character voices.
Switching models resets your voice selection to the first available voice in that model’s library.
3. Select a voice

Click the voice button (showing the current voice name) to open the Voice Selector. Browse voices by gender and accent, then click one to select it.

ElevenLabs voices include:

Voice       Gender       Accent
Aria        Female       American
Sarah       Female       American
Charlotte   Female       British
Alice       Female       British
Lily        Female       British
Roger       Male         American
George      Male         British
Daniel      Male         British
Charlie     Male         Australian
River       Non-binary   American
The remaining ElevenLabs voices are available in the selector.
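
The voice list above can also be treated as data, for example to shortlist voices by gender and accent before opening the selector. This is a minimal sketch: the `Voice` type and `filter_voices` helper are hypothetical names used for illustration and are not part of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Voice:
    name: str
    gender: str
    accent: str


# The ten ElevenLabs voices listed above; the selector offers more.
ELEVENLABS_VOICES = [
    Voice("Aria", "Female", "American"),
    Voice("Sarah", "Female", "American"),
    Voice("Charlotte", "Female", "British"),
    Voice("Alice", "Female", "British"),
    Voice("Lily", "Female", "British"),
    Voice("Roger", "Male", "American"),
    Voice("George", "Male", "British"),
    Voice("Daniel", "Male", "British"),
    Voice("Charlie", "Male", "Australian"),
    Voice("River", "Non-binary", "American"),
]


def filter_voices(voices, gender: Optional[str] = None, accent: Optional[str] = None):
    """Return the voices matching the given gender and/or accent (None means 'any')."""
    return [
        v for v in voices
        if (gender is None or v.gender == gender)
        and (accent is None or v.accent == accent)
    ]


british_female = filter_voices(ELEVENLABS_VOICES, gender="Female", accent="British")
print([v.name for v in british_female])  # ['Charlotte', 'Alice', 'Lily']
```
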
4. Choose a language (ElevenLabs only)

When using the ElevenLabs model, select your target language from the language dropdown. Supported languages include English, Spanish, French, German, Italian, Portuguese, Hindi, Arabic, Japanese, Korean, Chinese, and 17 more.
Setting the language to anything other than English automatically switches to the multilingual model for accurate pronunciation.
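
The language rule in this step amounts to a single check. The sketch below only illustrates that rule: `pick_elevenlabs_model` and the labels "standard" and "multilingual" are placeholder names assumed here, not identifiers from the product or from ElevenLabs.

```python
def pick_elevenlabs_model(language: str) -> str:
    """Return a placeholder model label for the selected language.

    Assumption: "standard" stands in for whatever model is used for English;
    the page only states that non-English languages use the multilingual model.
    """
    if language.strip().lower() == "english":
        return "standard"
    return "multilingual"


print(pick_elevenlabs_model("English"))   # standard
print(pick_elevenlabs_model("Japanese"))  # multilingual
```
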
5. Adjust advanced settings (optional)

Click Settings to fine-tune the output.
ElevenLabs settings:
  • Similarity (0–2) — Higher values match the voice’s natural characteristics more closely.
  • Stability (0–1) — Higher values produce more consistent, predictable delivery. Lower values introduce variation.
  • Speaker Boost — Toggle on to enhance voice clarity and presence.
Kling TTS settings:
  • Speed (0.8×–2×) — Controls playback speed of the generated audio.
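
If you keep preferred settings in a script or notes, it can help to check them against the ranges listed above before entering them. A minimal sketch, assuming out-of-range values should be clamped to the nearest limit; how the app itself handles such values is not documented here, and `clamp_setting` is a hypothetical helper.

```python
# Ranges copied from the settings list above.
SETTING_RANGES = {
    "similarity": (0.0, 2.0),  # ElevenLabs
    "stability": (0.0, 1.0),   # ElevenLabs
    "speed": (0.8, 2.0),       # Kling TTS
}


def clamp_setting(name: str, value: float) -> float:
    """Clamp a value into the documented range for the named setting."""
    low, high = SETTING_RANGES[name]
    return max(low, min(high, value))


print(clamp_setting("stability", 1.4))   # 1.0
print(clamp_setting("speed", 0.5))       # 0.8
print(clamp_setting("similarity", 1.5))  # 1.5
```
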
6. Generate

Click Create. Generation takes a few seconds. Your audio appears in the player when ready.
7. Download

Click the download button on the audio player to save your file as an MP3.

Credit costs

Model        Cost per generation
ElevenLabs   10 credits
Kling TTS    100 credits
Use ElevenLabs for most projects: at 10 credits per generation it costs a tenth of what Kling TTS does, and it supports 28 languages with expressive, natural voices.
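
To budget credits for a batch of clips, multiply the per-generation cost by the number of generations. The sketch below uses the costs from the table above; `total_credits` is a hypothetical helper, and it assumes each click of Create counts as one generation.

```python
# Costs per generation, from the table above.
CREDITS_PER_GENERATION = {
    "ElevenLabs": 10,
    "Kling TTS": 100,
}


def total_credits(model: str, generations: int) -> int:
    """Return the credits needed for a number of generations with one model."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return CREDITS_PER_GENERATION[model] * generations


print(total_credits("ElevenLabs", 12))  # 120
print(total_credits("Kling TTS", 12))   # 1200
```
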

Use cases

Video voiceovers

Add professional narration to explainer videos, tutorials, and social content without recording a voiceover.

Podcasts and audio content

Convert blog posts or scripts into podcast episodes, or produce audio versions of written articles.

Accessibility

Generate audio versions of written content so users with visual impairments or reading difficulties can consume it.

Marketing and ads

Produce ad spots, product demos, and promotional audio for campaigns without hiring voice talent.