Generate custom voices to narrate your content

Rachel Handley 30.Apr.2026

BeyondWords now offers voice generation, letting you create custom ElevenLabs voices from simple text prompts.

Finding the right voice for your content can be difficult. Premade voices get you close, but rarely give you an exact match. Voice cloning is an option, but it requires you to find the right person, secure consent, and put legal agreements in place.

With voice generation, you define the characteristics you want, generate options instantly, and refine until it fits. No studio time; no compromises.

You can create your ideal voice in minutes, then easily add it to your audio and video workflows.

We’ve used a generated voice to narrate this very article.

How it works

Voice generation is built directly into the BeyondWords dashboard, so there’s no extra setup or tooling required.

Write a prompt describing the voice you want, defining characteristics like accent, age, tone, pacing, delivery style, and audio quality. The more specific you are, the better the result.
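If you write prompts for several voices, it can help to keep the characteristics in a consistent order. The sketch below is one possible way to do that in Python. It is not part of BeyondWords or ElevenLabs: the VoiceSpec class, its field names, and the build_prompt helper are hypothetical, and the layout simply mirrors the example prompts later in this article. It only assembles text that you would then paste into the dashboard.

```python
from dataclasses import dataclass


@dataclass
class VoiceSpec:
    """Hypothetical container for the characteristics a voice prompt describes."""
    accent: str            # e.g. "Native British English"
    gender: str            # e.g. "Male"
    age_range: str         # e.g. "35–50"
    audio_quality: str     # e.g. "Studio-quality"
    persona: str           # e.g. "policy analyst"
    emotions: list[str]    # e.g. ["analytical", "composed"]
    description: str = ""  # free-text notes on tone, pacing, and delivery


def build_prompt(spec: VoiceSpec) -> str:
    """Join the fields into one prompt, in the order used by this article's examples."""
    header = (
        f"{spec.accent}. {spec.gender}, {spec.age_range}. "
        f"{spec.audio_quality}. Persona: {spec.persona}. "
        f"Emotion: {', '.join(spec.emotions)}."
    )
    # The free-text description follows as its own paragraph, if provided.
    return f"{header}\n\n{spec.description}".strip()


# Values taken from the policy analyst example in this article.
policy_analyst = VoiceSpec(
    accent="Native British English",
    gender="Male",
    age_range="35–50",
    audio_quality="Studio-quality",
    persona="policy analyst",
    emotions=["analytical", "composed", "thoughtful"],
    description=(
        "Neutral, mid-to-low pitch with a calm, even delivery. "
        "Precise and structured, with minimal variation in tone "
        "to maintain clarity and credibility."
    ),
)

prompt = build_prompt(policy_analyst)
print(prompt)
```

Keeping the fields separate makes it easy to change one characteristic at a time, such as the age range or emotion, and compare the voices each variant produces.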

Once you’ve written your prompt, you can generate multiple voice options, compare them side by side, and refine until you get the right result.

When you’re happy, name your chosen voice and add it to your project. You can then use the voice to generate audio and video versions of your articles.

Voice generation currently uses the ElevenLabs Voice Design v3 model, and each voice can speak up to 74 languages. Learn more in our voice generation doc.

Example 1: Opinion columnist

Prompt: Native American English. Male, 40–55. Broadcast quality. Persona: opinion columnist. Emotion: confident, assertive, composed.

Strong, slightly low-pitched voice with a firm tone. Controlled but expressive delivery, using clear emphasis and variation in cadence to highlight key arguments without feeling over-performed.

Example 2: Regional newsroom journalist

Prompt: Norsk som morsmål. Kvinne, 25–35 år. Studiokvalitet. Rolle: lokal nyhetsreporter. Følelse: tydelig, jordnær og imøtekommende.

Lett og naturlig tone med klar artikulasjon og middels toneleie. Samtalepreget tempo med jevn rytme, som gir en kjent og lett tilgjengelig lytteopplevelse.

(English translation: Native Norwegian. Female, 25–35. Studio-quality. Persona: local news reporter. Emotion: clear, grounded, approachable.

Light, natural tone with crisp articulation and mid-range pitch. Conversational pacing with a steady rhythm, creating a familiar and accessible listening experience.)

Example 3: Policy analyst

Prompt: Native British English. Male, 35–50. Studio-quality. Persona: policy analyst. Emotion: analytical, composed, thoughtful.

Neutral, mid-to-low pitch with a calm, even delivery. Precise and structured, with minimal variation in tone to maintain clarity and credibility.

Example 4: TikTok influencer

Prompt: Native American English. Female, 20–30. Studio-quality. Persona: digital news creator. Emotion: energetic, engaging, slightly playful.

Bright, mid-to-high pitch with a fast, conversational delivery. Natural rhythm with clear emphasis on key points, using light variation in tone to keep attention. Feels informal but informed, with a confident, relatable style suited to short-form video and social platforms.

The right voice for every audience

We’ve already helped multiple publishers generate the ideal voice for their audiences.

One Norwegian publisher needed a voice to support a regional language standard that’s not well covered by premade voices. We generated multiple voice options and tuned them toward that variant, so the team could identify a strong fit for their editorial output.

Another news publisher wanted male and female voices that felt consistent with their brand. Instead of mixing and matching premade voices, we generated a paired set for them to review, making it easier to achieve a cohesive sound across their content.

Want to hear what your brand voice could sound like? Book a demo.